[erlang-questions] Profiling tool to find bottleneck on my Erlang code

Fri Oct 7 10:53:45 CEST 2011

On Fri, Oct 7, 2011 at 10:08 AM, Peer Stritzinger <peerst@REDACTED> wrote:
> On Thu, Oct 6, 2011 at 10:18 PM, Knut Nesheim <knutin@REDACTED> wrote:
> How do you trace a request across processes?  Do you need some id
> tagging of the messages etc?

Yes, this is done using "sequential tracing" offered by the VM. When
the entry point is executed, a trace token is explicitly set. When
this process messages another process, the message contains the trace
token and by receiving the message, the trace token is also set in the
receiving process. This is handled transparently by the VM, so there
is no need to instrument your own code other than calling a kprof
function to set the trace token in the entry point.

>
> Normal gen_server call like requests would probably work, but what
> about cast like messages.

These are also tagged by the VM and the trace token is also set in the
receiving process. However, as I'm interested in measuring runtime of
the actual request, the runtime of the process that received the cast
should not be included. The profile measures runtime in certain
functions, not processes (this could be changed).

>
> If I want to use kprof for protocol layers, where messages travel up
> and down the layers freely and a request after travelling down the
> layers is e.g. answered by a response from outside the system, which
> makes the top layer function call finally return.  Is it possible to
> trace requests like these without instrumenting all layers and/or the
> messages?

Yes, this is exactly my use case. As mentioned above, only the entry
point for the request needs to be instrumented. At the moment, it is
not possible to set a random trace token from within the matchspecs
(at least I could not figure out how to do it). If I could do that, no
instrumentation would be needed.

The profiler is built around the idea to profile each layer, by
specifying one or more functions that is called when the request
enters this layer. These functions must return, otherwise you cannot
know their runtime (gen_server for example is not implemented in this
way, so we can only profile the callbacks)

Here is an example of how you would configure kprof given a very
simple MVC architecture:

* http_handler:handle_request/1: entry point, sets the trace token,
when this returns the response to the webserver, the request is
complete
* business_logic:do_stuff/1
* db_access_layer:execute/1

If every request goes through these tiers, you would get a nice
overview of where you are spending time broken down per "tier". This
is already working.

The "tiers" is nothing more fancy than just a specific function to
profile, so you can do more fine grained stuff like this:

* http_handler:handle_request/1: entry point, sets trace token
* http_handler:request_validation/1
* http_interface:handle/1: business logic may start here
* state_server:handle_call/3: some process involved in the request,
keeping state, but not really a part of any tier
* external_system_lookup:execute/1
* stat_lib:regression/1: slow code doing lots of fancy calculation
* db_access_layer:get_key/1
* db_access_layer:put_key/1

With the above setup you would get the runtime stats for each request
per function you specify. For a function that is executed multiple
times within a request, you would get the total runtime (this might
require some work still)

As you can see, there is nothing really fancy in kprof. My goal is
just to put the pieces together, like sequential tracing, aggregation,
sampling, statistical analysis and being able to continuously run
inside your live system.

Regards
Knut