[erlang-questions] Erlang tracing

Mon Sep 21 12:02:31 CEST 2015

> On 21 Sep 2015, at 10:52, Lukas Larsson <lukas@REDACTED> wrote:
> 
> To start the discussion, here are a few of my ideas in no particular order:
>   * Allow multiple tracers. Today only one port/process can be the receiver of trace data.
>   * Create a couple of scalable high throughput tracing output backends with different overflow mechanics. Today all tracing is funneled through one bottleneck and has no overflow handling at all.
>   * Expose vm probes (today dtrace probes) to the erlang tracer.
>   * Better integration of dtrace/lttngt/systemtap into the erlang trace.
>   * Allow the erlang tracer to be an Erlang callback module. Today only ports/processes are allowed.
>   * Optimize trace output to file/ip. Maybe use something like the Common Trace Format (http://git.efficios.com/?p=ctf.git;a=blob_plain;f=common-trace-format-specification.md;hb=master <http://git.efficios.com/?p=ctf.git;a=blob_plain;f=common-trace-format-specification.md;hb=master>), instead of the term_to_binary that we have today.
>   * Write much much better documentation for dbg :)

+1 on the suggestions, esp. multiple tracers. The one-tracer restriction makes it hard to explore some very interesting monitoring and debugging ideas.

I have had some success using an event/[2,3] function for lightweight debugging, where the functions event(Line, Event) and event(Line Event, State) exist only to be traced. A simple way to generalize this would be to create ‘null’ BIFs: erlang:trace_event(Info, Event) -> ok, erlang:trace_event(Info, Event, State) -> ok (just a trace_event/1 would of course suffice). If not traced, these functions do nothing; when traced, they emit trace info similar to io:format debugging.

See e.g. the locks application:
https://github.com/uwiger/locks/blob/master/src/locks_leader.erl#L140 <https://github.com/uwiger/locks/blob/master/src/locks_leader.erl#L140>
https://github.com/uwiger/locks/blob/master/src/locks_leader.erl#L823 <https://github.com/uwiger/locks/blob/master/src/locks_leader.erl#L823>

The ?debug/[1,2] macro is used to also include line numbers. One could perhaps imagine such a macro existing in dbg, but putting an event() function in dbg as well forces a remote call even when tracing is not enabled.

So basically, the features are:
- a ‘debug’ function that is as lightweight as possible when not traced, just existing for the purpose of being traced
- including line numbers

I’ve found this extremely useful not least when enabled during multi-node test suite execution, using ttb to merge logs and a final pretty-printing step to produce a text log best viewed with emacs's Erlang syntax highlighting. See https://github.com/uwiger/locks/blob/master/src/locks_ttb.erl <https://github.com/uwiger/locks/blob/master/src/locks_ttb.erl> (Logs fetched only if test case fails).

One might fantasize about enabling migration from io:format() debugging to this, ideally by simply rewriting the debug macro: a handy wrapper that simulates the original printouts. There are some problems with this, so I’ve chosen not to go there.

To further generalize the issue …

What we ended up with at Ericsson was a variety of techniques for getting the most out of tracing:

- A script that inserted a ?DBG macro after each ->. This was a conditional debug printout, but could have been a trace event as above. The feature was to log the ‘branching’ in the code, e.g. noting which case function or receive clause was actually selected.
- Consistently breaking out case clauses etc as separate functions (since function calls can be traced). See e.g. the diameter application in OTP for examples of this style of coding.

I think there is room for revisiting the ‘et’ concept in this context. The thing about ‘et’ was that it allowed for a ‘model-level’ trace, complete with sequence diagram presentation. Its main drawback was that few figured out how to use it. But I think the idea is brilliant: to be able to dynamically extract ‘model markers’ from executing code and displaying it in a way that can tie back to the conceptual design.

http://jlouisramblings.blogspot.se/2011/10/using-event-tracer-tool-set-in-erlang.html <http://jlouisramblings.blogspot.se/2011/10/using-event-tracer-tool-set-in-erlang.html>
http://souja.net/2009/04/making-sense-of-erlangs-event-tracer <http://souja.net/2009/04/making-sense-of-erlangs-event-tracer>
http://sysmagazine.com/posts/131140/ <http://sysmagazine.com/posts/131140/>

Lastly, TTB has been greatly improved (ahem…) and is extremely useful for multi-node tracing. When improving the DBG documentation, make sure to show how TTB and DBG work together and what their respective roles are.

BR,
Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20150921/d7b62f39/attachment.htm>