Debugging in an Asynchronous World

Tue Feb 10 11:11:27 CET 2004

On Sun, 08 Feb 2004 23:44:41 +0100, Marc van Woerkom 
<marc.vanwoerkom@REDACTED> wrote:

> On Sun, 08 Feb 2004 18:33:05 +0100, Ulf Wiger <ulf.wiger@REDACTED> 
> wrote:
>
>> Well, one way to look at it is that they're validating
>> many of the design choices in Erlang.  (:
>
> I would love to hear an Erlang point of view comment on this article:
>
>    Debugging in an Asynchronous World
>    Michael Donat, Silicon Chalk
>
>    Hard-to-track bugs can emerge when you can't guarantee sequential
>    execution.
>    The right tools and the right techniques can help.
>
>    http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=63
>
> Regards,
> Marc

     marc,

   I found Donat's piece a pretty accurate description of my experience 
working with the AXD301 (an Ericsson telephony switch, typically ~6 
CPUs/Erlang nodes, ~2000 concurrent Erlang processes, ~20 processes 
involved in a phone call, about half of them short-lived). Some comments:

  Donat recommends "aggressive use of assertions"; this is what the 
Erlangers call "let it crash". There have been discussions about this on 
the list.
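
   As a rough illustration of the "let it crash" style (a sketch only, not 
code from the AXD301; the module and function names are made up, and 
spawn_monitor assumes a reasonably recent OTP): the function heads and 
guards act as the assertions, and a separate process observes the crash.

```erlang
-module(crash_sketch).
-export([area/1, safe_area/1]).

%% area/1 only has clauses for the shapes it understands. Anything else
%% fails with function_clause at the point where the bad data shows up,
%% instead of being passed along and causing trouble later.
area({square, Side}) when is_number(Side), Side >= 0 ->
    Side * Side;
area({circle, Radius}) when is_number(Radius), Radius >= 0 ->
    math:pi() * Radius * Radius.

%% Run area/1 in its own monitored process, so that a crash is reported
%% to the caller as a 'DOWN' message rather than taking the caller down.
safe_area(Shape) ->
    {Pid, Ref} = spawn_monitor(fun() -> exit({ok, area(Shape)}) end),
    receive
        {'DOWN', Ref, process, Pid, {ok, Area}} -> {ok, Area};
        {'DOWN', Ref, process, Pid, Reason}     -> {crashed, Reason}
    end.
```

   Calling crash_sketch:safe_area({triangle, 1}) gives back a {crashed, ...} 
tuple whose reason starts with function_clause; the worker code itself 
contains no defensive error handling.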

  He says "manual testing is simply too irregular, slow, and expensive"; 
this is of course perfectly true. We've found the OTP test server to be 
pretty indispensable. Not sure if it is officially supported.

  Debuggers: we rarely use the OTP debugger (even though it is quite good, 
especially when run through distel, http://www.bluetail.com/~luke/distel), 
since our applications typically time out while being debugged (making the 
debugger useless). For finding the "simple" bugs (i.e. no message passing, 
no timers) it works fine; for the hard bugs it's useless.

  Tracing is IMHO almost always The Right Thing. Some of Donat's thoughts on 
tracing:
    "Keep the trace mechanism as simple as possible so you can minimize the 
number of OS calls you have to make."
    "Collect the trace in memory to make it as fast as possible."
    "Use a separate low-priority thread to write trace memory to disk."
  This pretty much describes the tracing in the Erlang emulator.
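
   To make the parallel concrete, here is a minimal sketch using the 
erlang:trace/3 and erlang:trace_pattern/3 BIFs. It is an analogy to Donat's 
three points, not a description of the emulator's internals: the trace 
messages pile up in the memory of a low-priority tracer process, which only 
touches the disk when asked to flush. The module name, the run/1 helper, the 
choice of lists:seq/2 as the traced function and the sleep are all 
assumptions made for the example.

```erlang
-module(trace_sketch).
-export([run/1]).

%% Trace calls to lists:seq/2 made by one worker process, buffer the
%% trace messages in the tracer's memory, then dump them to Path.
run(Path) ->
    %% The tracer runs at low priority so it does not compete with real work.
    Tracer = spawn_opt(fun() -> collect([]) end, [{priority, low}]),
    Worker = spawn(fun() -> receive {seq, N} -> lists:seq(1, N) end end),
    %% Turn on call tracing for Worker and direct its trace to Tracer.
    erlang:trace(Worker, true, [call, {tracer, Tracer}]),
    %% Only calls to lists:seq/2 generate trace messages.
    erlang:trace_pattern({lists, seq, 2}, true, [local]),
    Worker ! {seq, 3},
    timer:sleep(100),  % crude: give the trace message time to arrive
    Tracer ! {flush, self(), Path},
    Result = receive
                 {flushed, Count} -> {ok, Count}
             after 1000 ->
                 timeout
             end,
    %% Switch the trace pattern off again.
    erlang:trace_pattern({lists, seq, 2}, false, [local]),
    Result.

%% Accumulate trace messages in a list; write to disk only on flush.
collect(Acc) ->
    receive
        {flush, From, Path} ->
            Events = lists:reverse(Acc),
            ok = file:write_file(Path, io_lib:format("~p.~n", [Events])),
            From ! {flushed, length(Events)};
        TraceMsg when element(1, TraceMsg) =:= trace ->
            collect([TraceMsg | Acc])
    end.
```

   In real use one would more likely reach for dbg (or a tool built on top 
of the trace BIFs) than hand-roll a collector like this.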

   Donat again: "Tracing everything will likely corrupt our results". The 
Erlang tracing has an extremely powerful (although perhaps somewhat 
obscure :>) selectivity mechanism called match specs. Tracing can be very 
light-weight if the selection does not kick in.
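
   A small sketch of what a match spec looks like in practice (module and 
function names are hypothetical, and lists:seq/2 is just a convenient 
target). The spec below selects only calls whose second argument is 
greater than 100; calls that do not match produce no trace message at all.

```erlang
-module(ms_sketch).
-export([run/0]).

run() ->
    Worker = spawn(fun worker/0),
    %% Match spec: the head ['_', '$1'] binds the second argument to '$1',
    %% the guard keeps only calls where it exceeds 100, the body is empty.
    MatchSpec = [{['_', '$1'], [{'>', '$1', 100}], []}],
    %% Trace messages go to the calling process (the default tracer).
    erlang:trace(Worker, true, [call]),
    erlang:trace_pattern({lists, seq, 2}, MatchSpec, [local]),
    Worker ! {seq, 3},    % filtered out by the guard
    Worker ! {seq, 500},  % selected
    Result = receive
                 {trace, Worker, call, {lists, seq, Args}} -> {traced, Args}
             after 1000 ->
                 nothing_traced
             end,
    Worker ! stop,
    erlang:trace_pattern({lists, seq, 2}, false, [local]),
    Result.

%% Calls lists:seq(1, N) for every {seq, N} message it receives.
worker() ->
    receive
        {seq, N} -> _ = lists:seq(1, N), worker();
        stop     -> ok
    end.
```

   Only the second call shows up, so run/0 returns {traced, [1, 500]}.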

   We use a tool called "pan", developed in-house, for taking, filtering and 
analyzing traces. It's in the jungerl (http://jungerl.sourceforge.net).
   Short blurb: 
(http://cvs.sourceforge.net/viewcvs.py/*checkout*/jungerl/jungerl/lib/pan/doc/HOWTO.html?content-type=text%2Fplain).

   We use tracing/pan not only to find the hard bugs, but also to 
characterize and optimize the system.

     mats