[erlang-questions] An answer: how does SASL know that a process died?

Thu Oct 31 11:27:33 CET 2013

Hi,

TLDR: Whenever an Erlang process dies an abnormal death, there's some
      code deep in the VM which sends an unstructured message to
      error_logger about the death. This was surprising to me.

The Question
------------

I was going to ask "where does this 'ERROR REPORT' message come from?":

   ~ >erl -boot start_sasl
   Erlang R15B03 (erts-5.9.3.1)...
   ...
   1> spawn(fun() -> 3 = 9 end).
   <0.42.0>
   2>
   =ERROR REPORT==== 31-Oct-2013::10:51:47 ===
   Error in process <0.42.0> with exit value: {{badmatch,9},[{erl_eval,expr,3,[]}]}

But before asking, I dug out the answer myself. So this post is a
question with the answer supplied. Hope someone's interested.

Anyway, this "Error in process <0.42.0>" message, how can that
possibly work?

Impossible answers
------------------

No Erlang process is linked to <0.42.0>---I used plain spawn()---so it
can't work through links.

No Erlang process is monitoring <0.42.0>, so that's out too.

I even checked that there's no tracing on. There isn't.

I can't find anything in the 'Barklund Draft' which says that abnormal
process death should give information to another process through any
other mechanism. So, this is a top secret part of Erlang, available
only to helmeted, blonde, bearded eaters of rotten fish.

The actual answer
-----------------

Deep in "beam_emu.c", there's terminate_proc(). Here's the gist of
what it does:

   erts_dsprintf_buf_t *dsbufp = erts_create_logger_dsbuf();
   erts_dsprintf(dsbufp, "Error in process %T ", c_p->id);
   erts_dsprintf(dsbufp,"with exit value: %0.*T\n", display_items, Value);
   erts_send_error_to_logger(c_p->group_leader, dsbufp);

So the exit value term, i.e. {badmatch,9} together with the stack
trace, is turned into a string (!) and then sent to the process
registered as 'error_logger'.

It seems OTP invaded the Erlang VM a bit... The other times I've seen
the VM send messages to the error logger, it's because something's on
fire, e.g. distribution has gone nuts. Not something mundane like a
process dying. Seems like a quick hack left over from long ago.

The fix
-------

If you implement your own error_logger, it's tempting to match these
messages so you can do things with them---you might want to format
them differently, or someone might have a burning need to translate
them to Maori---but this is unpalatable because the message comes as a
string.
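
To make that concrete, here is a minimal sketch of such a handler. The
module name crash_spy, the install/1 helper and the vm_crash_text tag
are made up for illustration. It assumes format-style error events
reach a handler as {error, GroupLeader, {From, Format, Args}}, as the
error_logger documentation describes, and it makes no assumption about
what From is for VM-generated messages. All it can do is recognise the
crash by the leading text, which is exactly the unpalatable part:

```erlang
-module(crash_spy).
-behaviour(gen_event).

-export([install/1]).
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

%% Hypothetical helper: attach this handler to error_logger and forward
%% anything that looks like a VM crash message to the process Dest.
install(Dest) when is_pid(Dest) ->
    error_logger:add_report_handler(?MODULE, Dest).

init(Dest) ->
    {ok, Dest}.

%% Format-style error event. The crash arrives as text, so the only
%% thing left to match on is the prefix of the formatted string.
handle_event({error, _GL, {_From, Format, Args}}, Dest) ->
    Text = try lists:flatten(io_lib:format(Format, Args))
           catch _:_ -> ""          % ignore events with bad format/args
           end,
    case lists:prefix("Error in process", Text) of
        true  -> Dest ! {vm_crash_text, Text};
        false -> ok
    end,
    {ok, Dest};
handle_event(_Other, Dest) ->
    {ok, Dest}.

handle_call(_Request, Dest) ->
    {ok, ok, Dest}.

handle_info(_Info, Dest) ->
    {ok, Dest}.

terminate(_Reason, _Dest) ->
    ok.

code_change(_OldVsn, Dest, _Extra) ->
    {ok, Dest}.
```

After crash_spy:install(self()), a crashing spawn() should make a
{vm_crash_text, Text} message turn up in the caller's mailbox. Getting
the pid or the exit value back out of Text means parsing the string,
and the exact wording may differ between releases.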

That leaves the approach taken by proc_lib:spawn(), which is to wrap
the spawned code in a 'try', which means the VM never gets its fingers
on that crash. And that then gets you back to what I expected: if I
spawn() a process, I want it to just die quietly, even if it
crashes. Shame that's not the default.
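
For illustration, a minimal sketch of that wrapping idea. This is not
the proc_lib code; the module name quiet_spawn and the function
spawn_quiet/1 are made up. It relies on my understanding that the VM
only writes the report for uncaught errors and not for a plain
exit/1, so catching the crash and re-exiting keeps things quiet while
links and monitors still see a reason:

```erlang
-module(quiet_spawn).
-export([spawn_quiet/1]).

%% Hypothetical helper: like spawn/1, but any crash in Fun is caught
%% and turned into exit/1, so the VM-level "Error in process" report
%% is not expected to be generated.
spawn_quiet(Fun) when is_function(Fun, 0) ->
    spawn(fun() ->
                  try
                      Fun()
                  catch
                      Class:Reason ->
                          %% Linked or monitoring processes still get
                          %% the class and reason as the exit reason.
                          exit({Class, Reason})
                  end
          end).
```

With this, quiet_spawn:spawn_quiet(fun() -> 3 = 9 end) gives back a pid
as usual, and a process monitoring it would see the exit reason
{error, {badmatch, 9}}. The stack trace is dropped in this sketch.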

Matt