recording process crash (in supervisor?)

Fri Sep 30 14:05:05 CEST 2005

Rick Pettit wrote:
[...]
> Having to dig up the (undocumented?) format of the crash_report message (and
> count on it not changing across releases) troubles me (duh). Surely what I
> did must be "wrong", no?

You can look at proc_lib:format(Report), which formats crash reports,
to see how to extract the appropriate details.

> Also, it now seems clear that I need _two_ processes in addition to the 
> supervisor to do a job that a simple supervisor callback could do just as 
> well--one child/worker process (to invoke gen_event:add_sup_handler/3) and
> the actual gen_event handler process to receive and process error_logger
> messages.

Not quite.  In reality you either don't need any additional processes,
or you need one: a supervised guard of the event handler.  The primary
difference between gen_event and gen_server is that a gen_server runs in
a dedicated process of its own, whereas a gen_event handler runs in the
context of the EventManager to which it is added.  If (and only if)
fault tolerance is needed, a separate process can be used to receive
notification of the event handler's crash.  This is accomplished by
using gen_event:add_sup_handler/3, which instructs the EventManager to
send a message to the calling process when the event handler is
removed; other than that, this process does nothing.  If you need fault
tolerance of the event handler, you can add this worker process to a
supervisor, where it simply implements a loop:

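init() ->
     %% Install the handler and tie it to this (guard) process.
     gen_event:add_sup_handler(error_logger, ?MODULE, []),
     loop().

loop() ->
     receive
     %% Sent by the EventManager when the handler is removed.
     {gen_event_EXIT, ?MODULE, Reason} ->
         exit(Reason);
     _Other ->
         loop()
     end.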
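
To show how such a guard could be placed under a supervisor, here is a
minimal sketch.  The module name crash_guard, the handler module name
crash_recorder, the child_spec/0 helper and the restart/shutdown values
are all made up for illustration; only gen_event:add_sup_handler/3,
proc_lib:spawn_link/3 and the {gen_event_EXIT, Handler, Reason} message
come from OTP.

```erlang
-module(crash_guard).
-export([start_link/0, child_spec/0, init/0]).

%% Hypothetical name of the gen_event handler module being guarded.
-define(HANDLER, crash_recorder).

%% Start function for the supervisor: it must return {ok, Pid}
%% with Pid linked to the caller.
start_link() ->
    {ok, proc_lib:spawn_link(?MODULE, init, [])}.

%% Child specification (tuple form) to put in the list returned by
%% the supervisor's init/1 callback. The values are illustrative.
child_spec() ->
    {?MODULE, {?MODULE, start_link, []},
     permanent, 2000, worker, [?MODULE]}.

%% Install the handler, then wait for its removal.
init() ->
    ok = gen_event:add_sup_handler(error_logger, ?HANDLER, []),
    loop().

loop() ->
    receive
        {gen_event_EXIT, ?HANDLER, Reason} ->
            %% The handler was removed: exit so that the supervisor
            %% restarts the guard, which re-installs the handler.
            exit(Reason);
        _Other ->
            loop()
    end.
```

With a permanent child, the supervisor restarts the guard whenever the
handler goes away, so the handler is re-added without any extra code.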

> Am I doing something unconventional here (i.e. processing/recording process
> crash info)? It seems like there should be an easier way. It also seems as
> though my error_logger handler, which only really cares about crash_report
> information, is going to have to "ignore" a whole lot of other messages which
> a supervisor (callback/handler) wouldn't even see--this seems needlessly
> inefficient.

If you examine SASL's and KERNEL's error reporting, this is how it's
done there (irrelevant messages are ignored).  I am not in a position to
question the efficiency of this approach, as this hasn't been an issue
in the applications I've been building.
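
As a rough illustration of "ignore irrelevant messages", a handler could
look like the sketch below.  The module name crash_recorder and the
counter kept as state are made up for the example, and I am assuming
that crash reports reach error_logger handlers as
{error_report, GroupLeader, {Pid, crash_report, Report}}; check this
against the release you run, since the format is exactly the part you
were worried about.

```erlang
-module(crash_recorder).
-behaviour(gen_event).

-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

%% The state is just a count of crash reports seen (illustrative only).
init(_Args) ->
    {ok, 0}.

%% Assumed shape of a crash report event; everything else falls
%% through to the catch-all clause below.
handle_event({error_report, _GroupLeader, {_Pid, crash_report, Report}},
             Count) ->
    %% proc_lib:format/1 turns the report into printable text.
    io:put_chars(proc_lib:format(Report)),
    {ok, Count + 1};
handle_event(_Other, Count) ->
    %% Not a crash report: ignore it.
    {ok, Count}.

%% Any call simply returns the current count.
handle_call(_Request, Count) ->
    {ok, Count, Count}.

handle_info(_Info, Count) ->
    {ok, Count}.

terminate(_Reason, _Count) ->
    ok.

code_change(_OldVsn, Count, _Extra) ->
    {ok, Count}.
```

The catch-all clause is a single pattern match per event, which is all
the "ignoring" amounts to.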

One thought, though, is that an OTP process crash is an infrequent event
(compared to all normal processing).  Therefore the efficiency of
processing crash info is unlikely to matter to the efficiency of the
system as a whole, given how rarely it happens.

Regards,

Serge