recording process crash (in supervisor?)

Rick Pettit rpettit@REDACTED
Fri Sep 30 19:19:08 CEST 2005

On Fri, Sep 30, 2005 at 08:05:05AM -0400, Serge Aleynikov wrote:
> Rick Pettit wrote:
> [...]
> >Having to dig up the (undocumented?) format of the crash_report message
> >(and count on it not changing across releases) troubles me (duh). Surely
> >what I did must be "wrong", no?
> You can look at proc_lib:format(Report) for formatting crash reports, 
> and extracting appropriate details.

This is the documentation I was looking for--thank you.

> >Also, it now seems clear that I need _two_ processes in addition to the 
> >supervisor to do a job that a simple supervisor callback could do just as 
> >well--one child/worker process (to invoke gen_event:add_sup_handler/3) and
> >the actual gen_event handler process to receive and process error_logger
> >messages.
> Not quite.  In reality you either don't need any additional processes, 
> or need one - a supervised guard of the event handler.  The primary 
> difference between gen_event and gen_server is that gen_server runs in 
> the context of a dedicated process of its own, whereas gen_event runs in 
> the context of an EventManager to which the event handler is being 
> added.

Duh, of course. Silly me.

> If (and only if) fault tolerance is needed, a separate process
> can be used to trap the event handler's crash messages.  This is
> accomplished by using gen_event:add_sup_handler/3, which will instruct 
> the EventManager to send a message to that process indicating that the 
> event handler was removed, but other than that this process will do 
> nothing.  If you needed fault tolerance of the event handler, you could 
> add this worker process to a supervisor, where this process would simply 
> implement a loop:
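> init() ->
>     %% Install the handler and link it to this guard process.
>     gen_event:add_sup_handler(error_logger, ?MODULE, []),
>     loop().
> loop() ->
>     receive
>     {gen_event_EXIT, ?MODULE, Reason} ->
>         %% The handler was removed; exit so the supervisor restarts us.
>         exit(Reason);
>     _Other ->
>         loop()
>     end.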

Perfect--I think I finally see the light.

> >Am I doing something unconventional here (i.e. processing/recording process
> >crash info)? It seems like there should be an easier way. It also seems as
> >though my error_logger handler, which only really cares about crash_report
> >information, is going to have to "ignore" a whole lot of other messages
> >which a supervisor (callback/handler) wouldn't even see--this seems
> >needlessly inefficient.
> If you examine SASL's and KERNEL's error reporting, this is how it's
> done there (irrelevant messages are ignored).  I am not in a position to
> question the efficiency of this approach, as this hasn't been an issue
> in the applications I've been building.

Nor for me (IIRC the rule of thumb is to 1) make it work, 2) make it beautiful,
3) make it fast). I need to get past (1) first :-)

> One thought though is that an OTP process crash is an infrequent event 
> (compared to all normal processing).  Therefore the question about 
> efficiency of processing crash info might be irrelevant to the 
> efficiency of the system as a whole, given its rare likelihood.


You have been very helpful, thanks again.

