recording process crash (in supervisor?)

Thu Sep 29 13:59:23 CEST 2005

Rick,

Even though you don't seem to favor adding another event handler, 
that is pretty much the only way to get custom handling of crash 
reports.

As you correctly pointed out, when a process crashes its supervisor 
calls error_logger:error_report/2, which is indeed the natural hook 
for a custom callback.  Such a handler is very simple to implement 
(see stdlib's error_logger_tty_h.erl).
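
A minimal sketch of such a handler follows.  The module name 
crash_recorder_h and the idea of passing an ETS table as the handler 
argument are made up for illustration.  It assumes that the table 
already exists and is public (the handler runs inside the error_logger 
process, not in the table owner), and that the offender entry of a 
supervisor report is a property list containing a name key.

```erlang
-module(crash_recorder_h).
-behaviour(gen_event).

-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

%% The handler argument is the ETS table to write to (hypothetical
%% convention for this sketch); it becomes the handler state.
init(Table) ->
    {ok, Table}.

%% error_logger:error_report(Type, Report) reaches handlers as
%% {error_report, GroupLeader, {Pid, Type, Report}}.
handle_event({error_report, _GL, {_Pid, supervisor_report, Report}}, Table)
  when is_list(Report) ->
    case proplists:get_value(errorContext, Report) of
        child_terminated ->
            %% Assumption: offender is a proplist with a name key.
            Offender = proplists:get_value(offender, Report, []),
            Name = proplists:get_value(name, Offender),
            Reason = proplists:get_value(reason, Report),
            ets:insert(Table, {Name, calendar:local_time(), Reason});
        _OtherContext ->
            ok
    end,
    {ok, Table};
%% Ignore every other kind of event.
handle_event(_Event, Table) ->
    {ok, Table}.

handle_call(_Request, Table) ->
    {ok, ok, Table}.

handle_info(_Info, Table) ->
    {ok, Table}.

terminate(_Reason, _Table) ->
    ok.

code_change(_OldVsn, Table, _Extra) ->
    {ok, Table}.
```

With a table created in the top-level supervisor, for example 
ets:new(crash_log, [named_table, public, bag]), the handler could be 
installed with error_logger:add_report_handler(crash_recorder_h, 
crash_log).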

What you can do is add another child process to the supervisor of 
interest and have it call 
gen_event:add_sup_handler(error_logger, YourHandler, Args).  As long 
as that child process watches for the {gen_event_EXIT, YourHandler, _} 
message, its presence ensures the handler is reinstalled if it 
crashes.
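
One way to sketch that child process is as a small gen_server.  The 
module name handler_guard is hypothetical, and so is the choice to 
stop on gen_event_EXIT so that the supervisor restart reinstalls the 
handler from init/1; reinstalling directly from handle_info/2 would 
work as well.  It assumes the error_logger event manager is registered 
and running when the child starts.

```erlang
-module(handler_guard).
-behaviour(gen_server).

-export([start_link/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Handler is the gen_event callback module, Args its init argument.
start_link(Handler, Args) ->
    gen_server:start_link(?MODULE, {Handler, Args}, []).

%% Install the handler as a supervised handler: this process gets a
%% {gen_event_EXIT, Handler, Reason} message if the handler goes away.
init({Handler, Args}) ->
    case gen_event:add_sup_handler(error_logger, Handler, Args) of
        ok    -> {ok, Handler};
        Other -> {stop, {add_handler_failed, Other}}
    end.

handle_call(_Request, _From, Handler) ->
    {reply, ok, Handler}.

handle_cast(_Msg, Handler) ->
    {noreply, Handler}.

%% The handler was removed or crashed: exit abnormally so that the
%% supervisor restarts this process, and init/1 installs it again.
handle_info({gen_event_EXIT, Handler, Reason}, Handler) ->
    {stop, {handler_exited, Reason}, Handler};
handle_info(_Info, Handler) ->
    {noreply, Handler}.

terminate(_Reason, _Handler) ->
    ok.

code_change(_OldVsn, Handler, _Extra) ->
    {ok, Handler}.
```

A child specification along the lines of {handler_guard, 
{handler_guard, start_link, [YourHandler, Args]}, permanent, 5000, 
worker, [handler_guard]} would then go into the supervisor's init/1.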

What puzzles me about this last approach is that neither error_logger 
nor SASL uses supervised handlers for event reporting to the screen.  
This raises a rhetorical question: if the implementation code is 100% 
correct, does that mean the process running this code doesn't require 
a supervisor?  Perhaps someone on the list can share their view on 
this...

Serge

P.S. In a couple of weeks I am planning to make a contribution (LAMA - 
Log and Alarm MAnager) that will demonstrate the use of this principle 
for sending all error reports and alarms to a syslog / SNMP manager.

Rick Pettit wrote:
> I want to record application process crash info (proc_name/date/time/reason)
> in an ETS table which persists as long as the top-level supervisor remains
> alive. I realize I need to create the ETS table from the supervisor in order
> to ensure it persists past all other application process crashes.
> 
> What I don't know is if/where there is a hook for recording such information
> from the supervisor. I don't see any supervisor callback which would allow
> for recording of process crash info.
> 
> I see supervisor.erl in stdlib appears to log this information to the
> error_logger (when reason is not normal|shutdown):
> 
>   do_restart(permanent, Reason, Child, State) ->
>       report_error(child_terminated, Reason, Child, State#state.name),
>       restart(Child, State);
>   do_restart(_, normal, Child, State) ->
>       NState = state_del_child(Child, State),
>       {ok, NState};
>   do_restart(_, shutdown, Child, State) ->
>       NState = state_del_child(Child, State),
>       {ok, NState};
>   do_restart(transient, Reason, Child, State) ->
>       report_error(child_terminated, Reason, Child, State#state.name),
>       restart(Child, State);
>   do_restart(temporary, Reason, Child, State) ->
>       report_error(child_terminated, Reason, Child, State#state.name),
>       NState = state_del_child(Child, State),
>       {ok, NState}.
>   ...
>   ...
>   ...
> 
>   report_error(Error, Reason, Child, SupName) ->
>       ErrorMsg = [{supervisor, SupName},
>                   {errorContext, Error},
>                   {reason, Reason},
>                   {offender, extract_child(Child)}],
>       error_logger:error_report(supervisor_report, ErrorMsg).
> 
> If I want to process crash information (name/date/time/reason) when
> application processes crash, is the convention to install a custom handler
> via error_logger:add_report_handler/[12]?
> 
> My knee-jerk reaction is that it would be awfully nice if the supervisor
> behaviour simply provided a callback for processing process crash info. The
> callback could even be spawn'd if the risk of crashing the supervisor in
> the handler was a concern.
> 
> Thanks for wading through the rambling--any comments/suggestions are much
> appreciated.
> 
> -Rick
> 
> P.S. One approach which I have seen work but which seems cumbersome and
>      unnecessary involved adding an additional process, under the top-level
>      supervisor, with which all other application processes registered
>      by name (at which time monitor/2 and/or link/1 were called). This
>      additional process then listened for EXIT signals from registered
>      processes and recorded their crash info. Since the supervisor is already
>      set up to receive all the crash info, adding another process to
>      duplicate the functionality seemed silly to me.
>