recording process crash (in supervisor?)
Serge Aleynikov
serge@REDACTED
Thu Sep 29 13:59:23 CEST 2005
Rick,
Even though you don't seem to favor adding another event handler, that
is pretty much the only way to get custom handling of crash reports.
As you correctly pointed out, when a process crashes its supervisor
calls error_logger:error_report/2, which makes error_logger the natural
place for a custom callback. Such a handler is very simple to implement
(see stdlib's error_logger_tty_h.erl).
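
As a rough illustration of such a handler, here is a minimal sketch that
records child_terminated reports in an ETS table. The module name
crash_recorder_h and the {Name, DateTime, Reason} row layout are made up for
this example. It assumes the table is public (the handler runs inside the
error_logger process, not the supervisor) and that the offender list carries
the child's name under the key name, or id on newer releases.

```erlang
-module(crash_recorder_h).
-behaviour(gen_event).

-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

%% The handler argument is the ETS table to write to (assumed public
%% and owned by a long-lived process such as the top-level supervisor).
init(Tab) ->
    {ok, Tab}.

%% error_logger:error_report(supervisor_report, Report) arrives here as
%% {error_report, GroupLeader, {Pid, supervisor_report, Report}}.
handle_event({error_report, _GL, {_Pid, supervisor_report, Report}}, Tab)
  when is_list(Report) ->
    case proplists:get_value(errorContext, Report) of
        child_terminated ->
            Name = offender_name(proplists:get_value(offender, Report)),
            Reason = proplists:get_value(reason, Report),
            %% Hypothetical row layout: {ProcName, {Date, Time}, Reason}.
            ets:insert(Tab, {Name, erlang:localtime(), Reason});
        _Other ->
            ok
    end,
    {ok, Tab};
handle_event(_Event, Tab) ->
    %% Ignore every other kind of report.
    {ok, Tab}.

handle_call(_Request, Tab) ->
    {ok, ok, Tab}.

handle_info(_Info, Tab) ->
    {ok, Tab}.

terminate(_Arg, _Tab) ->
    ok.

code_change(_OldVsn, Tab, _Extra) ->
    {ok, Tab}.

%% Assumption: the key is 'name' in older releases and 'id' in newer ones.
offender_name(Offender) when is_list(Offender) ->
    case proplists:get_value(name, Offender) of
        undefined -> proplists:get_value(id, Offender);
        Name -> Name
    end;
offender_name(_Other) ->
    undefined.
```

With a set table keyed on the process name, only the latest crash per process
is kept; a bag or duplicate_bag table would keep the history.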
What you can do is add another child process to the supervisor of
interest, and have that child call
gen_event:add_sup_handler(error_logger, YourHandler, Args). If the
handler crashes, the child receives a {gen_event_EXIT, YourHandler, _}
message; by watching for that message the child (or its supervisor, if
the child exits) can reinstall the handler.
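
A minimal sketch of such a child process follows. The module name
crash_recorder_guard is made up for this example, and it assumes the
error_logger event manager is running, as it was in the releases current when
this was written. The guard simply exits when the handler goes away, so that
its supervisor restarts it and the handler is installed again from init/1.

```erlang
-module(crash_recorder_guard).
-behaviour(gen_server).

-export([start_link/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Handler is the gen_event callback module, Args its init argument.
start_link(Handler, Args) ->
    gen_server:start_link(?MODULE, {Handler, Args}, []).

init({Handler, Args}) ->
    %% add_sup_handler ties the handler to this process: if the handler
    %% is removed we get a gen_event_EXIT message, and if this process
    %% dies the handler is removed from error_logger.
    case gen_event:add_sup_handler(error_logger, Handler, Args) of
        ok    -> {ok, Handler};
        Other -> {stop, {cannot_install_handler, Other}}
    end.

handle_call(_Request, _From, Handler) ->
    {reply, ok, Handler}.

handle_cast(_Msg, Handler) ->
    {noreply, Handler}.

%% The handler crashed or was deleted: stop, so that the supervisor
%% restarts this process and init/1 installs the handler again.
handle_info({gen_event_EXIT, Handler, Reason}, Handler) ->
    {stop, {handler_removed, Reason}, Handler};
handle_info(_Info, Handler) ->
    {noreply, Handler}.

terminate(_Reason, _Handler) ->
    ok.

code_change(_OldVsn, Handler, _Extra) ->
    {ok, Handler}.
```

The guard would be listed as a permanent child in the supervisor's child
specification, started with crash_recorder_guard:start_link(YourHandler, Args).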
What puzzles me about this last approach is that neither error_logger nor
SASL use supervised handlers for event reporting to screen. This raises
a rhetorical question: if the implementation code is 100% correct, does
it mean that the process running this code doesn't require a supervisor?
Perhaps someone on the list can share his/her perception on this...
Serge
P.S. In a couple of weeks I am planning to make a contribution (LAMA -
Log and Alarm MAnager) that will demonstrate the use of this principle
for sending all error reports and alarms to syslog / snmp manager.
Rick Pettit wrote:
> I want to record application process crash info (proc_name/date/time/reason)
> in an ETS table which persists as long as the top-level supervisor remains
> alive. I realize I need to create the ETS table from the supervisor in order
> to ensure it persists past all other application process crashes.
>
> What I don't know is if/where there is a hook for recording such information
> from the supervisor. I don't see any supervisor callback which would allow
> for recording of process crash info.
>
> I see supervisor.erl in stdlib appears to log this information to the
> error_logger (when reason is not normal|shutdown):
>
> do_restart(permanent, Reason, Child, State) ->
>     report_error(child_terminated, Reason, Child, State#state.name),
>     restart(Child, State);
> do_restart(_, normal, Child, State) ->
>     NState = state_del_child(Child, State),
>     {ok, NState};
> do_restart(_, shutdown, Child, State) ->
>     NState = state_del_child(Child, State),
>     {ok, NState};
> do_restart(transient, Reason, Child, State) ->
>     report_error(child_terminated, Reason, Child, State#state.name),
>     restart(Child, State);
> do_restart(temporary, Reason, Child, State) ->
>     report_error(child_terminated, Reason, Child, State#state.name),
>     NState = state_del_child(Child, State),
>     {ok, NState}.
> ...
> ...
> ...
>
> report_error(Error, Reason, Child, SupName) ->
>     ErrorMsg = [{supervisor, SupName},
>                 {errorContext, Error},
>                 {reason, Reason},
>                 {offender, extract_child(Child)}],
>     error_logger:error_report(supervisor_report, ErrorMsg).
>
> If I want to process crash information (name/date/time/reason) when application
> processes crash, is the convention to install a custom handler via
> error_logger:add_report_handler/[12]?
>
> My knee jerk reaction is that it would be awfully nice if the supervisor
> behaviour simply provided a callback for processing process crash info. The
> callback could even be spawn'd if risk of crashing the supervisor in the
> handler was a concern.
>
> Thanks for wading through the rambling--any comments/suggestions are much
> appreciated.
>
> -Rick
>
> P.S. One approach which I have seen work but which seems cumbersome and
> unnecessary involved adding an additional process, under the top-level
> supervisor, with which all other application processes registered
> by name (at which time monitor/2 and/or link/1 were called). This
> additional process then listened for EXIT signals from registered
> processes and recorded their crash info. Since the supervisor is already
> set up to receive all the crash info, adding another process to duplicate
> the functionality seemed silly to me.
>