[erlang-questions] Understanding supervisor / start_link behaviour

Steve Strong steve@REDACTED
Wed Jun 1 21:26:39 CEST 2011


Hi,

I've got some strange behaviour with gen_event within a supervision tree which I don't fully understand. Consider the following supervisor (completely standard, feel free to skip over):

<snip>

-module(sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).
-define(SERVER, ?MODULE).

start_link() ->
 supervisor:start_link({local, ?SERVER}, ?MODULE, []).

init([]) ->
 Child1 = {child, {child, start_link, []}, permanent, 2000, worker, [child]},
 {ok, {{one_for_all, 1000, 3600}, [Child1]}}.

</snip>

and the corresponding gen_server (the interesting code is in init/1 and handle_info/2):

<snip>

-module(child).
-behaviour(gen_server).
-export([start_link/0, init/1, handle_call/3, handle_cast/2, 
 handle_info/2, terminate/2, code_change/3]).

start_link() ->
 gen_server:start_link({local, child}, child, [], []).

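init([]) ->
 io:format("about to start gen_event~n"),
 %% Start a locally registered event manager, linked to this server.
 X = gen_event:start_link({local, my_gen_event}),
 io:format("gen_event started with ~p~n", [X]),
 %% Crash init/1 on anything other than {ok, Pid}, e.g. already_started.
 {ok, _Pid} = X,

 %% The 2000 ms timeout delivers a 'timeout' message to handle_info/2.
 {ok, {}, 2000}.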

handle_call(_Request, _From, State) ->
 {reply, ok, State}.

handle_cast(_Msg, State) ->
 {noreply, State}.

handle_info(_Info, State) ->
 io:format("about to crash...~n"),
 %% Deliberate badmatch so that the supervisor has to restart this server.
 1 = 2,
 {noreply, State}.

terminate(_Reason, _State) ->
 ok.

code_change(_OldVsn, State, _Extra) ->
 {ok, State}.

</snip>

If I run this from an erl shell like this:

<snip>

--> erl
Erlang R14B01 (erts-5.8.2) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.2 (abort with ^G)
1> application:start(sasl), supervisor:start_link(sup, []).


</snip>

Then the supervisor and server start as expected. After 2 seconds the server gets a timeout message and crashes itself; the supervisor spots this and restarts it. Within its init, the gen_server also does a start_link on a gen_event process. By my understanding, because the two processes are linked, whenever the gen_server process exits the gen_event will also be terminated.

However, every now and then I see the following output (a ton of sasl trace omitted for clarity!):

<snip>

about to crash...
about to start gen_event
gen_event started with {error,{already_started,<0.79.0>}}
about to start gen_event
gen_event started with {error,{already_started,<0.79.0>}}
about to start gen_event


</snip>

What is happening is that the gen_server is crashing but on its restart the gen_event process is still running - hence the gen_server fails in its init and gets restarted again. Sometimes this loop clears after a few iterations, other times it can continue until the parent supervisor gives up, packs its bags and goes home.

So, my question is whether this is expected behaviour or not. I assume that the termination of the linked child is happening asynchronously, and that the supervisor is hence restarting its children before things have cleaned up correctly - is that correct?
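One way to probe the asynchronous-cleanup hypothesis is to wait until the previously registered process is really gone before starting a new one. The following is only a minimal sketch: the module name wait_gone and the function await_down/2 are hypothetical helpers, not part of OTP. It assumes that the registered name has been released by the time the 'DOWN' message arrives, which is worth verifying on the release in use.

```erlang
-module(wait_gone).
-export([await_down/2]).

%% Hypothetical helper: block until the process registered as Name has
%% terminated, or until Timeout milliseconds have passed.
await_down(Name, Timeout) ->
    case whereis(Name) of
        undefined ->
            %% Nothing is registered under Name, so there is nothing to wait for.
            ok;
        Pid ->
            %% A monitor delivers 'DOWN' even if Pid died just before this call.
            Ref = erlang:monitor(process, Pid),
            receive
                {'DOWN', Ref, process, Pid, _Reason} ->
                    ok
            after Timeout ->
                %% Remove the monitor and flush any late 'DOWN' message.
                erlang:demonitor(Ref, [flush]),
                {error, timeout}
            end
    end.
```

Calling wait_gone:await_down(my_gen_event, 1000) at the top of init/1, before gen_event:start_link, would show whether the already_started errors disappear once the restart waits for the old event manager to exit.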

I can fix this particular scenario by trapping exits within the gen_server, and then calling gen_event:stop within the terminate. Is this type of processing necessary whenever a process is start_link'ed within a supervisor tree, or is what I'm doing considered bad practice?
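The workaround described above can be sketched roughly as follows. This is an illustration rather than the exact code in use: the module name child_fixed is hypothetical, the deliberate crash is left out, and the sketch assumes that gen_event:stop/1 returns only after the event manager has shut down. That should hold on recent OTP releases but may not on older ones, so it is worth checking.

```erlang
-module(child_fixed).
-behaviour(gen_server).
-export([start_link/0, init/1, handle_call/3, handle_cast/2,
         handle_info/2, terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, child_fixed}, ?MODULE, [], []).

init([]) ->
    %% Trap exits so that terminate/2 also runs on a supervisor shutdown.
    process_flag(trap_exit, true),
    {ok, Pid} = gen_event:start_link({local, my_gen_event}),
    %% Keep the event manager pid as the server state.
    {ok, Pid}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% With trap_exit set, the death of the linked event manager arrives as
%% an ordinary message instead of killing this server.
handle_info({'EXIT', Pid, Reason}, Pid) ->
    {stop, {event_manager_died, Reason}, Pid};
handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    %% Stop the event manager explicitly rather than relying on the link.
    %% The catch covers the case where it is already gone.
    catch gen_event:stop(my_gen_event),
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

Because terminate/2 finishes before the gen_server exits, the supervisor should only see the child's exit after the event manager has been asked to stop, which narrows the window in which a restart can hit already_started.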

Thanks for your time,

Steve

-- Steve Strong, Director, id3as
twitter.com/srstrong
