[erlang-questions] question about supervisor (fail to restart one of the worker)
Anthony Kong
anthony.hw.kong@REDACTED
Sun Mar 30 05:33:23 CEST 2008
Hi, all,
I hope I can explain my question clearly enough.
I am working on a solution in which a supervisor monitors a number of
generic workers.
The worker responds to most messages with an answer, but it may exit if
certain conditions are met. I want the supervisor to kill all existing
workers and restart them if any worker fails, hence the use of
"all_for_one".
I have tried to illustrate this problem with a simplified version of the
implementation. See the attached files:
sup.erl - the supervisor
worker.erl - the worker process, implemented using gen_server. Whenever
the message "stupid_question" is received, it exits.
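Since the attached worker.erl was scrubbed by the archive, here is a hypothetical minimal worker along the lines described, for comparison only. The module name `qa_worker`, the functions `ask/2` and `server_name/1`, and the "worker_" name prefix are assumptions (the prefix is inferred from the worker_1 names in the log). One documented point worth noting: gen_server:start/4, used in the start/1 function shown below, does not link the new server to the caller, while gen_server:start_link/4 does, and the supervisor documentation requires a child's start function to create a process linked to the supervisor. Whether the attached worker.erl does this cannot be checked here.

```erlang
%% Hypothetical minimal worker; module name and name prefix are assumptions.
-module(qa_worker).
-behaviour(gen_server).

-export([start_link/1, ask/2, server_name/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Assumed prefix, inferred from the worker_1 name in the log.
-define(SERVER_FAMILY, "worker_").

%% Map an integer id to a registered name, e.g. 1 -> worker_1.
server_name(Id) when is_integer(Id) ->
    list_to_atom(?SERVER_FAMILY ++ integer_to_list(Id)).

%% gen_server:start_link/4 registers the server locally and links it to
%% the calling process (the supervisor, when started from a child spec).
start_link(Id) ->
    gen_server:start_link({local, server_name(Id)}, ?MODULE, [Id], []).

%% Synchronous request, using the message shape seen in the log.
ask(Id, Question) ->
    gen_server:call(server_name(Id), {ask_something, Question}).

init([Id]) ->
    {ok, Id}.

%% A stupid question stops the server with a non-normal reason, so the
%% exit counts as a crash; any other question gets {answer}.
handle_call({ask_something, {stupid_question}}, _From, Id) ->
    {stop, stupid_question, Id};
handle_call({ask_something, _Question}, _From, Id) ->
    {reply, {answer}, Id}.

handle_cast(_Msg, Id) ->
    {noreply, Id}.

handle_info(_Info, Id) ->
    {noreply, Id}.

terminate(_Reason, _Id) ->
    ok.

code_change(_OldVsn, Id, _Extra) ->
    {ok, Id}.
```

With a `permanent` restart type, the supervisor restarts such a child whatever its exit reason, provided the restart intensity is not exceeded.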
Because I want to have several instances of the worker:
1) I used an auxiliary function to create an ad hoc server name for each
instance (process):
    start(Name) ->
        ServerName = get_worker_id(Name),
        gen_server:start({local, ServerName}, ?MODULE, [ServerName], []).

    get_worker_id(Id) when is_integer(Id) ->
        list_to_atom(?SERVER_FAMILY ++ integer_to_list(Id));
2) I constructed the following child specification in my supervisor:
    {ok, {{all_for_one, 3, 10},
          [{w1, {worker, start_link, [1]},
            permanent, brutal_kill, worker, [worker]},
           {w2, {worker, start_link, [2]},
            permanent, brutal_kill, worker, [worker]},
           {w3, {worker, start_link, [3]},
            permanent, brutal_kill, worker, [worker]},
           {w4, {worker, start_link, [4]},
            permanent, brutal_kill, worker, [worker]}]
         }}.
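The four near-identical child specs in the list above could also be generated rather than written by hand. The sketch below assumes the ids follow the w1..wN pattern and that the worker module exports start_link/1, as in the spec; `sup_specs` and `child_specs/1` are hypothetical names, and the `init([])` argument is an assumption about how the supervisor is started. For reference, `{all_for_one, 3, 10}` means that if more than 3 restarts occur within 10 seconds, the supervisor terminates all its children and then itself.

```erlang
-module(sup_specs).
-export([child_specs/1, init/1]).

%% Build N child specs in the same tuple format as the hand-written list:
%% {Id, {Module, Function, Args}, Restart, Shutdown, Type, Modules}.
child_specs(N) when is_integer(N), N > 0 ->
    [{child_id(I), {worker, start_link, [I]},
      permanent, brutal_kill, worker, [worker]}
     || I <- lists:seq(1, N)].

%% 1 -> w1, 2 -> w2, ...
child_id(I) ->
    list_to_atom("w" ++ integer_to_list(I)).

%% Same return value as the hand-written init/1 result, for four workers.
init([]) ->
    {ok, {{all_for_one, 3, 10}, child_specs(4)}}.
```

Generating the list keeps the ids and the start arguments in step, so adding a fifth worker is a one-character change.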
I have defined two test functions in sup.erl to facilitate testing (test1
and test2). *test1* is for testing from the bash shell; *test2* is
for testing within erl.
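A test helper in the spirit of test1/test2 might look like the following sketch. The module `qa_client` and its `ask/2` are hypothetical; only the `{ask_something, Question}` request shape is taken from the log. Catching the exit raised by gen_server:call/2 lets a test report a failed call and carry on instead of terminating the calling process.

```erlang
-module(qa_client).
-export([ask/2]).

%% Ask a registered worker a question. A normal reply is returned as
%% {reply, Reply}; if gen_server:call/2 exits (for example with noproc
%% because no process is registered under ServerName), the exit reason
%% is returned as {signal, Reason} instead of crashing the caller.
ask(ServerName, Question) ->
    io:format("Asking ~p a ~p~n", [ServerName, Question]),
    try gen_server:call(ServerName, {ask_something, Question}) of
        Reply -> {reply, Reply}
    catch
        exit:Reason -> {signal, Reason}
    end.
```

Asking a name with no registered process returns `{signal, {noproc, {gen_server, call, [Name, {ask_something, Question}]}}}`, the same shape as the "signal" line in the log.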
If I run test1 like this:
    erl -pa . -boot start_sasl -s sup test1 -run init stop -config log -noshell
I get the following errors:
    in start/0
    superviosr PID: <0.37.0>
    Asking worker_1 a {good_question}
    in worker:start_link/1. Param worker_1
    reply: {answer}
    Asking worker_1 a {stupid_question}
    signal {noproc,{gen_server,call,[worker_1,{ask_something,{stupid_question}}]}}
    Asking worker_1 a {good_question}
    {"init terminating in do_boot",{noproc,{gen_server,call,[worker_1,{ask_something,{good_question}}]}}}
    Crash dump was written to: erl_crash.dump
    init terminating in do_boot ()
So, as far as I can work out, when the worker process dies after getting
a stupid_question, it is not restarted by the supervisor at all. That is why
I get a noproc exception when worker_1 is asked a good_question again.
If I change the test case to use worker 2 instead of 1, it is obvious
that workers 2 to 4 are not started at all.
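One way to check which workers the supervisor itself believes are running is supervisor:which_children/1, which returns a list of `{Id, Child, Type, Modules}` tuples where Child is a pid, the atom `restarting`, or `undefined`. The sketch below filters that list down to the ids of children without a pid; `sup_check`, `not_running/1` and `report/1` are hypothetical names.

```erlang
-module(sup_check).
-export([not_running/1, report/1]).

%% Given the list returned by supervisor:which_children/1, return the ids
%% of children whose Child field is not a pid (restarting or undefined).
not_running(Children) ->
    [Id || {Id, Child, _Type, _Modules} <- Children, not is_pid(Child)].

%% Query a running supervisor (pid or registered name) directly.
report(Sup) ->
    not_running(supervisor:which_children(Sup)).
```

Calling `sup_check:report(SupPid)` from within erl, right after start-up and again after the stupid_question, would show whether w1..w4 ever had processes.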
So, the questions I want to ask are:
1) What mistakes have I made in this code?
2) Why is worker 1 not restarted?
3) Why are workers 2 to 4 not started at all?
4) Worker instances and gen_server: I used list_to_atom() to give each
worker process a unique name. Is this a valid approach?
Cheers,
Anthony
Attachments (scrubbed by the list archive):
worker.erl (1604 bytes): <http://erlang.org/pipermail/erlang-questions/attachments/20080330/ea50af15/attachment.obj>
log.config (349 bytes): <http://erlang.org/pipermail/erlang-questions/attachments/20080330/ea50af15/attachment.wsdl>
sup.erl (1688 bytes): <http://erlang.org/pipermail/erlang-questions/attachments/20080330/ea50af15/attachment-0001.obj>