[erlang-questions] System limit bringing down rex and the VM
Ulf Wiger
ulf.wiger@REDACTED
Fri Sep 10 09:33:27 CEST 2010
On 09/09/2010 07:33 PM, Musumeci, Antonio S wrote:
>
> I'm seeing mnesia, rex and timer_server in my dump. If you
> kill timer_server though it restarts.
Actually, I consider this a bug.
Let's check to see what the result is of killing timer_server.
Eshell V5.7.5 (abort with ^G)
1> F = fun() ->
       timer:send_after(15000,self(),hello),
       receive
           Msg ->
               io:fwrite("got ~p~n", [Msg])
       end
   end.
#Fun<erl_eval.20.67289768>
2> f(P), P = spawn(F), time().
{9,25,48}
got hello
3> time().
{9,26,6}
4> whereis(timer_server).
<0.38.0>
5> f(P), P = spawn(F), time().
{9,26,22}
6> exit(whereis(timer_server),kill).
true
7> whereis(timer_server).
<0.43.0>
8> time().
{9,27,0}
9> process_info(P).
[{current_function,{erl_eval,receive_clauses,6}},
 {initial_call,{erlang,apply,2}},
 {status,waiting},
 {message_queue_len,0},
 {messages,[]},
...
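So killing timer_server caused it to bounce back (note the new pid,
<0.43.0> instead of <0.38.0>), but in the process it forgot all
outstanding requests: P was spawned at 9:26:22 with a 15-second timer,
yet at 9:27:00 it is still waiting with an empty message queue. Any
processes depending on the reliable service of the timer server are
now left hanging, with no indication whatsoever that something went
wrong.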
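For code that cannot afford to lose a timer this way, one possible workaround (not something proposed in this post) is to use the erlang:send_after/3 BIF, whose timers are kept by the runtime system rather than by the timer_server process. The sketch below is illustrative only: the module name bif_timer_demo and the functions wait/1 and demo/0 are made up for this example, and the 200 ms and 5000 ms values are arbitrary.

```erlang
-module(bif_timer_demo).
-export([wait/1, demo/0]).

%% Block for Ms milliseconds using the erlang:send_after/3 BIF.
%% The timer is held by the runtime system, not by timer_server,
%% so it does not depend on that process surviving.
wait(Ms) ->
    Tag = make_ref(),
    _TimerRef = erlang:send_after(Ms, self(), {timeout, Tag}),
    receive
        {timeout, Tag} -> ok
    end.

%% Start a waiter, kill timer_server if it happens to be running,
%% and report whether the waiter still completes.
demo() ->
    Parent = self(),
    Pid = spawn(fun() -> Parent ! {self(), wait(200)} end),
    case whereis(timer_server) of
        undefined -> ok;
        Server -> exit(Server, kill)
    end,
    receive
        {Pid, Result} -> Result
    after 5000 ->
        hung
    end.
```

Calling bif_timer_demo:demo() is expected to return ok rather than hung, since the waiter's timer is unaffected by the kill. This sidesteps the problem for one caller; it does not change how timer_server itself behaves on restart.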
Personally, I think it would be much better if the timer server would
in fact stay dead, and bring the whole node down with it - that, or
make sure that its dying and restarting is truly transparent. Choosing
a middle way of merely pretending to be robust is the worst possible
choice.
Rather than concluding that the OTP team are incompetent in matters
of robustness (as there is overwhelming evidence that they are
anything but), I'd like to see this as yet another example of how
desperately difficult and dangerous it is to go down the path you're
suggesting. It may seem like a respectful thing to do, but you take
on a very heavy burden, and may well be far more likely to compound
the problem than to help with it.
BR,
Ulf W