running without net tick
Ulf Wiger
ulf.wiger@REDACTED
Fri Sep 25 09:12:46 CEST 2009
The problem of netsplits in Erlang comes up now and again.
I've mentioned that we used to have a more robust
supervision algorithm for device processor monitoring in
AXD 301...
I read the following comment in kernel/src/dist_util.erl:
%% Send a TICK to the other side.
%%
%% This will happen every 15 seconds (by default)
%% The idea here is that every 15 secs, we write a little
%% something on the connection if we haven't written anything for
%% the last 15 secs.
%% This will ensure that nodes that are not responding due to
%% hardware errors (Or being suspended by means of ^Z) will
%% be considered to be down. If we do not want to have this
%% we must start the net_kernel (in erlang) without its
%% ticker process, In that case this code will never run
...and thought: promising. Would it then be possible to
experiment with other tick algorithms?
However, looking at net_kernel.erl:
init({Name, LongOrShortNames, TickT}) ->
    process_flag(trap_exit,true),
    case init_node(Name, LongOrShortNames) of
        {ok, Node, Listeners} ->
            process_flag(priority, max),
            Ticktime = to_integer(TickT),
            Ticker = spawn_link(net_kernel, ticker, [self(), Ticktime]),
In other words, you can't set net_ticktime to anything other
than an integer (and it has to be a smallint, since it's used
in a receive ... after expression).
(To do justice to the comment above, couldn't a net_ticktime
of, say, 0 turn off net ticking altogether?)
What one can do then, is to set net_ticktime to a very large
number, and then run a user-level heartbeat. If netsplits are
still experienced without visible problems in the user-level
monitoring, or perhaps even serviced traffic during this
interval, then something is definitely wrong with the tick
algorithm. :)
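
A minimal sketch of what such a user-level heartbeat could look
like, assuming the nodes are started with a large tick time, e.g.
-kernel net_ticktime 86400 (the value is only an example). The
module name user_hb, the registered name user_hb_responder, and
the interval and timeout arguments are all made up for
illustration; nothing here is part of OTP.

```erlang
-module(user_hb).
-export([start_responder/0, start_monitor/3, ping/2]).

%% Hypothetical user-level heartbeat; all names are illustrative.

%% Run on the node to be monitored: a registered process that
%% answers every ping with a pong carrying the same reference.
start_responder() ->
    Pid = spawn(fun responder/0),
    true = register(user_hb_responder, Pid),
    Pid.

responder() ->
    receive
        {ping, From, Ref} ->
            From ! {pong, Ref},
            responder()
    end.

%% Run on the monitoring node: ping Node every IntervalMs and
%% log a warning each time no pong arrives within TimeoutMs.
start_monitor(Node, IntervalMs, TimeoutMs) ->
    spawn(fun() -> loop(Node, IntervalMs, TimeoutMs, 0) end).

loop(Node, IntervalMs, TimeoutMs, Missed) ->
    flush_stale(),
    NewMissed =
        case ping(Node, TimeoutMs) of
            pong ->
                0;
            pang ->
                error_logger:warning_msg(
                  "user_hb: no pong from ~p (~p in a row)~n",
                  [Node, Missed + 1]),
                Missed + 1
        end,
    timer:sleep(IntervalMs),
    loop(Node, IntervalMs, TimeoutMs, NewMissed).

%% One round trip. Returns pong if the responder on Node answers
%% within TimeoutMs milliseconds, pang otherwise.
ping(Node, TimeoutMs) ->
    Ref = make_ref(),
    {user_hb_responder, Node} ! {ping, self(), Ref},
    receive
        {pong, Ref} -> pong
    after TimeoutMs ->
        pang
    end.

%% Drop pongs that arrived after their ping had already timed out.
flush_stale() ->
    receive
        {pong, _} -> flush_stale()
    after 0 ->
        ok
    end.
```

Usage would be user_hb:start_responder() on the monitored node
and user_hb:start_monitor(Node, 5000, 2000) on the other. If
net_kernel reports a nodedown while this loop keeps getting
pongs, that points at the tick algorithm rather than the link.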
BR,
Ulf W
--
Ulf Wiger
CTO, Erlang Training & Consulting Ltd
http://www.erlang-consulting.com