running without net tick

Fri Sep 25 09:12:46 CEST 2009

The problem of netsplits in Erlang comes up now and again.
I've mentioned that we used to have a more robust
supervision algorithm for device processor monitoring in
AXD 301...

I read the following comment in kernel/src/dist_util.erl:

%% Send a TICK to the other side.
%%
%% This will happen every 15 seconds (by default)
%% The idea here is that every 15 secs, we write a little
%% something on the connection if we haven't written anything for
%% the last 15 secs.
%% This will ensure that nodes that are not responding due to
%% hardware errors (Or being suspended by means of ^Z) will
%% be considered to be down. If we do not want to have this
%% we must start the net_kernel (in erlang) without its
%% ticker process, In that case this code will never run

...and thought: promising. Would it then be possible to
experiment with other tick algorithms?

However, looking at net_kernel.erl:

init({Name, LongOrShortNames, TickT}) ->
     process_flag(trap_exit,true),
     case init_node(Name, LongOrShortNames) of
         {ok, Node, Listeners} ->
             process_flag(priority, max),
             Ticktime = to_integer(TickT),
             Ticker = spawn_link(net_kernel, ticker, [self(), Ticktime]),

In other words, you can't set net_ticktime to anything other
than an integer (and it has to be a smallint, since it's used
in a receive ... after expression).

(To do justice to the comment above, couldn't a net_ticktime
of, say, 0 turn off net ticking altogether?)

What one can do, then, is set net_ticktime to a very large
number and run a user-level heartbeat alongside it. If
netsplits still occur while the user-level monitoring sees no
problems, or perhaps even while traffic is being serviced
during that interval, then something is definitely wrong with
the tick algorithm. :)
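
A rough sketch of what such a user-level heartbeat could look
like, to make the idea concrete. Everything here is an
assumption for illustration: the module name user_heartbeat,
the registered name hb_responder and the message shapes are
made up and are not part of OTP. It assumes the nodes were
started with a large tick time, e.g.
erl -sname a -kernel net_ticktime 3600 (the value is in
seconds and should be the same on all nodes).

```erlang
-module(user_heartbeat).
-export([start/3, responder/0]).

%% Hypothetical sketch of a user-level heartbeat; not OTP code.

%% Run on every node that should answer heartbeats, e.g.
%%   spawn(user_heartbeat, responder, []).
responder() ->
    register(hb_responder, self()),
    responder_loop().

responder_loop() ->
    receive
        {ping, From, Ref} ->
            From ! {pong, Ref},
            responder_loop()
    end.

%% Ping Node every IntervalMs milliseconds and log a warning
%% after MaxMissed consecutive unanswered pings.
start(Node, IntervalMs, MaxMissed) ->
    spawn(fun() -> loop(Node, IntervalMs, MaxMissed, 0) end).

loop(Node, _IntervalMs, MaxMissed, Missed) when Missed >= MaxMissed ->
    %% The sketch only reports and then stops; a real monitor
    %% would decide here what "down" should mean.
    error_logger:warning_msg("user heartbeat: ~p missed ~p pings~n",
                             [Node, Missed]);
loop(Node, IntervalMs, MaxMissed, Missed) ->
    Ref = make_ref(),
    %% Note: sending to a disconnected node triggers a connection
    %% attempt, which may delay this process.
    {hb_responder, Node} ! {ping, self(), Ref},
    case wait_pong(Ref, IntervalMs) of
        ok ->
            timer:sleep(IntervalMs),
            loop(Node, IntervalMs, MaxMissed, 0);
        timeout ->
            loop(Node, IntervalMs, MaxMissed, Missed + 1)
    end.

%% Wait for the pong matching Ref. Late pongs from earlier pings
%% are discarded so they do not pile up in the mailbox; each
%% discarded pong restarts the timeout.
wait_pong(Ref, Timeout) ->
    receive
        {pong, Ref}    -> ok;
        {pong, _Stale} -> wait_pong(Ref, Timeout)
    after Timeout ->
        timeout
    end.
```

With a responder spawned on node b, calling
user_heartbeat:start(b@host, 1000, 5) on node a (b@host being
a placeholder node name) would log a warning after five
seconds of silence. Comparing those warnings against the
nodedown events from net_kernel is the experiment suggested
above.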

BR,
Ulf W
-- 
Ulf Wiger
CTO, Erlang Training & Consulting Ltd
http://www.erlang-consulting.com