[erlang-questions] running without net tick

Jayson Vantuyl kagato@REDACTED
Fri Sep 25 12:25:29 CEST 2009


Short Version:

Why not open a special "tick" TCP port?  UDP would require a reliable  
delivery implementation.  TCP saves quite a bit of work in that regard  
(and gets a lot of important but subtle things right).
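
A rough sketch of what a separate tick connection could look like, in
Python rather than Erlang purely for brevity.  The names send_ticks,
watch_ticks and tick_demo, the one-byte tick format, and the intervals
are all made up for illustration; none of this is how net_kernel works.
The receiver treats a silent or closed tick socket as "peer is down":

```python
import socket
import threading
import time


def send_ticks(sock, interval, count):
    """Write `count` one-byte ticks, `interval` seconds apart, then close."""
    try:
        for _ in range(count):
            sock.sendall(b"\x00")
            time.sleep(interval)
    finally:
        sock.close()


def watch_ticks(sock, timeout):
    """Count ticks until the peer closes or is silent for `timeout` seconds."""
    sock.settimeout(timeout)
    ticks = 0
    try:
        while True:
            data = sock.recv(64)
            if not data:                 # orderly shutdown by the peer
                return ticks, "closed"
            ticks += len(data)           # one byte per tick
    except socket.timeout:               # no tick arrived in time
        return ticks, "timed out"
    finally:
        sock.close()


def tick_demo():
    """Run a ticker and a watcher over a loopback TCP connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a port
    listener.listen(1)
    outgoing = socket.create_connection(listener.getsockname())
    incoming, _ = listener.accept()
    ticker = threading.Thread(target=send_ticks, args=(outgoing, 0.05, 3))
    ticker.start()
    result = watch_ticks(incoming, 1.0)
    ticker.join()
    listener.close()
    return result
```

Because the ticks travel on their own socket, they do not queue behind
bulk traffic in the send buffer of the data connection.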

Long Version:

Also, never say never.

Actually, you CAN send out-of-band data (also called urgent data)  
using TCP.  The original "WinNuke" (i.e. ping-of-death for Windows 95)  
was due to having a corrupt OOB header in a TCP packet.  In classic  
Microsoft / Internet style, the issue was further confused because it  
was an Out-of-Bounds bug, so a generation of networking consultants  
have minor deviations in their interpretations of the meaning of the  
letters OOB.

As for TCP Urgent Data / OOB, it seems to be specified well enough at
the protocol level, but it doesn't appear to be handled uniformly
across socket implementations.

Under Linux, you use send/recv with the MSG_OOB flag (or set the
SO_OOBINLINE socket option to have the urgent data delivered inline).
Linux appears to try to keep the urgent data at a certain point in the
data stream (i.e. to preserve some of the ordering), and certain
conditions can cause it to become part of the "normal" stream of data.
It can also cause odd signals (SIGURG) to be delivered to the process.
Still, TCP *does* have OOB data support; it just may not be easily
usable everywhere.
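
For the curious, here is a minimal sketch of the send/recv MSG_OOB
usage described above, written in Python because its socket module
maps directly onto the BSD calls.  The function name oob_demo and the
payload bytes are invented for the example, and it assumes a
Linux-like stack with SO_OOBINLINE left off.  Only a single byte is
sent as urgent, since TCP's urgent pointer marks just one byte:

```python
import select
import socket


def oob_demo():
    """Send normal data plus one urgent byte over loopback.

    Returns (urgent_bytes, normal_bytes) as seen by the receiver.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a port
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()
    try:
        sender.sendall(b"payload")           # ordinary in-band data
        sender.send(b"!", socket.MSG_OOB)    # one urgent (OOB) byte

        # Pending urgent data is reported as an "exceptional condition".
        _, _, exceptional = select.select([], [], [receiver], 5.0)
        if not exceptional:
            raise TimeoutError("no urgent data arrived")

        urgent = receiver.recv(1, socket.MSG_OOB)   # read out of band

        normal = b""
        while len(normal) < len(b"payload"):        # drain in-band data
            chunk = receiver.recv(1024)
            if not chunk:
                break
            normal += chunk
        return urgent, normal
    finally:
        sender.close()
        receiver.close()
        listener.close()
```

The urgent byte is read separately from the in-band bytes even though
both travel over the same connection, which is the property a tick
would need.  How portable that behaviour is across platforms is
exactly the open question.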

On Sep 25, 2009, at 3:04 AM, Valentin Micic wrote:

> You may change TICK value all day long, but if the underlying
> infrastructure is in some kind of trouble, that alone is not going to
> solve the problem.
>
> The following is just a speculation, but quite plausible in my mind:
>
> AFAIK, ERTS is multiplexing inter-nodal traffic over a single socket.
> Thus, if the socket is heavily utilized, the sending buffer may get
> congested due to dynamically reduced TCP window size (because remote
> side is not flushing its buffer fast enough -- if the same process is
> reading and writing the socket, this may cause a deadlock under a heavy
> load). As much as I am not certain about particular implementation
> here, I know that sender will not wait for ever -- it will eventually
> timeout and this (exception?) has to be handled somehow by the sender.
> The reasonable course of action would be to reset the connection. If
> and when that happens, node can be declared unreachable; therefore the
> "net-split" may occur. In other words, net-split may occur with or
> without "ticker" process running and regardless of the real network
> availability (*).
>
>
> I think the net-tick method is good on its own, however, it is
> utilizing a *wrong* transport! IMO, tick should be handled as
> out-of-band data, and this cannot be done using TCP/IP (well, at least
> not at the user level). My suggestion would be to use UDP for
> net-kernel communication (including TICK messages). This way one would
> be able to find out about peer health more reliably (yes, a small
> protocol may be required, but that's relatively easy).
>
> To make things simpler regarding the distribution, one may use the same
> port number as advertised in EPMD for a particular node, hence bind UDP
> socket to that number.
>
> V/
>
> (*) I've seen "net-splits" between nodes collocated on the same machine
> -- therefore indicating TCP buffer/load related issue. Maybe situation
> may be improved by creation of more than one connection between two
> nodes, but that may come with a bag of problems on its own.
>
>
> -----Original Message-----
> From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED]
> On Behalf Of Ulf Wiger
> Sent: 25 September 2009 09:13 AM
> To: erlang-questions Questions
> Subject: [erlang-questions] running without net tick
>
>
> The problem of netsplits in Erlang comes up now and again.
> I've mentioned that we used to have a more robust
> supervision algorithm for device processor monitoring in
> AXD 301...
>
> I read the following comment in kernel/src/dist_util.erl
>
> %% Send a TICK to the other side.
> %%
> %% This will happen every 15 seconds (by default)
> %% The idea here is that every 15 secs, we write a little
> %% something on the connection if we haven't written anything for
> %% the last 15 secs.
> %% This will ensure that nodes that are not responding due to
> %% hardware errors (Or being suspended by means of ^Z) will
> %% be considered to be down. If we do not want to have this
> %% we must start the net_kernel (in erlang) without its
> %% ticker process, In that case this code will never run
>
>
> ...and thought: promising - it is then possible to experiment
> with other tick algorithms?
>
> However, looking at net_kernel.erl:
>
> init({Name, LongOrShortNames, TickT}) ->
>     process_flag(trap_exit,true),
>     case init_node(Name, LongOrShortNames) of
>         {ok, Node, Listeners} ->
>             process_flag(priority, max),
>             Ticktime = to_integer(TickT),
>             Ticker = spawn_link(net_kernel, ticker,
>                                 [self(), Ticktime]),
>
> In other words, you can't set net_ticktime to anything other
> than an integer (and it has to be a smallint, since it's used
> in a receive ... after expression).
>
> (To do justice to the comment above, couldn't a net_ticktime
> of, say, 0 turn off net ticking altogether?)
>
> What one can do then, is to set net_ticktime to a very large
> number, and then run a user-level heartbeat. If netsplits are
> still experienced without visible problems in the user-level
> monitoring, or perhaps even serviced traffic during this
> interval, then something is definitely wrong with the tick
> algorithm. :)
>
> BR,
> Ulf W
> -- 
> Ulf Wiger
> CTO, Erlang Training & Consulting Ltd
> http://www.erlang-consulting.com
>
> ________________________________________________________________
> erlang-questions mailing list. See http://www.erlang.org/faq.html
> erlang-questions (at) erlang.org
>
>



-- 
Jayson Vantuyl
kagato@REDACTED






