[erlang-questions] running without net tick
Valentin Micic
v@REDACTED
Fri Sep 25 13:26:22 CEST 2009
I beg to differ -- my take is that TCP reliability is part of the problem
in this case. Whilst buffering and flow control are important for, say, file
transfer, they are completely irrelevant for TICK and health-checks (so what
if it doesn't get there? I can send it again without any consequence!).
The argument about UDP unreliability sounds more like a mantra than a proper
argument (if only I had a penny for every time I've heard it (-:). There are
only two fundamental differences (*) between TCP and UDP... actually only
one, because the second is conditioned by the first: TCP supports
stream-oriented communication, whilst UDP is message-bound; as a consequence,
TCP requires some form of flow control to support stream processing.
In this particular case: what possible benefit can one derive from sending a
message over a stream as opposed to sending just a message? If the message
is short enough to fit in a datagram -- none!
As for the ability to send urgent data (OOB) over a TCP socket -- data
streams and OOB data mix like oil and water. I have yet to see a successful
utilization of OOB (issued by a user) that hasn't resulted in a connection
reset (or system shutdown (-;).
Lastly, if TICK were implemented via a separate TCP socket, that would double
the networking resources required -- you'd need a new socket for every node
you're connected to. With UDP, all you need is one socket and a very basic
protocol:
1) Ask when you have to;
2) Answer when asked.
Mind you, net-kernel is already doing this; a rough sketch of the idea
follows below.
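
Something along these lines -- untested and purely illustrative; the module
name, port handling and the <<"ping">>/<<"pong">> payloads are made up, and
it only shows the single-socket ask/answer pattern with gen_udp:

-module(udp_tick).
-export([start/1]).

%% Open one UDP socket and serve both directions of the protocol.
start(Port) ->
    {ok, Sock} = gen_udp:open(Port, [binary, {active, true}]),
    loop(Sock).

loop(Sock) ->
    receive
        %% Answer when asked.
        {udp, Sock, Ip, FromPort, <<"ping">>} ->
            ok = gen_udp:send(Sock, Ip, FromPort, <<"pong">>),
            loop(Sock);
        %% Ask when you have to (e.g. driven by a timer elsewhere).
        {ask, Ip, PeerPort} ->
            ok = gen_udp:send(Sock, Ip, PeerPort, <<"ping">>),
            loop(Sock)
    end.
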
V/
(*) If one disregards the things that UDP can do which TCP cannot, such as
multi-drop delivery, multicasting, etc.
-----Original Message-----
From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED] On
Behalf Of Jayson Vantuyl
Sent: 25 September 2009 12:25 PM
To: Erlang-Questions Questions
Subject: Re: [erlang-questions] running without net tick
Short Version:
Why not open a special "tick" TCP port? UDP would require a reliable
delivery implementation. TCP saves quite a bit of work in that regard
(and gets a lot of important but subtle things right).
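
A rough sketch of what such a dedicated tick listener might look like --
the module name, socket options and the plain echo behaviour are all made
up, not a proposal for the actual distribution protocol:

-module(tcp_tick).
-export([listen/1]).

%% Listen on a dedicated port and echo back whatever tick bytes arrive.
listen(Port) ->
    {ok, LSock} = gen_tcp:listen(Port, [binary, {active, false}, {reuseaddr, true}]),
    accept_loop(LSock).

accept_loop(LSock) ->
    {ok, Sock} = gen_tcp:accept(LSock),
    Pid = spawn(fun() -> echo(Sock) end),
    ok = gen_tcp:controlling_process(Sock, Pid),
    accept_loop(LSock).

echo(Sock) ->
    case gen_tcp:recv(Sock, 0) of
        {ok, Data}      -> ok = gen_tcp:send(Sock, Data), echo(Sock);
        {error, closed} -> ok
    end.
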
Long Version:
Also, never say never.
Actually, you CAN send out-of-band data (also called urgent data)
using TCP. The original "WinNuke" (i.e. ping-of-death for Windows 95)
was due to a corrupt OOB header in a TCP packet. In classic
Microsoft / Internet style, the issue was further confused because it
was an Out-of-Bounds bug, so a generation of networking consultants
have slightly different interpretations of what the letters OOB mean.
As for TCP Urgent Data / OOB, it seems to be specified well enough at
the protocol level, but it doesn't appear to be handled uniformly
across different socket implementations.
Under Linux, you use send/recv with the MSG_OOB option (or set the
SO_OOBINLINE socket option to just inline the data). The implementation
appears to try to keep the urgent data at a certain point in the stream
(i.e. to preserve some of the ordering), and certain conditions can
cause it to become part of the "normal" stream of data. It can also
cause some odd signals to be delivered to the process. Still, TCP
*does* have OOB data support; it just may not be easily usable
everywhere.
On Sep 25, 2009, at 3:04 AM, Valentin Micic wrote:
> You may change the TICK value all day long, but if the underlying
> infrastructure is in some kind of trouble, that alone is not going to
> solve the problem.
>
> The following is just speculation, but quite plausible in my mind:
>
> AFAIK, ERTS is multiplexing inter-nodal traffic over a single socket.
> Thus, if the socket is heavily utilized, the sending buffer may get
> congested due to a dynamically reduced TCP window size (because the
> remote side is not flushing its buffer fast enough -- if the same
> process is reading and writing the socket, this may cause a deadlock
> under heavy load). While I am not certain about the particular
> implementation here, I know that the sender will not wait forever --
> it will eventually time out, and this (exception?) has to be handled
> somehow by the sender. The reasonable course of action would be to
> reset the connection. If and when that happens, the node can be
> declared unreachable and a "net-split" may occur. In other words, a
> net-split may occur with or without the "ticker" process running, and
> regardless of the real network availability (*).
>
>
> I think the net-tick method is good on its own; however, it is using
> the *wrong* transport! IMO, tick should be handled as out-of-band
> data, and this cannot be done using TCP/IP (well, at least not at the
> user level). My suggestion would be to use UDP for net-kernel
> communication (including TICK messages). This way one would be able
> to find out about peer health more reliably (yes, a small protocol
> may be required, but that's relatively easy).
>
> To make things simpler regarding the distribution, one may use the
> same port number as advertised in EPMD for a particular node, and
> hence bind the UDP socket to that number.
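>
> A rough sketch of that lookup, just to illustrate (the node name, host
> and payload below are made up; erl_epmd:port_please/2 returns the port
> EPMD advertises for a named node):
>
>     {port, Port, _Version} = erl_epmd:port_please("somenode", "somehost"),
>     {ok, Sock} = gen_udp:open(0, [binary]),
>     ok = gen_udp:send(Sock, "somehost", Port, <<"ping">>).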
>
> V/
>
> (*) I've seen "net-splits" between nodes collocated on the same
> machine -- indicating a TCP buffer/load related issue. Maybe the
> situation could be improved by creating more than one connection
> between two nodes, but that may come with a bag of problems of its
> own.
>
>
> -----Original Message-----
> From: erlang-questions@REDACTED [mailto:erlang-
> questions@REDACTED] On
> Behalf Of Ulf Wiger
> Sent: 25 September 2009 09:13 AM
> To: erlang-questions Questions
> Subject: [erlang-questions] running without net tick
>
>
> The problem of netsplits in Erlang comes up now and again.
> I've mentioned that we used to have a more robust
> supervision algorithm for device processor monitoring in
> AXD 301...
>
> I read the following comment in kernel/src/dist_util.erl
>
> %% Send a TICK to the other side.
> %%
> %% This will happen every 15 seconds (by default)
> %% The idea here is that every 15 secs, we write a little
> %% something on the connection if we haven't written anything for
> %% the last 15 secs.
> %% This will ensure that nodes that are not responding due to
> %% hardware errors (Or being suspended by means of ^Z) will
> %% be considered to be down. If we do not want to have this
> %% we must start the net_kernel (in erlang) without its
> %% ticker process, In that case this code will never run
>
>
> ...and thought: promising -- is it then possible to experiment
> with other tick algorithms?
>
> However, looking at net_kernel.erl:
>
> init({Name, LongOrShortNames, TickT}) ->
>     process_flag(trap_exit, true),
>     case init_node(Name, LongOrShortNames) of
>         {ok, Node, Listeners} ->
>             process_flag(priority, max),
>             Ticktime = to_integer(TickT),
>             Ticker = spawn_link(net_kernel, ticker, [self(), Ticktime]),
>
> In other words, you can't set net_ticktime to anything other
> than an integer (and it has to be a smallint, since it's used
> in a receive ... after expression).
>
> (To do justice to the comment above, couldn't a net_ticktime
> of, say, 0 turn off net ticking altogether?)
>
> What one can do, then, is set net_ticktime to a very large number
> and run a user-level heartbeat. If netsplits are still experienced
> without visible problems in the user-level monitoring, or perhaps
> even with traffic being serviced during this interval, then something
> is definitely wrong with the tick algorithm. :)
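>
> A minimal sketch of such a user-level heartbeat (the function name,
> interval and logging below are made up; it just pings a peer over the
> existing distribution and logs when the ping fails):
>
>     heartbeat(Node, IntervalMs) ->
>         case net_adm:ping(Node) of
>             pong -> ok;
>             pang -> error_logger:warning_msg("lost contact with ~p~n", [Node])
>         end,
>         timer:sleep(IntervalMs),
>         heartbeat(Node, IntervalMs).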
>
> BR,
> Ulf W
> --
> Ulf Wiger
> CTO, Erlang Training & Consulting Ltd
> http://www.erlang-consulting.com
>
--
Jayson Vantuyl
kagato@REDACTED
________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org