[erlang-questions] running without net tick

Sat Sep 26 02:53:17 CEST 2009

Ticks do ticking :-).

The following is from Erlang's documentation:

"Once every TickTime/4 second, all connected nodes are ticked (if anything
else has been written to a node) and if nothing has been received from
another node within the last four (4) tick times that node is considered to
be down. This ensures that nodes which are not responding, for reasons such
as hardware errors, are considered to be down."
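
To put numbers on that passage, here is a minimal sketch of the arithmetic. The module and function names are made up for illustration; the only facts assumed are the default net_ticktime of 60 seconds and the detection window of TickTime plus or minus TickTime/4 given in the kernel documentation.

```erlang
-module(tick_math).
-export([tick_interval/1, detection_window/1]).

%% Seconds between ticks for a given net_ticktime (TickTime).
tick_interval(TickTime) ->
    TickTime / 4.

%% {Min, Max} seconds before a silent node is considered down,
%% i.e. TickTime - TickTime/4 and TickTime + TickTime/4.
detection_window(TickTime) ->
    Quarter = TickTime / 4,
    {TickTime - Quarter, TickTime + Quarter}.
```

With the default, tick_math:tick_interval(60) gives 15.0 and tick_math:detection_window(60) gives {45.0, 75.0}, so a peer has to stay silent for several tick intervals before it is declared down.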

Reading this, I would never conclude that losing one tick would cause a node
to be considered unreachable -- primarily because a tick cannot be lost, as it
uses a reliable connection (TCP) to talk to its peer.

What I was saying in reply to Ulf's original post was that I've seen nodes
collocated on the same machine end up net-splitting (?!?). That cannot be
explained by a network problem, or by packet loss for that matter, and no
change to the tick values would influence it.

You keep mentioning head-of-line blocking. Well, double it and you'll see
how severe it can be -- I cannot receive anything because the process
responsible for receiving is busy blocking on a send operation (*), because
its peer cannot receive anything as it is busy blocking on its own send
operation.
It pretty much resembles this conversation :-).
And I've just timed-out!

V/

(*) Can anyone confirm that a single process handles both the sending and
receiving functions? If this is indeed the case, it would be a good idea
to split sending and receiving into separate processes, as the asynchronous
nature of the Erlang distribution mechanism lends itself to a deadlock of
this kind.
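
For what it's worth, the split suggested in the footnote could look roughly like the sketch below. This is not how ERTS implements distribution (that is exactly the open question); split_io, start/2 and the {send, Data} message are hypothetical names, and the socket is assumed to be a connected gen_tcp socket opened with {active, false}.

```erlang
-module(split_io).
-export([start/2]).

%% Socket:  a connected gen_tcp socket in passive mode (assumption).
%% Handler: fun/1 called with every chunk of data that arrives.
start(Socket, Handler) ->
    Receiver = spawn_link(fun() -> recv_loop(Socket, Handler) end),
    Sender = spawn_link(fun() -> send_loop(Socket) end),
    {Sender, Receiver}.

%% Keeps draining the socket even while the sender is blocked.
recv_loop(Socket, Handler) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Data} ->
            Handler(Data),
            recv_loop(Socket, Handler);
        {error, Reason} ->
            exit({recv_failed, Reason})
    end.

%% May block in gen_tcp:send/2 when the peer's window is closed,
%% but that no longer stops this side from reading.
send_loop(Socket) ->
    receive
        {send, Data} ->
            case gen_tcp:send(Socket, Data) of
                ok -> send_loop(Socket);
                {error, Reason} -> exit({send_failed, Reason})
            end
    end.
```

With both ends arranged this way, a sender stuck on a full buffer cannot prevent its own side from reading, which is what breaks the mutual wait described above.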

-----Original Message-----
From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED] On
Behalf Of Jayson Vantuyl
Sent: 26 September 2009 01:23 AM
To: Valentin Micic
Cc: 'Erlang-Questions Questions'
Subject: Re: [erlang-questions] running without net tick

What do you think that the ticks do?  If you lose one, then a node is  
considered lost.  Without a retry strategy that is drastically less  
than the tick time (i.e. the one currently handled by TCP), you will  
see splits.  If you just use UDP, then ANY packet loss will cause a  
split.  If you are using UDP, then you have to compensate for this,  
i.e. engineer a retry scheme.  Minimally, you have to at least have a  
tick interval that might (for example) be shorter than the split  
interval.

As we are talking about quite a few nodes, you will have retry  
synchronization issues.  This problem is as old as the Internet.  UDP  
saves bytes and avoids head-of-line blocking, that's about it.  With a  
dedicated TCP connection, there is no head-of-line blocking.  And,  
unlike a retry model with UDP, you don't have to maintain TCP across a  
few different OSes, and I strongly suggest that it solves this problem  
in a way that is less wasteful than you seem to think.

As for the ball-point pen example, it's a really good example of  
what's going on.

When all they had were pencils, NASA and the Russians both just used  
pencils.  They developed a pen because the pencils were a safety  
hazard, and then the Americans and Russians used them, because they  
solved a real problem.  Just like the people behind that particular  
(false, http://www.snopes.com/business/genius/spacepen.asp) urban  
legend, you don't seem to understand the problem-space.  Unless I'm  
missing something about the net_kernel implementation, the only upside  
of UDP is saving a few bytes, at the expense of complexity that must  
be maintained at the core of a fault-tolerant system.

On Sep 25, 2009, at 3:53 PM, Valentin Micic wrote:

> There is an urban legend saying that NASA invested a good few million
> dollars in order to develop a ballpoint pen that can work in zero gravity.
> The Russians used a pencil. As much as I don't think that a Russian pencil
> should be used for signing a nuclear disarmament treaty, it will work just
> fine for keeping a logbook.
>
> Why would I need to implement a backoff? What horrible behavior are you
> talking about? For goodness sake, we're just talking about losing packets
> that we can afford to lose! Packet loss *is* simple if the problem we're
> trying to solve is simple. OTOH, if we apply a complex solution to a simple
> problem, we are elevating the problem to the same level of complexity as
> the solution. By indiscriminately applying experience related to a complex
> problem in the context of a simple problem, you are implying that simple
> solutions, and hence simple problems, do not exist.
>
> Clearly not the case -- I would choose UDP over TCP for implementing a
> management protocol any day of the week and twice on Sunday. Not because I
> am stubborn, but because it worked for me in the past. And still does.
>
>
> V/
>
> -----Original Message-----
> From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED] On
> Behalf Of Jayson Vantuyl
> Sent: 25 September 2009 10:52 PM
> To: Erlang-Questions Questions
> Subject: Re: [erlang-questions] running without net tick
>
> Packet loss is not as simple as you think.  With UDP, you either need
> to roll a whole backoff implementation or accept that you're going to
> have horrible behavior under even modest amounts of packet loss.
>
> Fragmentation of packets is also particularly mean.  It can amplify
> latency spectacularly when enough packets are involved.  TCP avoids
> this.  With UDP, you have to roll it for yourself (and you'll need an
> ICMP socket to do so, which usually means running as root on Unix).
> Ticks might be small enough that this isn't necessary, so you might
> get out of this one for free.
>
> Congestion control is problematic, and TCP implementations have a
> number of answers for this that can be tweaked at the OS.  UDP does not.
>
> I'm not suggesting that the old data is useful.  Just throw it away
> and have a way to detect that it's old.  I am suggesting that having
> TCP handle the retries (and letting windowing / retransmission handle
> the trouble for you) is less developer work and behaves better.
>
> TCP having retransmission does not help you make a single decision,
> no.  Instead, it prevents purely random losses (which are VERY common
> in environments like EC2, when network links get saturated) from
> causing all problems to be amplified to a multiple of your tick time,
> in a ratio that grows exponentially with the fraction of lost
> packets.
>
> Implement it using UDP.  Implement one using TCP.  Compare the
> difference in behavior.  Notice how the TCP one is simpler, behaves
> better, and has similar bandwidth consumption.  You're obviously going
> to have to learn it through experimentation.  That's fine, as that's
> how I learned it.  I encourage you to keep an open mind, take
> measurements, and consider that every line of code may someday give a
> bug a happy home.
>
> Good luck.
>
> On Sep 25, 2009, at 1:25 PM, Valentin Micic wrote:
>
>> Let's see... say you send a request over TCP for which you're expecting a
>> reply -- and nothing happens. The fact that TCP has a retransmission
>> mechanism (to ensure reliable delivery) does not help you make any
>> meaningful decision at this point, does it? So, what are you going to do
>> when this happens? Give up? Retry? Whichever way you slice it, you cannot
>> get away from implementing some kind of application-level protocol to
>> handle such a condition.
>>
>> Considering this, UDP makes us work more... how exactly???
>>
>> (Actually, sometimes TCP may make it worse. If your request times out and
>> you do not tear down the connection over which the request was made, your
>> request will still be delivered although no one will be interested in the
>> reply. Worse yet, you may be issuing another request, etc. I say, sometimes
>> you're better off just reliably losing the whole thing. <-:)
>>
>> Anyway, let's agree to disagree on this one.
>>
>> V/
>>
>> -----Original Message-----
>> From: Jayson Vantuyl [mailto:kagato@REDACTED]
>> Sent: 25 September 2009 08:53 PM
>> To: Valentin Micic
>> Cc: 'Erlang-Questions Questions'
>> Subject: Re: [erlang-questions] running without net tick
>>
>> I completely agree on the oil-and-water statement.  That said, TCP
>> supports OOB, it's just a bad idea to use it.
>>
>> The theory of "why it's not working" that was mentioned was that the
>> other data multiplexed over the stream was choking out the ticks.
>>
>> A dedicated connection makes that a non-issue.  It's the classic
>> head-of-line blocking problem.
>>
>> Let's say that ticks were reduced to a 4-byte timestamp (to give some
>> reference point if the connection is broken and re-established).
>> Let's say you send them every 10 seconds.  TCP has a 40-byte
>> overhead.  Ethernet usually has a 1500-byte MTU.  That makes room for
>> an hour's worth of ticks in a single TCP packet over average ethernet
>> (and probably at least 20-minutes worth over any useable MTU).
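
The figures in the paragraph above can be checked with a few lines of arithmetic. This is only a sketch: tick_packing and its functions are invented names, and the 576-byte MTU used for the "any useable MTU" case is an assumption, not a number from the thread.

```erlang
-module(tick_packing).
-export([ticks_per_packet/3, seconds_covered/4]).

%% Whole ticks that fit in one packet once the header overhead is removed.
ticks_per_packet(Mtu, Overhead, TickSize) ->
    (Mtu - Overhead) div TickSize.

%% How many seconds of ticking one full packet represents.
seconds_covered(Mtu, Overhead, TickSize, Interval) ->
    ticks_per_packet(Mtu, Overhead, TickSize) * Interval.
```

tick_packing:seconds_covered(1500, 40, 4, 10) returns 3650, which is just over an hour, and tick_packing:seconds_covered(576, 40, 4, 10) returns 1340, a little over 22 minutes.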
>>
>> If we are limited to IP, want to have any chance to make it through a
>> firewall, want timely retries, and want a tick to generate one-packet
>> (or less aggregate), we either have UDP or TCP.  In the above case,
>> TCP generates about as much packet traffic as UDP and is reasonably
>> close to timely.  The packets are larger and the retries are wasted,
>> but they also back off exponentially.  A dedicated connection does not
>> have the "head-of-line" problem due to multiplexing (which is,
>> admittedly, unproven as the problem).
>>
>> The point of this exercise is that, unless you're going over a very
>> small or very latent pipe, UDP doesn't really give us anything other
>> than more work.  Why more work, you say?  Because UDP doesn't retry.
>> Sending a tick every 10 seconds over TCP is not the same thing as
>> sending a tick every 10 seconds over UDP.  Why?  Assume 75%, random
>> packet loss.  That means you're likely to get a single UDP tick every
>> 40 seconds.  With TCP, the automatic retry will turn that into
>> approximately one tick every 10.X seconds, where X is entirely
>> dependent on latency (and probably very small).  Does TCP do this with
>> more traffic?  Yes.  However, it does it with exponential backoff, a
>> window to limit the outstanding number of packets, PMTU support so
>> that the packets don't get fragmented, an RST mechanism to break
>> connections if the remote host has rebooted, the option to use SSL to
>> encrypt the session, etc.
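
The 40-second figure above follows from a simple model in which each UDP tick is lost independently with the same probability. A small sketch of that model is below; tick_loss and its function names are hypothetical, and real packet loss tends to be bursty rather than independent.

```erlang
-module(tick_loss).
-export([expected_gap/2, all_lost/2]).

%% Mean time between ticks that actually arrive, when ticks are sent
%% every Interval seconds and each is lost with probability Loss.
expected_gap(Interval, Loss) when Loss >= 0, Loss < 1 ->
    Interval / (1 - Loss).

%% Probability that N consecutive ticks are all lost.
all_lost(Loss, N) ->
    math:pow(Loss, N).
```

tick_loss:expected_gap(10, 0.75) returns 40.0. Under the same assumptions, tick_loss:all_lost(0.75, 4) is roughly 0.32, so about one window of four ticks in three would see nothing arrive at all.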
>>
>> There are almost no cases that actually demand UDP that a single TCP
>> connection doesn't do very well.  I'd strongly recommend not ignoring
>> its benefits and realizing that real network conditions almost never
>> favor UDP and that UDP does not favor a simple implementation.
>>
>> On Sep 25, 2009, at 4:26 AM, Valentin Micic wrote:
>>
>>> I beg to differ -- my take is that TCP reliability is a part of the problem
>>> in this case. Whilst buffering and flow control are important for, say,
>>> file transfer, they are completely irrelevant for TICK and health-checks
>>> (so what if it doesn't get there, I can send it again without any
>>> consequence!).
>>>
>>> The argument about UDP unreliability sounds more like a mantra than a
>>> proper argument (if only I got a penny every time I've heard it (-:).
>>> There are only two fundamental differences (*) between TCP and UDP...
>>> actually only one, because the second is conditioned by the first: TCP
>>> supports stream communication, whilst UDP supports message-bound
>>> communication; thus, as a consequence, TCP requires some form of flow
>>> control to support stream processing.
>>>
>>> In this particular case: what possible benefit can one derive from sending
>>> a message over a stream as opposed to sending just a message? If the
>>> message is short enough to fit in a datagram -- none!
>>>
>>> As for the ability to send urgent data (OOB) over a TCP socket -- data
>>> streams and OOB data mix like oil and water. I am yet to see a successful
>>> utilization of OOB (issued by a user) that hasn't resulted in a connection
>>> reset (or system shutdown (-;).
>>>
>>> Lastly, if TICK is implemented via a separate TCP socket, that would
>>> double the networking resources required -- you'd need a new socket for
>>> every node you're connected to. With UDP, all you need is one socket, and
>>> a very basic protocol:
>>>
>>> 	1) Ask when you have to;
>>> 	2) Answer when asked.
>>>
>>> Mind you, net-kernel is already doing this.
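
A minimal sketch of such an ask/answer exchange over a single UDP socket is shown below, purely as an illustration. The module name udp_health, the <<"ask">> and <<"answer">> payloads and the return atoms are all invented here; nothing in it reflects what net_kernel does internally.

```erlang
-module(udp_health).
-export([start_responder/0, ask/3]).

%% "Answer when asked": one socket serves every peer that asks.
start_responder() ->
    Parent = self(),
    Pid = spawn_link(fun() ->
        {ok, Sock} = gen_udp:open(0, [binary, {active, false}]),
        {ok, P} = inet:port(Sock),
        Parent ! {self(), P},
        answer_loop(Sock)
    end),
    receive {Pid, Port} -> {ok, Pid, Port} end.

answer_loop(Sock) ->
    case gen_udp:recv(Sock, 0) of
        {ok, {Addr, Port, <<"ask">>}} ->
            gen_udp:send(Sock, Addr, Port, <<"answer">>),
            answer_loop(Sock);
        {ok, _Other} ->
            answer_loop(Sock);
        {error, _Reason} ->
            ok
    end.

%% "Ask when you have to": a lost datagram simply shows up as no_answer,
%% and the caller decides whether to ask again.
ask(Host, Port, Timeout) ->
    {ok, Sock} = gen_udp:open(0, [binary, {active, false}]),
    ok = gen_udp:send(Sock, Host, Port, <<"ask">>),
    Result = case gen_udp:recv(Sock, 0, Timeout) of
        {ok, {_Addr, _Port, <<"answer">>}} -> alive;
        {ok, _Unexpected} -> unexpected;
        {error, timeout} -> no_answer;
        {error, Reason} -> {error, Reason}
    end,
    gen_udp:close(Sock),
    Result.
```

Calling udp_health:start_responder() on one side and udp_health:ask(Host, Port, 1000) on the other is the whole protocol; everything about retries and how many no_answer results mean "down" is left to the caller.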
>>>
>>> V/
>>>
>>> (*) If one disregards things that UDP can do and TCP cannot, such as
>>> multi-drop, multicasting, etc.
>>>
>>> -----Original Message-----
>>> From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED] On
>>> Behalf Of Jayson Vantuyl
>>> Sent: 25 September 2009 12:25 PM
>>> To: Erlang-Questions Questions
>>> Subject: Re: [erlang-questions] running without net tick
>>>
>>> Short Version:
>>>
>>> Why not open a special "tick" TCP port?  UDP would require a reliable
>>> delivery implementation.  TCP saves quite a bit of work in that regard
>>> (and gets a lot of important but subtle things right).
>>>
>>> Long Version:
>>>
>>> Also, never say never.
>>>
>>> Actually, you CAN send out-of-band data (also called urgent data)
>>> using TCP.  The original "WinNuke" (i.e. ping-of-death for Windows 95)
>>> was due to having a corrupt OOB header in a TCP packet.  In classic
>>> Microsoft / Internet style, the issue was further confused because it
>>> was an Out-of-Bounds bug, so a generation of networking consultants
>>> have minor deviations in their interpretations of the meaning of the
>>> letters OOB.
>>>
>>> As for TCP Urgent Data / OOB, it seems to be specified well enough at
>>> the protocol level, but it doesn't appear to be handled uniformly in
>>> different socket implementations.
>>>
>>> Under Linux, you use send/recv with the MSG_OOB option (or set the
>>> SO_OOBINLINE socket option to just inline the data).  It appears to
>>> try to keep it at a certain point in the data stream (i.e. to preserve
>>> some of the ordering) and certain conditions can cause it to become
>>> part of the "normal" stream of data.  It also can cause some odd
>>> signals to be delivered to the process.  Still, TCP *does* have OOB
>>> data support, just maybe it isn't easily usable everywhere.
>>>
>>> On Sep 25, 2009, at 3:04 AM, Valentin Micic wrote:
>>>
>>>> You may change the TICK value all day long, but if the underlying
>>>> infrastructure is in some kind of trouble, that alone is not going to
>>>> solve the problem.
>>>>
>>>> The following is just speculation, but quite plausible in my mind:
>>>>
>>>> AFAIK, ERTS multiplexes inter-node traffic over a single socket. Thus,
>>>> if the socket is heavily utilized, the send buffer may get congested due
>>>> to a dynamically reduced TCP window size (because the remote side is not
>>>> flushing its buffer fast enough -- if the same process is reading and
>>>> writing the socket, this may cause a deadlock under heavy load). As much
>>>> as I am not certain about the particular implementation here, I know that
>>>> the sender will not wait forever -- it will eventually time out, and this
>>>> (exception?) has to be handled somehow by the sender. The reasonable
>>>> course of action would be to reset the connection. If and when that
>>>> happens, the node can be declared unreachable; therefore the "net-split"
>>>> may occur. In other words, a net-split may occur with or without the
>>>> "ticker" process running and regardless of the real network availability
>>>> (*).
>>>>
>>>>
>>>> I think the net-tick method is good on its own; however, it is utilizing
>>>> the *wrong* transport! IMO, tick should be handled as out-of-band data,
>>>> and this cannot be done using TCP/IP (well, at least not at the user
>>>> level). My suggestion would be to use UDP for net-kernel communication
>>>> (including TICK messages). This way one would be able to find out about
>>>> peer health more reliably (yes, a small protocol may be required, but
>>>> that's relatively easy).
>>>>
>>>> To make things simpler regarding the distribution, one may use the same
>>>> port number as advertised in EPMD for a particular node, and hence bind
>>>> the UDP socket to that number.
>>>>
>>>> V/
>>>>
>>>> (*) I've seen "net-splits" between nodes collocated on the same machine,
>>>> which indicates a TCP buffer/load-related issue. Maybe the situation
>>>> could be improved by creating more than one connection between two nodes,
>>>> but that may come with a bag of problems of its own.
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED] On
>>>> Behalf Of Ulf Wiger
>>>> Sent: 25 September 2009 09:13 AM
>>>> To: erlang-questions Questions
>>>> Subject: [erlang-questions] running without net tick
>>>>
>>>>
>>>> The problem of netsplits in Erlang comes up now and again.
>>>> I've mentioned that we used to have a more robust
>>>> supervision algorithm for device processor monitoring in
>>>> AXD 301...
>>>>
>>>> I read the following comment in kernel/src/dist_util.erl
>>>>
>>>> %% Send a TICK to the other side.
>>>> %%
>>>> %% This will happen every 15 seconds (by default)
>>>> %% The idea here is that every 15 secs, we write a little
>>>> %% something on the connection if we haven't written anything for
>>>> %% the last 15 secs.
>>>> %% This will ensure that nodes that are not responding due to
>>>> %% hardware errors (Or being suspended by means of ^Z) will
>>>> %% be considered to be down. If we do not want to have this
>>>> %% we must start the net_kernel (in erlang) without its
>>>> %% ticker process, In that case this code will never run
>>>>
>>>>
>>>> ...and thought: promising - it is then possible to experiment
>>>> with other tick algorithms?
>>>>
>>>> However, looking at net_kernel.erl:
>>>>
>>>> init({Name, LongOrShortNames, TickT}) ->
>>>>  process_flag(trap_exit,true),
>>>>  case init_node(Name, LongOrShortNames) of
>>>>      {ok, Node, Listeners} ->
>>>>          process_flag(priority, max),
>>>>          Ticktime = to_integer(TickT),
>>>>          Ticker = spawn_link(net_kernel, ticker, [self(), Ticktime]),
>>>>
>>>> In other words, you can't set net_ticktime to anything other
>>>> than an integer (and it has to be a smallint, since it's used
>>>> in a receive ... after expression).
>>>>
>>>> (To do justice to the comment above, couldn't a net_ticktime
>>>> of, say, 0 turn off net ticking altogether?)
>>>>
>>>> What one can do then, is to set net_ticktime to a very large
>>>> number, and then run a user-level heartbeat. If netsplits are
>>>> still experienced without visible problems in the user-level
>>>> monitoring, or perhaps even serviced traffic during this
>>>> interval, then something is definitely wrong with the tick
>>>> algorithm. :)
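
One way the experiment described above might be set up is sketched here. The user_hb module, its message formats and the return atoms are hypothetical; only the net_ticktime kernel parameter comes from the thread. The heartbeat still travels over the normal distribution connection, so it measures responsiveness alongside the built-in tick rather than replacing the transport.

```erlang
-module(user_hb).
-export([start/0, check/2]).

%% Responder: run once on every node that should answer heartbeats.
start() ->
    register(user_hb, spawn(fun loop/0)).

loop() ->
    receive
        {ping, From, Ref} ->
            From ! {pong, Ref, node()},
            loop();
        _Other ->
            loop()
    end.

%% Ask Node's responder and wait at most Timeout milliseconds.
check(Node, Timeout) ->
    Ref = make_ref(),
    {user_hb, Node} ! {ping, self(), Ref},
    receive
        {pong, Ref, Node} -> alive
    after Timeout ->
        no_reply
    end.
```

The nodes would be started with a large tick time, for example erl -kernel net_ticktime 3600 (the value is an arbitrary illustration), and a monitoring process would call user_hb:check(Node, 5000) periodically and log every no_reply next to any nodedown events it sees.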
>>>>
>>>> BR,
>>>> Ulf W
>>>> -- 
>>>> Ulf Wiger
>>>> CTO, Erlang Training & Consulting Ltd
>>>> http://www.erlang-consulting.com
>>>>
>>>> ________________________________________________________________
>>>> erlang-questions mailing list. See http://www.erlang.org/faq.html
>>>> erlang-questions (at) erlang.org
>>>>
>>>>
>>>
>>>
>>>
>>> -- 
>>> Jayson Vantuyl
>>> kagato@REDACTED
>>>
>>>
>>
>>
>>
>> -- 
>> Jayson Vantuyl
>> kagato@REDACTED
>>
>
>