[erlang-questions] running without net tick
Jayson Vantuyl
kagato@REDACTED
Fri Sep 25 22:52:00 CEST 2009
Packet loss is not as simple as you think. With UDP, you either need
to roll a whole backoff implementation or accept that you're going to
have horrible behavior under even modest amounts of packet loss.
Fragmentation of packets is also particularly mean. It can amplify
latency spectacularly when enough packets are involved. TCP avoids
this with path-MTU discovery. With UDP, you have to roll it yourself
(and you'll need an ICMP socket to do so, which usually means running
as root on Unix). Ticks might be small enough that this isn't
necessary, so you might get out of this one for free.
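
To give a feel for the work involved, here is a minimal sketch of a
UDP request with exponential backoff; gen_udp is real Erlang/OTP, but
the helper names, initial timeout, and retry count are all
illustrative:

  %% Sketch: send a datagram, wait for any reply, retry with
  %% exponential backoff -- the part TCP does for you.
  send_with_backoff(Host, Port, Packet) ->
      {ok, Sock} = gen_udp:open(0, [binary, {active, false}]),
      Result = try_send(Sock, Host, Port, Packet, 1000, 5),
      gen_udp:close(Sock),
      Result.

  try_send(_Sock, _Host, _Port, _Packet, _Timeout, 0) ->
      {error, timeout};
  try_send(Sock, Host, Port, Packet, Timeout, Retries) ->
      ok = gen_udp:send(Sock, Host, Port, Packet),
      case gen_udp:recv(Sock, 0, Timeout) of
          {ok, {_Addr, _P, Reply}} ->
              {ok, Reply};
          {error, timeout} ->
              %% Double the wait on every miss.
              try_send(Sock, Host, Port, Packet, Timeout * 2,
                       Retries - 1)
      end.
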
Congestion control is also problematic; TCP implementations have a
number of answers for this that can be tuned at the OS level. UDP has
none.
I'm not suggesting that the old data is useful. Just throw it away
and have a way to detect that it's old. I am suggesting that having
TCP handle the retries (and letting windowing / retransmission handle
the trouble for you) is less developer work and behaves better.
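
Detecting staleness can be as simple as timestamping each tick and
having the receiver drop anything outside a freshness window; a tiny
sketch, with the 15-second window made up and roughly synchronized
clocks assumed:

  %% Sketch: the sender stamps each tick; the receiver ignores
  %% ticks older than an (illustrative) 15s window.
  make_tick() ->
      <<(erlang:system_time(millisecond)):64>>.

  fresh_tick(<<SentAt:64>>) ->
      erlang:system_time(millisecond) - SentAt =< 15000.
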
TCP having retransmission does not help you make a single decision,
no. Instead, it prevents purely random losses (which are VERY common
in environments like EC2, when network links get saturated) from
amplifying every problem to a multiple of your tick time, in a ratio
that grows exponentially with the fraction of lost packets.
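
The arithmetic behind that claim, as a back-of-the-envelope model
assuming independent losses (the function name is mine):

  %% With i.i.d. loss probability P, the expected gap between ticks
  %% that actually arrive over UDP is Tick / (1 - P).
  %% expected_gap(10, 0.75) -> 40.0 (the "tick every 40s" case below).
  expected_gap(TickSecs, LossProb) when LossProb >= 0, LossProb < 1 ->
      TickSecs / (1 - LossProb).
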
Implement one version using UDP. Implement another using TCP. Compare
the difference in behavior. Notice how the TCP one is simpler, behaves
better, and has similar bandwidth consumption. You're obviously going
to have to learn it through experimentation. That's fine, as that's
how I learned it. I encourage you to keep an open mind, take
measurements, and consider that every line of code may someday give a
bug a happy home.
Good luck.
On Sep 25, 2009, at 1:25 PM, Valentin Micic wrote:
> Let's see... say you send a request over TCP for which you're
> expecting a reply -- and nothing happens. The fact that TCP has a
> retransmission mechanism (to ensure reliable delivery) does not help
> you make any meaningful decision at this point, does it? So, what are
> you going to do when this happens? Give up? Retry? Whichever way you
> slice it, you cannot get away from implementing some kind of
> application-level protocol to handle such a condition.
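>
> In Erlang terms, the obligatory sketch (a plain receive with a
> timeout; the message shape and the 5s limit are illustrative):
>
>   %% Sketch: an application-level request/reply with a timeout --
>   %% needed no matter what transport sits underneath.
>   request(Server, Req) ->
>       Ref = make_ref(),
>       Server ! {request, self(), Ref, Req},
>       receive
>           {reply, Ref, Answer} -> {ok, Answer}
>       after 5000 ->
>           {error, timeout}  %% give up? retry? you decide either way
>       end.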
>
> Considering this, UDP makes us work more... how exactly???
>
> (Actually, sometimes TCP may make it worse. If your request times out
> and you do not tear down the connection over which the request was
> made, your request will still be delivered, although there will be no
> one interested in the reply. Worse yet, you may be issuing another
> request, etc. I say, sometimes you're better off just reliably losing
> the whole thing. <-:)
>
> Anyway, let's agree to disagree on this one.
>
> V/
>
> -----Original Message-----
> From: Jayson Vantuyl [mailto:kagato@REDACTED]
> Sent: 25 September 2009 08:53 PM
> To: Valentin Micic
> Cc: 'Erlang-Questions Questions'
> Subject: Re: [erlang-questions] running without net tick
>
> I completely agree on the oil-and-water statement. That said, TCP
> supports OOB; it's just a bad idea to use it.
>
> The theory of "why it's not working" that was mentioned was that the
> other data multiplexed over the stream was choking out the ticks.
>
> A dedicated connection makes that a non-issue. It's the classic head-
> of-line blocking problem.
>
> Let's say that ticks were reduced to a 4-byte timestamp (to give some
> reference point if the connection is broken and re-established).
> Let's say you send them every 10 seconds. TCP has a 40-byte
> overhead. Ethernet usually has a 1500-byte MTU. That makes room for
> an hour's worth of ticks in a single TCP packet over average Ethernet
> (and probably at least 20 minutes' worth over any usable MTU).
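>
> (The back-of-the-envelope version, with the figures from above:)
>
>   %% 40 bytes of TCP/IP headers, 4 bytes per tick:
>   ticks_per_packet(MTU) -> (MTU - 40) div 4.
>   %% ticks_per_packet(1500) -> 365; at one tick per 10s, ~1 hour.
>   %% ticks_per_packet(576)  -> 134; roughly 22 minutes.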
>
> If we are limited to IP, want to have any chance of making it through
> a firewall, want timely retries, and want a tick to generate one
> packet (or less, in aggregate), we have either UDP or TCP. In the
> above case, TCP generates about as much packet traffic as UDP and is
> reasonably close to timely. The packets are larger and the retries
> are wasted, but they also back off exponentially. A dedicated
> connection does not have the "head-of-line" problem due to
> multiplexing (which is, admittedly, unproven as the problem).
>
> The point of this exercise is that, unless you're going over a very
> small or very high-latency pipe, UDP doesn't really give us anything
> other than more work. Why more work, you say? Because UDP doesn't
> retry. Sending a tick every 10 seconds over TCP is not the same thing
> as sending a tick every 10 seconds over UDP. Why? Assume 75% random
> packet loss. That means you're likely to get a single UDP tick
> through every 40 seconds. With TCP, the automatic retry will turn
> that into approximately one tick every 10.X seconds, where X is
> entirely dependent on latency (and probably very small). Does TCP do
> this with
> more traffic? Yes. However, it does it with exponential backoff, a
> window to limit the outstanding number of packets, PMTU support so
> that the packets don't get fragmented, an RST mechanism to break
> connections if the remote host has rebooted, the option to use SSL to
> encrypt the session, etc.
>
> There are almost no cases that actually demand UDP that a single TCP
> connection doesn't serve very well. I'd strongly recommend not
> ignoring its benefits: real network conditions almost never favor
> UDP, and UDP does not favor a simple implementation.
>
> On Sep 25, 2009, at 4:26 AM, Valentin Micic wrote:
>
>> I beg to differ -- my take is that TCP reliability is part of the
>> problem in this case. Whilst buffering and flow control are
>> important for, say, file transfer, they are completely irrelevant
>> for TICK and health-checks. (So what if it doesn't get there? I can
>> send it again without any consequence!)
>>
>> The argument about UDP unreliability sounds more like a mantra than
>> a proper argument (if only I had a penny for every time I've heard
>> it (-:). There are only two fundamental differences (*) between TCP
>> and UDP... actually only one, because the second is conditioned by
>> the first: TCP supports stream communication, whilst UDP is
>> message-bound; thus, as a consequence, TCP requires some form of
>> flow control to support stream processing.
>>
>> In this particular case: what possible benefit can one derive from
>> sending a message over a stream as opposed to sending just a
>> message? If the message is short enough to fit in a datagram --
>> none!
>>
>> As for the ability to send urgent data (OOB) over a TCP socket --
>> data streams and OOB data mix like oil and water. I have yet to see
>> a successful utilization of OOB (issued by a user) that hasn't
>> resulted in a connection reset (or system shutdown (-;).
>>
>> Lastly, if TICK were implemented via a separate TCP socket, that
>> would double the networking resources required -- you'd need a new
>> socket for every node you're connected to. With UDP, all you need is
>> one socket, and a very basic protocol:
>>
>> 1) Ask when you have to;
>> 2) Answer when asked.
>>
>> Mind you, net-kernel is already doing this.
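>>
>> A sketch of that protocol over a single socket (gen_udp is real;
>> the payloads and function names are illustrative):
>>
>>   %% One UDP socket answers when asked; asking is a one-liner.
>>   start(Port) ->
>>       {ok, Sock} = gen_udp:open(Port, [binary]),  %% active by default
>>       loop(Sock).
>>
>>   loop(Sock) ->
>>       receive
>>           {udp, Sock, Addr, P, <<"ping">>} ->
>>               gen_udp:send(Sock, Addr, P, <<"pong">>),
>>               loop(Sock)
>>       end.
>>
>>   ask(Sock, Addr, Port) ->
>>       gen_udp:send(Sock, Addr, Port, <<"ping">>).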
>>
>> V/
>>
>> (*) That is, if one disregards things that UDP can do which TCP
>> cannot, such as multi-drop, multicasting, etc.
>>
>> -----Original Message-----
>> From: erlang-questions@REDACTED [mailto:erlang-
>> questions@REDACTED] On
>> Behalf Of Jayson Vantuyl
>> Sent: 25 September 2009 12:25 PM
>> To: Erlang-Questions Questions
>> Subject: Re: [erlang-questions] running without net tick
>>
>> Short Version:
>>
>> Why not open a special "tick" TCP port? UDP would require a reliable
>> delivery implementation. TCP saves quite a bit of work in that
>> regard
>> (and gets a lot of important but subtle things right).
>>
>> Long Version:
>>
>> Also, never say never.
>>
>> Actually, you CAN send out-of-band data (also called urgent data)
>> using TCP. The original "WinNuke" (i.e. ping-of-death for Windows
>> 95)
>> was due to having a corrupt OOB header in a TCP packet. In classic
>> Microsoft / Internet style, the issue was further confused because it
>> was an Out-of-Bounds bug, so a generation of networking consultants
>> have minor deviations in their interpretations of the meaning of the
>> letters OOB.
>>
>> As for TCP Urgent Data / OOB, it seems to be specified well enough
>> at the protocol level, but it doesn't appear to be handled uniformly
>> across different socket implementations.
>>
>> Under Linux, you use send/recv with the MSG_OOB flag (or set the
>> SO_OOBINLINE socket option to just inline the data). The stack
>> appears to try to keep the urgent byte at a certain point in the
>> data stream (i.e. to preserve some of the ordering), and certain
>> conditions can cause it to become part of the "normal" stream of
>> data. It can also cause some odd signals to be delivered to the
>> process. Still, TCP *does* have OOB data support; it just may not be
>> easily usable everywhere.
>>
>> On Sep 25, 2009, at 3:04 AM, Valentin Micic wrote:
>>
>>> You may change the TICK value all day long, but if the underlying
>>> infrastructure is in some kind of trouble, that alone is not going
>>> to solve the problem.
>>>
>>> The following is just speculation, but quite plausible in my mind:
>>>
>>> AFAIK, ERTS multiplexes inter-nodal traffic over a single socket.
>>> Thus, if the socket is heavily utilized, the sending buffer may get
>>> congested due to a dynamically reduced TCP window size (because the
>>> remote side is not flushing its buffer fast enough -- if the same
>>> process is reading and writing the socket, this may cause a
>>> deadlock under heavy load). While I am not certain about the
>>> particular implementation here, I know that the sender will not
>>> wait forever -- it will eventually time out, and this (exception?)
>>> has to be handled somehow by the sender. The reasonable course of
>>> action would be to reset the connection. If and when that happens,
>>> the node can be declared unreachable; therefore a "net-split" may
>>> occur. In other words, a net-split may occur with or without the
>>> "ticker" process running, and regardless of the real network
>>> availability (*).
>>>
>>>
>>> I think the net-tick method is good on its own; however, it is
>>> using the *wrong* transport! IMO, tick should be handled as
>>> out-of-band data, and this cannot be done using TCP/IP (well, at
>>> least not at the user level). My suggestion would be to use UDP for
>>> net-kernel communication (including TICK messages). This way one
>>> would be able to find out about peer health more reliably (yes, a
>>> small protocol may be required, but that's relatively easy).
>>>
>>> To make things simpler regarding distribution, one may use the same
>>> port number as advertised in EPMD for a particular node, and hence
>>> bind the UDP socket to that number.
>>>
>>> V/
>>>
>>> (*) I've seen "net-splits" between nodes collocated on the same
>>> machine -- indicating a TCP buffer/load related issue. Maybe the
>>> situation could be improved by creating more than one connection
>>> between two nodes, but that may come with a bag of problems of its
>>> own.
>>>
>>>
>>> -----Original Message-----
>>> From: erlang-questions@REDACTED [mailto:erlang-
>>> questions@REDACTED] On
>>> Behalf Of Ulf Wiger
>>> Sent: 25 September 2009 09:13 AM
>>> To: erlang-questions Questions
>>> Subject: [erlang-questions] running without net tick
>>>
>>>
>>> The problem of netsplits in Erlang comes up now and again.
>>> I've mentioned that we used to have a more robust
>>> supervision algorithm for device processor monitoring in
>>> AXD 301...
>>>
>>> I read the following comment in kernel/src/dist_util.erl
>>>
>>> %% Send a TICK to the other side.
>>> %%
>>> %% This will happen every 15 seconds (by default)
>>> %% The idea here is that every 15 secs, we write a little
>>> %% something on the connection if we haven't written anything for
>>> %% the last 15 secs.
>>> %% This will ensure that nodes that are not responding due to
>>> %% hardware errors (Or being suspended by means of ^Z) will
>>> %% be considered to be down. If we do not want to have this
>>> %% we must start the net_kernel (in erlang) without its
>>> %% ticker process, In that case this code will never run
>>>
>>>
>>> ...and thought: promising -- is it then possible to experiment
>>> with other tick algorithms?
>>>
>>> However, looking at net_kernel.erl:
>>>
>>> init({Name, LongOrShortNames, TickT}) ->
>>>     process_flag(trap_exit, true),
>>>     case init_node(Name, LongOrShortNames) of
>>>         {ok, Node, Listeners} ->
>>>             process_flag(priority, max),
>>>             Ticktime = to_integer(TickT),
>>>             Ticker = spawn_link(net_kernel, ticker,
>>>                                 [self(), Ticktime]),
>>>
>>> In other words, you can't set net_ticktime to anything other than
>>> an integer (and it has to be a smallint, since it's used in a
>>> receive ... after expression).
>>>
>>> (To do justice to the comment above, couldn't a net_ticktime
>>> of, say, 0 turn off net ticking altogether?)
>>>
>>> What one can do, then, is set net_ticktime to a very large number
>>> and run a user-level heartbeat. If netsplits are still experienced
>>> without visible problems in the user-level monitoring -- or perhaps
>>> even with traffic being serviced during the interval -- then
>>> something is definitely wrong with the tick algorithm. :)
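>>>
>>> A minimal user-level heartbeat could look like this (the peer
>>> list, period, and reporting are all illustrative):
>>>
>>>   %% Sketch: ping each peer, log the ones that stop answering.
>>>   heartbeat(Peers) ->
>>>       [case net_adm:ping(Node) of
>>>            pong -> ok;
>>>            pang -> error_logger:warning_msg("lost contact: ~p~n",
>>>                                             [Node])
>>>        end || Node <- Peers],
>>>       timer:sleep(5000),
>>>       heartbeat(Peers).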
>>>
>>> BR,
>>> Ulf W
>>> --
>>> Ulf Wiger
>>> CTO, Erlang Training & Consulting Ltd
>>> http://www.erlang-consulting.com
>>>
>>> ________________________________________________________________
>>> erlang-questions mailing list. See http://www.erlang.org/faq.html
>>> erlang-questions (at) erlang.org
>>>
>>>
>>
>>
>>
>> --
>> Jayson Vantuyl
>> kagato@REDACTED
>>
>> ________________________________________________________________
>> erlang-questions mailing list. See http://www.erlang.org/faq.html
>> erlang-questions (at) erlang.org
>>
>
>
>
> --
> Jayson Vantuyl
> kagato@REDACTED
>