[erlang-questions] How to handle a massive amount of UDP packets?

Ulf Wiger ulf@REDACTED
Mon Apr 23 09:56:43 CEST 2012


On 23 Apr 2012, at 08:25, Valentin Nechayev wrote:

>> From: Ulf Wiger <ulf@REDACTED>
> 
>> What one can do is to combine {active, once} with gen_tcp:recv().
>> 
>> Essentially, you will be served the first message, then read as many as you 
>> wish from the socket. When the socket is empty, you can again enable 
>> {active, once}.
> 
> First, the approach you described is quite badly documented. There is no
> description of how such a non-blocking recv() can be achieved. If this is a
> call with Timeout=0, the type timeout() isn't defined, and the return value
> for a timeout isn't defined. The documentation only defines Reason = closed
> or inet:posix(). And it would be incorrect to guess that eagain (or
> ewouldblock?) will be returned, given that the implementing code treats all
> timeout values other than infinity uniformly. I dislike relying on such
> undocumented behavior.


Huh?

Well, first of all, I wrote gen_tcp:recv() - apologies for that.

I agree that the documentation should say that {error, timeout} is one
of the possible return values, but this is a small oversight - feel free to 
submit a patch. It is by no means an undocumented or unsupported 
behavior. gen_[tcp|udp]:recv() is what you use when you have {active,false}.

type timeout() _is_ defined. It's just that the gen_tcp/gen_udp manual
doesn't tell you where to find it (I agree this is annoying, but if we're 
discussing optimal tuning of live systems, perhaps we can agree that we
shouldn't let bugs in the documentation limit our options?)

Actually, timeout() is a built-in type:

timeout() :: 'infinity' | non_neg_integer()
non_neg_integer() :: 0..

It's documented in the Reference Manual, chapter 6.2
http://www.erlang.org/doc/reference_manual/typespec.html#id74831
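
To make the passive-mode case concrete (and since the thread is about UDP),
a receive loop using gen_udp:recv/3 with a timeout could look roughly like
the sketch below. The port number, socket options, the 1000 ms timeout and
handle_packet/1 are just placeholders, not a recommendation:

%% Sketch only; port, options and timeout are arbitrary examples.
start() ->
    {ok, Socket} = gen_udp:open(8888, [binary, {active, false}]),
    recv_loop(Socket).

recv_loop(Socket) ->
    %% Length = 0 means "give me the next datagram, whatever its size";
    %% the third argument is the timeout() type discussed above.
    case gen_udp:recv(Socket, 0, 1000) of
        {ok, {_Addr, _Port, Packet}} ->
            handle_packet(Packet),   %% application-specific placeholder
            recv_loop(Socket);
        {error, timeout} ->
            %% the return value the manual forgets to mention
            recv_loop(Socket);
        {error, Reason} ->
            {error, Reason}
    end.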

> Second, your approach causes useless process switches. If a long message
> is being received via TCP, there will be two or more switches to the owner -
> the first one for the first part of the message, and further ones for the
> rest of it. If the incoming rate is low enough that each small portion
> (a TCP window) can be processed separately, the owner process will get and
> process them separately; if its and the system's speed isn't enough for such
> switching, data will accumulate into larger portions. This means that
> performance measurements will be a total lie, with three intervals -
> uselessly quick saturation, then stable 100% over a wide load interval, and
> then unexpected overload. It's very hard to diagnose and optimize a system
> with such behavior, and this tendency of one subsystem to fill the whole
> system affects other concurrent subsystems in a bad way.

One way to look at it is that you go from {active, once} to {active, false}
and stay with {active, false} until you get a timeout. Then switch to 
{active, once} to avoid being locked up in a blocking recv(), which can
be bad for e.g. code updates and reconfigurations. If that particular 
problem doesn't bother you, it may be better to stay in {active, false}
and do a blocking read. Chances are, you _will_ regret this. ;-)
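
In rough outline, with handle/1 standing in for whatever the application does
with a message, and with error handling omitted, the {active, once} variant
could look something like this:

owner_loop(Socket) ->
    ok = inet:setopts(Socket, [{active, once}]),
    receive
        {tcp, Socket, First} ->
            %% after one delivery, {active, once} has already dropped the
            %% socket back into passive mode, so we can recv() directly
            handle(First),
            drain(Socket),
            owner_loop(Socket);
        {tcp_closed, Socket} ->
            ok
    end.

%% Read until the socket is momentarily empty, i.e. until recv/3
%% returns {error, timeout} (Timeout = 0 here).
drain(Socket) ->
    case gen_tcp:recv(Socket, 0, 0) of
        {ok, Msg} ->
            handle(Msg),
            drain(Socket);
        {error, timeout} ->
            ok
    end.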

The only switching that goes on is between the port owner and the 
port. Granted, there is a performance penalty in using {active, false} and
{active, once} rather than {active, true} (much of the 25% difference
reported earlier). OTOH,
{active, true} can only be used if you are absolutely sure you will not 
kill the system. It completely lacks flow control and effectively disables 
the back-pressure mechanisms in TCP.

Packet loss in UDP _is_ the way for the server to stay alive if it cannot
keep up with the rate of incoming requests. If you can over-provision
your server side so that it cannot be killed by clients (which usually 
cannot be guaranteed), foregoing flow control will surely give the best 
throughput.

In my experience, using UDP in situations where high availability is
required, and overload possible, is a PITA. It's extremely difficult to
achieve a proper overload behavior, if you also want the clients to have 
a predictable experience.

If you want _really_ undocumented, here is one way to get better throughput
than with {active,false} and gen_tcp:recv/3, while still keeping the packets
in the TCP buffer for as long as possible.

(Not showing the other shell, where I'm just connecting and sending the 
messages "one", "two", …, "five").

Eshell V5.9  (abort with ^G)
1> {ok,L} = gen_tcp:listen(8888,[{packet,2}]).
{ok,#Port<0.760>}
2> {ok,S} = gen_tcp:accept(L).
{ok,#Port<0.771>}
3> inet:setopts(S,[{active,false}]).
ok

% Can't use the Length indicator to tell the socket we want as much as possible:
4> gen_tcp:recv(S,1000,1000).
{error,einval}

% With Length = 0, we get exactly one message. This is what you normally do.
5> gen_tcp:recv(S,0,1000).   
{ok,"one"}
6> gen_tcp:recv(S,0,1000).
{ok,"two"}

% This is going directly at the low-level function used by both gen_tcp and gen_udp:
7> [prim_inet:async_recv(S,0,0) || _ <- [1,2,3,4]].
[{ok,3},{ok,4},{ok,5},{ok,6}]
8> flush().
Shell got {inet_async,#Port<0.771>,3,{ok,"three"}}
Shell got {inet_async,#Port<0.771>,4,{ok,"four"}}
Shell got {inet_async,#Port<0.771>,5,{ok,"five"}}
Shell got {inet_async,#Port<0.771>,6,{error,timeout}}
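
If you wanted to wrap that into something reusable - and again, prim_inet is
an internal, undocumented module, so treat this strictly as a sketch - it
could look something like this:

%% Sketch only: prim_inet is internal and the {inet_async, ...} message
%% format is not part of the documented API.
%% Post N asynchronous reads with Timeout ms each and collect whatever
%% arrives; refs that time out simply contribute nothing.
async_recv_n(Socket, N, Timeout) ->
    Refs = [begin
                {ok, Ref} = prim_inet:async_recv(Socket, 0, Timeout),
                Ref
            end || _ <- lists:seq(1, N)],
    collect(Socket, Refs, []).

collect(_Socket, [], Acc) ->
    lists:reverse(Acc);
collect(Socket, [Ref | Refs], Acc) ->
    receive
        {inet_async, Socket, Ref, {ok, Msg}} ->
            collect(Socket, Refs, [Msg | Acc]);
        {inet_async, Socket, Ref, {error, _Reason}} ->
            %% e.g. {error, timeout}, as for ref 6 in the shell session above
            collect(Socket, Refs, Acc)
    end.

With the shell session above, async_recv_n(S, 4, 0) would presumably have
returned ["three","four","five"].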

Note to self: the JOBS load regulation system internally figures out a quota of 
jobs for each dispatch. For counter-based regulation, it only supports a fixed
job size per queue, but it wouldn't be hard to allow a configuration that makes
the job quota (the 'increment' in the JOBS config) dynamic up to a certain
limit. The list of counters and their increment size is already passed on when
the request is granted, and can be inspected by the client. This could be used 
in combination with the above to determine how many messages to receive.

Jesper, if you want to steal back the MVP status, there's one way to get ahead. ;-)

The main trick would be to figure out how to fairly divide the quota among 
several queued requests - and, as always, how this should be expressed in the 
config.

BR,
Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com
