[erlang-questions] Erlang TCP throughput slowdown

Jesper Louis Andersen jesper.louis.andersen@REDACTED
Tue Mar 19 16:04:51 CET 2019


On Tue, Mar 19, 2019 at 1:35 PM Borja de Regil <borja.deregil@REDACTED>
wrote:

>
> Apart from the increase in latency between sites, no other configuration
> is changed. My initial expectation was that the throughput would stay the
> same, even if the base latency would increase, and that the saturation
> point would be reached at approximately the same number of requests per
> second.
>
>
In your scenarios, this assumption needs amendment. Your client code sends
a single message and then waits for a reply, so at most one message is on
the wire at a given time: you have implemented a stop-and-go protocol
on top of TCP. In such a protocol, an increase in network delay/latency
incurs a loss of bandwidth. Your A and B experiments use RTTs of 0.25ms
and 10ms respectively. That is a 40-fold increase in latency, and it will
affect the bandwidth between the peers. In this case it puts an artificial
limit on how much data you can transfer. You keep your messages at 1
kilobyte, so the req/s is essentially a bandwidth measurement in bytes/sec.
The problem is the same as when you are doing e.g., satellite
communications: geosynchronous orbit is around 550ms away in practice. It
is also linked to the so-called bandwidth*delay product (BDP).

Some napkin math: suppose you have 1 connection with an RTT of 10ms. At
most, that is 100 req/s on that connection. Suppose we have 500 such
connections. Then the maximal req/s is 500*100 = 50,000.
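The napkin math is easy to mechanize. A throwaway sketch (the numbers are
the ones from above, nothing else is assumed):

```python
# With a stop-and-go protocol there is at most one request in flight per
# connection, so a connection completes at most 1/RTT requests per second.
rtt_ms = 10          # round-trip time of scenario B
conns = 500          # number of parallel connections

per_conn = 1000 // rtt_ms   # req/s ceiling for one connection
total = conns * per_conn    # req/s ceiling for the whole client

print(per_conn)  # 100
print(total)     # 50000
```

Note that nothing about the server's speed appears in this bound; it is
purely a property of the protocol shape and the RTT.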

A way I often approach these questions is by creating an extreme scenario:
I have 1 connection and 3 seconds of latency. What happens? The goal is to
identify the shadow constraints of the system, so you can understand where
the bottleneck moves once the first, apparent bottleneck is found and
eliminated.

The other problem is that your load generator coordinates with the system
under test, which leads to coordinated omission[0]: the load generator only
issues the next request once the previous one completes. It is usually
better to keep the bandwidth usage constant and then measure latency,
counting a late request against the system.
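The effect is easy to see in a toy simulation (everything here is made up
for illustration: service() is a stand-in for the real request, and one
request out of ten stalls for 500ms):

```python
INTERVAL = 0.010  # intended cadence: one request every 10 ms

def service(i):
    # Stand-in for the real request; request 3 stalls for 500 ms.
    return 0.500 if i == 3 else 0.001

def run(measure_from_intended):
    clock = 0.0          # simulated wall clock
    latencies = []
    for i in range(10):
        intended = i * INTERVAL        # when the request *should* start
        start = max(clock, intended)   # the generator falls behind a stall
        finish = start + service(i)
        clock = finish
        ref = intended if measure_from_intended else start
        latencies.append(finish - ref)
    return latencies

coordinated = run(False)  # closed loop: only 1 sample looks bad
honest = run(True)        # open loop: the stall penalizes 7 samples
```

The closed-loop numbers report a single slow request; the open-loop
numbers correctly blame the stall for every request queued behind it.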

The astute reader will also observe that you measure the mean latency. This
is not very useful, and you should look at a boxplot, a kernel density
plot, a histogram, or the like. If you know the data is normally
distributed with the same standard deviation, then your average latency
makes sense as a comparable value. But this requires that you plot the
data, look at it, and make sure it has that shape. Otherwise you can be
led astray. As an example, suppose I have 2 fair dice. One die has faces
1,2,3,4,5,6. The other has faces 1,1,1,6,6,6. These two dice have the same
mean (3.5), but you would not argue they have the same distribution. In
fact, the latter die never produces an observation anywhere near 3.5!
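The dice claim takes three lines to check:

```python
d1 = [1, 2, 3, 4, 5, 6]
d2 = [1, 1, 1, 6, 6, 6]

mean = lambda faces: sum(faces) / len(faces)

print(mean(d1), mean(d2))  # 3.5 3.5 -- identical means
# ...yet every face of the second die sits 2.5 away from its own mean:
print(min(abs(f - mean(d2)) for f in d2))  # 2.5
```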

Now to solutions:

The problem has to do with physics more than with Erlang. Information
travels as a wave in a medium such as copper wire or fiber. This speed has
an upper limit, the speed of light, since information cannot travel faster
than that. In practice, signals in fiber travel at roughly 2/3 of light
speed, and you can assume that to be relatively constant. You need to
employ latency-hiding tricks to circumvent this limit.

Batching is an option: collect multiple requests and send them all off at
the same time. This effectively ensures you can have multiple requests in
flight, which gets you around the delay constant. It also allows a smart
server to process multiple requests simultaneously, thus shedding load.
Microbatching is alluring here: when the first request arrives, set a cork
for 5ms. Once you have either read 500 reqs or the timer has fired, process
whatever you have in the batch. Then start over. This is used by e.g.,
Kafka and TensorFlow, the latter to avoid the memory bottleneck between
DRAM and GPU.
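The cork logic above can be sketched in a few lines. This is a simulation
in Python, not the Kafka or TensorFlow implementation; the queue-based
next_batch helper and its constants are my own illustrative names:

```python
import queue
import time

MAX_BATCH = 500    # flush when this many requests are buffered...
CORK_S = 0.005     # ...or when 5 ms have passed since the first one

def next_batch(inbox: "queue.Queue"):
    # The first request "sets the cork": block until it arrives, then
    # start the 5 ms deadline.
    batch = [inbox.get()]
    deadline = time.monotonic() + CORK_S
    while len(batch) < MAX_BATCH:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break                      # timer fired: flush what we have
        try:
            batch.append(inbox.get(timeout=remaining))
        except queue.Empty:
            break                      # nothing more arrived in time
    return batch
```

Each call returns one batch to process, then the caller loops and the next
incoming request sets a fresh cork.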

Pipelining is another option. Send multiple requests back to back, so they
are all in the network. Interleave your receive loop with new requests
being sent, so you can keep the work going. You should consider tagging
messages with unique identifiers so replies can be told apart; this allows
out-of-order processing. See Plan 9's 9P protocol, RabbitMQ/AMQP's
correlation IDs, or HTTP/2. For a quick implementation, Loic Hoguin's
Cowboy/Gun combo works wonders here, and uses the same code base (cowlib).
This effectively avoids the wait time.
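The tagging scheme itself is tiny. A language-neutral sketch (the Pipeline
class and its callback shape are hypothetical, standing in for whatever
your transport provides):

```python
import itertools

class Pipeline:
    """Tag outgoing requests so replies can complete out of order."""

    def __init__(self, send):
        self._send = send                # e.g. writes a frame to a socket
        self._ids = itertools.count(1)   # unique correlation tags
        self._pending = {}               # tag -> reply callback

    def request(self, payload, on_reply):
        tag = next(self._ids)
        self._pending[tag] = on_reply
        self._send((tag, payload))       # many can be in flight at once
        return tag

    def handle_reply(self, tag, result):
        # Replies may arrive in any order; the tag finds the right caller.
        self._pending.pop(tag)(result)
```

For example, two requests can be sent back to back and their replies
consumed in reverse order, which is exactly what the RTT-hiding buys you.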

[0] See e.g., Gil Tene's work on this.