[erlang-questions] Is Erlang ideal for a global exchange?

Tue Jan 20 14:28:05 CET 2015

On Tue, Jan 20, 2015 at 4:54 AM, Mihai Balea <mihai@REDACTED> wrote:

> I would tend to disagree with the first part. Any garbage collected
> language will not offer predictable latency - in a real time sense.

This might have been true back in the day, but it surely is not true
anymore. First of all, you have to make the distinction between hard and
soft realtime operation. The rule of hard realtime is that if you miss a
deadline, the program is faulty. In soft realtime it is okay, as long as
the average (or perhaps median) hits the deadline. In other words, soft
realtime systems are allowed to fail occasionally, and Erlang has always
been a soft realtime system.
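
As a minimal sketch of the distinction, the two rules can be written as
predicates over a list of measured latencies. The module name `deadline`
and its function names are made up for illustration, and the soft rule
here uses the median, which is only one of the possible readings:

```erlang
-module(deadline).
-export([hard_ok/2, soft_ok/2, median/1]).

%% Hard realtime: every single sample must meet the deadline.
hard_ok(Samples, Deadline) ->
    lists:all(fun(S) -> S =< Deadline end, Samples).

%% Soft realtime (one possible reading): the median must meet the deadline.
soft_ok(Samples, Deadline) ->
    median(Samples) =< Deadline.

%% Median of a non-empty list of numbers.
median(Samples) ->
    Sorted = lists:sort(Samples),
    N = length(Sorted),
    Mid = N div 2,
    case N rem 2 of
        1 -> lists:nth(Mid + 1, Sorted);
        0 -> (lists:nth(Mid, Sorted) + lists:nth(Mid + 1, Sorted)) / 2
    end.
```

With the hypothetical samples [1,1,1,1,9] (in ms) and a 2ms deadline,
soft_ok/2 holds while hard_ok/2 does not: one late request is enough to
make a hard realtime system faulty.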

The first important point is that realtime garbage collectors do exist.
They are insanely complex, but they do exist and can do very well. The
other important point is that for some workloads, Erlang soundly beats
non-GC'ed languages in the latency game, which is food for thought if you
are of the opinion that GC'ed languages all have problematic latency. The
Techempower benchmark of webservers,

https://www.techempower.com/benchmarks/

shows that the latency deviation of "cowboy", an Erlang web server, is on
par with Ur/Web, and that its maximal latency is *far* better than that of
its competition. And do mind that the competition is sometimes written in C++.
Also, note that I specifically wrote "predictable latency" over "low
latency", which is a different beast. The goal here is that if the latency
is 1ms, then the latency is probably going to be 1ms plus/minus a small
variation. The C++ web servers can handle a request in perhaps 0.1ms, but
then their variation is large and some requests might take 5, 10, or 134ms.
Chances are that, all things considered, the problems you will face using
Erlang are much smaller than those of a C++ solution. Of course, the C++
solution processes a much higher volume, but in doing so, its latency
becomes less predictable.
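
The difference between "low" and "predictable" can be illustrated with a
small sketch that summarizes a list of latencies. The module name
`latency_spread` is made up, and the sample data in the usage note is
hypothetical, loosely modelled on the figures mentioned above rather than
taken from any benchmark:

```erlang
-module(latency_spread).
-export([summary/1]).

%% Returns {Mean, Max, StdDev} for a non-empty list of latencies.
%% StdDev is the population standard deviation.
summary(Samples) ->
    N = length(Samples),
    Mean = lists:sum(Samples) / N,
    Var = lists:sum([(S - Mean) * (S - Mean) || S <- Samples]) / N,
    {Mean, lists:max(Samples), math:sqrt(Var)}.
```

Take a steady server answering 1000 requests in 1.0ms each, and a spiky
one answering 999 requests in 0.1ms and one in 134ms:

    Steady = lists:duplicate(1000, 1.0),
    Spiky  = lists:duplicate(999, 0.1) ++ [134.0],
    latency_spread:summary(Steady),
    latency_spread:summary(Spiky).

The spiky server wins on the mean (well below 1ms), but its maximum is
134ms and its standard deviation is several milliseconds, whereas the
steady server has no spread at all.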

Then again, there are parts of the Erlang/OTP solution which make it
somewhat of a GC/non-GC hybrid. ETS tables are not garbage collected. A
terminating process doesn't need garbage collection, and you can sometimes
arrange it such that the process has enough initial heap to never collect.
Processes with a small memory footprint have very predictable pause times,
since those are bounded by the traversal time of a two-space copying
collector. If you know what you are doing, it is entirely possible to avoid
long pause times by paying a little attention to how you program the system.
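
A minimal sketch of the "enough initial heap" trick follows. It spawns a
short-lived worker with the `min_heap_size` option of spawn_opt/2 and
reports how many minor collections the worker went through before it
finished. The module name `no_gc_worker` and the heap size of 100000 words
are illustrative guesses, not tuned values:

```erlang
-module(no_gc_worker).
-export([run/0]).

%% Spawn a short-lived worker with a large initial heap, let it build a
%% small list, and report how many minor collections it performed.
run() ->
    Parent = self(),
    Work = fun() ->
        Data = lists:seq(1, 1000),
        {garbage_collection, Info} =
            erlang:process_info(self(), garbage_collection),
        Parent ! {done, length(Data), proplists:get_value(minor_gcs, Info)}
    end,
    %% min_heap_size is given in words; 100000 is an illustrative guess.
    spawn_opt(Work, [{min_heap_size, 100000}]),
    receive
        {done, Len, MinorGcs} -> {Len, MinorGcs}
    after 5000 -> timeout
    end.
```

With a heap this much larger than the data, the minor collection count
would be expected to stay at zero, and the whole heap is simply thrown
away when the worker terminates. Measure on your own workload before
relying on it, since a larger initial heap costs memory per process.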

I suspect the large variation in the C++ servers boils down to head-of-line
blocking due to cooperative scheduling. In most C++ solutions, what you are
aiming for is to process each work unit as fast as possible, regardless of
which code path it takes. If there is a mistake in one path which imposes
latency on the system, then your latency suffers on a global scale. Not so
in Erlang (and Go 1.3.x+), where the process would simply be scheduled
away, so only the slow code path suffers.
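
The scheduling behaviour can be sketched as follows: start one busy-looping
process per online scheduler, none of which ever yields voluntarily, and
check that an unrelated process still answers a ping. The module name
`preempt_demo` and the one second timeout are arbitrary choices for the
illustration:

```erlang
-module(preempt_demo).
-export([run/0]).

%% A process that never yields voluntarily.
spin() -> spin().

%% A process that answers a single ping.
echo() ->
    receive {ping, From} -> From ! pong end.

run() ->
    %% One spinner per scheduler, so every scheduler has a "slow path".
    N = erlang:system_info(schedulers_online),
    Spinners = [spawn(fun spin/0) || _ <- lists:seq(1, N)],
    Echo = spawn(fun echo/0),
    Echo ! {ping, self()},
    Result = receive
                 pong -> ok
             after 1000 -> timeout
             end,
    StillSpinning = lists:all(fun erlang:is_process_alive/1, Spinners),
    [exit(P, kill) || P <- Spinners],
    {Result, StillSpinning}.
```

Since the spinners are preempted once their time slice is used up, the
ping should be answered while all of them are still running. In a purely
cooperative design, the same busy loops would hold up every request queued
behind them.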

Then, there is the question of high volume. Systems tend to operate
differently under massive volume and load. There, a small mistake in a data
structure is what is going to cost you the desired latency. Or a
pathological situation, where a hash table ends up with too many collisions.
The trade-off is to use a more robust data structure, but this comes at a
performance price. You get the predictable latency, but lose the very fast
operation.

Perhaps the most glaring omission is that "volume" and "low latency" are
never quantified. To some people, low latency means FPGA implementations,
because C is too slow. To some people, a webserver taking 20 reqs/s over
the day on average is a loaded webserver. There are limits to all system
designs, of which language choice is but one.

As for the 16 core "limit", it is a myth on a modern OTP 17.x. Of course,
such measurements depend on the particular benchmark, and on whether you
hit lock contention in the OTP subsystem. But most of the old limits have been
lifted systematically and you are looking at at least 64 cores now, and
perhaps closer to 128. That said, it depends. Currently, there is a
bottleneck around the timer wheel, something which is being addressed for
release 18.x. It is a contended lock which WhatsApp also worked around in
their solutions. Perhaps somewhat enlightening: single-core performance
can sometimes suffer from efficient multi-core support. Erlang currently
leans toward treating fast multi-core performance as more important than
the speed of a single executing thread.

I think my main point still stands. In Erlang, the problem could be to get
a desired low latency. But this is easy to test for early on in
development, and you can eventually implement parts of the code as a NIF to
get the desired speed. I don't see volume as a problem at all. In a
solution written in "fast" languages like Java, C#, C or C++, the problems
are far more subtle and will only begin to show themselves long into the
process of writing the software. It will then be hard to change the
solution, because the investment is now a sunk cost. Incidentally, this is why
languages with fast prototyping properties are so powerful (OCaml
specifically comes to mind here).

-- 
J.