[erlang-questions] Understanding the scalability of Erlang

Sun Sep 22 16:19:21 CEST 2013

# Melvyn Ferrao 2013-09-22:
> It is said that thousands of processes can be spawned to do the similar
> task concurrently and Erlang is good at handling it. If there is more work
> to be done, we can simply and safely add more worker processes and that
> makes it scalable.

The ability to spawn large number of very cheap processes and the fact
of eliminating resource sharing between them nearly completely are good
enablers for scalability (in addition to enabling other, parhaps more
important, things). That said however, nothing in the world will make
arbitrary application "scale" by magic. There is no excuse for proper
design and implementation towards the required operational characteristics
at application level.

Let me also note this is a thorougly end-to-end affair: scalably designed
application relying on "lame" protocol stack won't do wonders. So our unit
of analysis here is a whole node, or a bunch of nodes, not isolated pieces
here and there.

This goes against the widespread religions of "layering" and "abstraction
boundaries", which is why these are helpful tools but evil masters. (This
is something I wanted to say for a while now, made it fit in this context
slightly forcefully. :-)

> What I fail to understand is that if the work performed by each work is
> itself resource-intensive, how will Erlang be able to handle it? For
> instance, if entries are being made into a table by several sources and an
> Erlang application withing its hundreds of processes reads rows from the
> table and does something, this is obviously likely to cause resource
> burden. Every worker will try to pull a record from the table.

A hard fact of nature -- you've introduced a shared resource in your
architecture, now you have the possibility of it becoming a bottleneck.
Not an outright necessity though, measurements will tell.

> If this is a bad example, consider a worker that has to perform a highly
> CPU-intensive computation in memory. Thousands of such workers running
> concurrently will overwork the CPU.

Nothing "overworks" a CPU, it's not going to melt or something. :-) The more
of them you have the longer the run queues thus the higher execution latency
per worker. You're right in suggesting there are limits to per-node execution
capacity, that's something to consider and overload protection of various
kinds would typically be a "must have" item. These should probably be
configurable at runtime.

> Please rectify my understanding of the scalability in Erlang:
> Erlang processes get time slices of the CPU only if there is work available
> for them. OS processes on the other hand get time slices regardless of
> whether they are idle.

No. They both get a timeslice if they're "ready for execution", that is "not
blocking". Erlang process blocks if it sits in a receive clause. OS process
blocks if it sits in a blocking syscall.

> The startup and shutdown time of Erlang processes is much lower than that
> of OS processes.

Yes, and Erlang processes are cheaper/smaller than OS processes. There's also
an important synergy between the programming language and underlying runtime
system, something you don't have in the native case.

> Apart from the above two points is there something about Erlang that makes
> it scalable?

I've simplified a lot but hopefully the above helps. Let's also note that
there are problem domains where it's important to cheaply process large
shared data structures in parallel. Those won't necessarily play well with
Erlang.

BR,
	-- Jachym