[erlang-questions] What makes erlang scalable?
Jesper Louis Andersen
Sun May 10 20:03:35 CEST 2015
On Sat, May 9, 2015 at 3:40 PM, Yash Ganthe <> wrote:
> I am working on an article describing fundamentals of technologies used by
> scalable systems.
I think all of these are good questions. Let me try to answer them one by
one.
First, you have to define what "scalable" means in this setting. When you
add load to a system, its behavior usually changes. "Scalable" is typically
understood to mean that there is no noticeable change as load is added, or
that the change is a graceful degradation. That is, if the system has a
normal operating capacity of 20 simultaneous users and we increase the load
to 100 simultaneous users, we expect the processing time per user to either
stay the same or become roughly 5 times slower. But we don't expect the
system to fail for the users, or to suddenly take 1000 times as much
processing time. It is important to know exactly which measurable quantity
you are concerned about when scaling the system up.
> 1. What is in the implementation of Erlang that makes it scalable? What
> makes it able to run concurrent processes more efficiently than
> technologies like Java?
Each concurrent process takes up resources. The more resources a process
takes, the fewer processes you can have. The Erlang BEAM VM is quite
efficient here and can host millions of processes. A naive Java solution
that uses one thread per concurrent activity will quickly run into trouble
by running out of resources. Furthermore, the overhead of a thread is
typically larger than the overhead of a process in the BEAM VM.
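A small sketch of how cheap process creation is, assuming a stock BEAM node
(the module name many_procs is hypothetical): it spawns N processes that
simply wait for a stop message. Each one is a BEAM process, not an OS
thread. To go beyond the VM's default process limit, start the node with
the +P emulator flag, e.g. erl +P 2000000.

```erlang
-module(many_procs).
-export([spawn_idle/1, stop_all/1]).

%% Spawn N processes that each block in receive until told to stop.
%% These live entirely inside the VM; no OS thread is created per process.
spawn_idle(N) ->
    [spawn(fun() -> receive stop -> ok end end) || _ <- lists:seq(1, N)].

%% Tell every process to finish.
stop_all(Pids) ->
    lists:foreach(fun(Pid) -> Pid ! stop end, Pids).
```

In the shell, many_procs:spawn_idle(100000) returns its pids almost
immediately; erlang:system_info(process_limit) shows the current ceiling.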
You can, however, implement the central aspects of the Erlang VM on top of
the JVM, which gives you some of the same capabilities; see, for instance,
the Erjang project. The idea is roughly the same: keep the processes in user
space and do not involve the kernel when creating or switching between them.
Then multiplex the N processes onto M scheduler threads, which are the only
threads the kernel sees, in an N:M threading architecture.
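To see the N:M mapping in practice, the sketch below (hypothetical module
nm_demo) spawns N processes, has each report erlang:system_info(scheduler_id),
i.e. the scheduler thread it is running on, and returns the distinct IDs. By
default the BEAM starts one scheduler thread per logical core, so thousands
of processes should report only a handful of IDs.

```erlang
-module(nm_demo).
-export([schedulers_used/1]).

%% Spawn N short-lived processes; each sends back the ID of the
%% scheduler thread it ran on. Returns the sorted, distinct IDs.
schedulers_used(N) ->
    Parent = self(),
    Pids = [spawn(fun() ->
                          Parent ! {self(), erlang:system_info(scheduler_id)}
                  end) || _ <- lists:seq(1, N)],
    %% Collect exactly one reply per spawned process.
    Ids = [receive {Pid, Id} -> Id end || Pid <- Pids],
    lists:usort(Ids).
```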
> 2. What is the relation between functional programming and
> parallelization? With the declarative syntax of Erlang, do we achieve
> run-time efficiency?
To execute part of a program in parallel, you need at least two independent
tasks that can be computed without depending directly on each other. This
allows us to spread the work over two cores. If we have 8-way parallel
sections, we can in principle utilize 8 cores, and so on.
If you have a functional program without side effects, it tends to expose
many places where such independent tasks can be identified. In principle,
this gives you an easy way to turn a sequential program into a parallel one,
and this is the lure of the functional programming paradigm with respect to
parallel execution. It isn't limited to functional programming, however.
Logic programming (Prolog, Mercury) is also highly amenable to parallel
execution. Any model that looks like the Actor model is amenable too,
because each actor can execute independently of the others.
Interestingly, Erlang doesn't employ the "functional programming" approach
to parallelism, but rather one based on an actor-like model. Currently, no
part of the Erlang runtime will try to run a list comprehension such as
[F(X) || X <- List] in parallel, though in many cases this could be done.
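Since the runtime won't parallelize the list comprehension for us, the usual
approach is to do it explicitly with processes. The following is a minimal
parallel map sketch (the name pmap is our own, not a standard library
function) that spawns one process per element and collects the results in
the original order:

```erlang
-module(pmap).
-export([pmap/2]).

%% Parallel counterpart of [F(X) || X <- List]: one process per element.
pmap(F, List) ->
    Parent = self(),
    Refs = [spawn_worker(Parent, F, X) || X <- List],
    %% Receive in the order of Refs, so results keep the input order.
    [receive {Ref, Result} -> Result end || Ref <- Refs].

%% Each worker tags its reply with a unique reference.
spawn_worker(Parent, F, X) ->
    Ref = make_ref(),
    spawn_link(fun() -> Parent ! {Ref, F(X)} end),
    Ref.
```

pmap:pmap(fun(X) -> X * X end, [1, 2, 3]) returns [1,4,9]. One process per
element only pays off when F does enough work to outweigh the spawn and
message cost, and because the workers are linked, a crash in F also crashes
the caller.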
> 3. Does process state not make it heavy? If we have thousands of
> concurrent users and spawn an equal number of processes as gen_server or
> any other equivalent pattern, each process would maintain a state. With so
> many processes, will it not be a drain on the RAM?
Erlang processes grow dynamically, starting from around 1-1.5 kilobytes of
memory. In practice this has proven to be quite efficient, even for a large
number of processes: it currently corresponds to around 1.5 gigabytes of
memory at one million processes. For a server environment, this is hardly a
problematic drain. For a small embedded device it can be a problem if you
want to run millions of processes on it, but even in that space, RAM is
getting cheaper every day.
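Since the exact per-process figure depends on the OTP release and the
machine's word size, it is better to measure it than to trust a constant.
This sketch (hypothetical module proc_mem) asks the VM how much memory a
freshly spawned idle process uses via erlang:process_info/2 and naively
extrapolates linearly. A real process's heap grows with its state, so treat
the result as a lower bound.

```erlang
-module(proc_mem).
-export([idle_process_bytes/0, estimate_bytes/1]).

%% Bytes the VM reports for a freshly spawned idle process
%% (stack, heap and internal structures).
idle_process_bytes() ->
    Pid = spawn(fun() -> receive stop -> ok end end),
    {memory, Bytes} = erlang:process_info(Pid, memory),
    Pid ! stop,
    Bytes.

%% Naive linear extrapolation to N idle processes.
estimate_bytes(N) ->
    N * idle_process_bytes().
```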
> 4. If a process has to make DB operations and we spawn multiple instances
> of that process, eventually the DB will become a bottleneck. This happens
> even if we use traditional models like Apache-PHP. Almost every business
> application needs DB access. What then do we gain from using Erlang?
The key difference is that we can keep part of the computation in memory
and only push it back to the DB when we choose to. In a classical PHP
application, state is lost between invocations. Suppose we are implementing
a server that keeps track of chess games. In PHP we would have to store the
game state in the DB for every move made. In Erlang, we can keep that state
in memory in a process. Every 5 minutes, we send a message to the DB
asynchronously telling it which moves were made in the meantime. This lowers
the load on the DB considerably. In fact, we can control the load on the DB
by dynamically tuning the interval at which we write data back to the
database.
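The chess example can be sketched as a gen_server that accumulates moves in
its state and flushes them on a timer set with erlang:send_after/3. This is
a hypothetical illustration: the module name chess_game, the PersistFun
argument (standing in for the real DB write, which should be non-blocking,
e.g. a message to a separate DB writer process) and FlushMs (the flush
interval) are all assumptions.

```erlang
-module(chess_game).
-behaviour(gen_server).
-export([start_link/2, move/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% PersistFun is a hypothetical stand-in for the asynchronous DB write.
start_link(PersistFun, FlushMs) ->
    gen_server:start_link(?MODULE, {PersistFun, FlushMs}, []).

move(Game, Move) ->
    gen_server:cast(Game, {move, Move}).

init({PersistFun, FlushMs}) ->
    erlang:send_after(FlushMs, self(), flush),
    {ok, #{persist => PersistFun, interval => FlushMs, pending => []}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

%% Moves accumulate in memory; no DB round-trip per move.
handle_cast({move, Move}, #{pending := Pending} = State) ->
    {noreply, State#{pending := [Move | Pending]}}.

%% On each tick, hand the batch (oldest first) to PersistFun and re-arm.
handle_info(flush, #{persist := Persist, interval := Ms,
                     pending := Pending} = State) ->
    case Pending of
        [] -> ok;
        _ -> Persist(lists:reverse(Pending))
    end,
    erlang:send_after(Ms, self(), flush),
    {noreply, State#{pending := []}};
handle_info(_Other, State) ->
    {noreply, State}.
```

Passing timer:minutes(5) as FlushMs matches the interval in the text. Since
the interval lives in the process state, it could be adjusted at run time to
control the DB load.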
The PHP solution could track the in-application state in Redis or
Memcached. But this means we would still have to retrieve the state from
another Unix process whenever we wanted to make a move, which is usually a
waste of resources.
> 5. How does process restart help? A process crashes when something is wrong
> in its logic or in the data. OTP allows you to restart a process. If the
> logic or data does not change, why would the process not crash again and
> keep crashing always?
The gist of this has to do with non-deterministic errors. Many errors in
distributed/asynchronous systems are "spurious" in that they only occur if
certain events happen in a certain order. If the bad event order is highly
unlikely, retrying the operation often succeeds and lets the system
continue. Other times, a user will do things in a different order the next
time, and this order avoids the problem. On the network, TCP window stalls
might make data arrive in a different order than normally intended.
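An in-process analogue of this is a simple retry loop. The sketch below
(hypothetical module retry) is not OTP's restart mechanism, only an
illustration of why retrying helps with spurious, ordering-dependent
failures: it calls Fun up to Attempts times and returns the first success.

```erlang
-module(retry).
-export([with_retries/2]).

%% Call Fun up to Attempts times. A spurious failure may go away on
%% the next attempt; a deterministic bug fails every time and the
%% last exception is returned as {error, {Class, Reason}}.
with_retries(Fun, Attempts) when Attempts > 0 ->
    try
        {ok, Fun()}
    catch
        _:_ when Attempts > 1 ->
            with_retries(Fun, Attempts - 1);
        Class:Reason ->
            {error, {Class, Reason}}
    end.
```

A fun that fails on its first two calls and then succeeds yields
{ok, Value} given 3 or more attempts, while a fun with a deterministic bug
always ends in {error, ...}.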
On the other hand, as you say, if there is a genuine programmer error and
we always hit the same code-path, then there is nothing fault tolerance can
do. Restarting the system has no effect and won't correct the error.
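OTP supervisors encode this distinction through restart intensity. In the
sketch below (hypothetical module crash_sup), the supervisor allows at most
3 restarts within 5 seconds. A worker with a deterministic bug exhausts that
budget almost immediately, and the supervisor then terminates itself,
escalating the failure to its own supervisor instead of restarting forever.

```erlang
-module(crash_sup).
-behaviour(supervisor).
-export([start_link/0, start_buggy/0]).
-export([init/1]).

start_link() ->
    supervisor:start_link(?MODULE, []).

%% Hypothetical worker with a deterministic bug: it always crashes
%% shortly after starting, no matter how often it is restarted.
start_buggy() ->
    {ok, spawn_link(fun() -> timer:sleep(10), exit(deterministic_bug) end)}.

%% More than 3 restarts within 5 seconds makes the supervisor give up.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 3, period => 5},
    Child = #{id => buggy,
              start => {?MODULE, start_buggy, []},
              restart => permanent},
    {ok, {SupFlags, [Child]}}.
```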