Advantages of a large number of threads cf other approaches?
Joe Armstrong
joe@REDACTED
Mon Feb 16 19:32:24 CET 2004
On Mon, 16 Feb 2004 jonathan@REDACTED wrote:
> I was discussing Erlang with a colleague and had little difficulty
> convincing him either that its CSP approach was superior to trying to
> manage a large number of threads using semaphores, or that Erlang's VM
> threads scale better than native threads on the OSes we're familiar
> with. What I had a harder time doing was convincing him that using a
> large number of threads was the best solution for any of the major
> cases we discussed:
>
> * Internet servers - why not use asynchronous sockets running in a
> single thread?
>
Because of errors.
Think of processes as providing not only units of concurrency but
also error encapsulation boundaries.
Erlang was designed for programming fault-tolerant systems. To do
so we must make sure that faulty code running somewhere in the system
cannot crash good code, and it is the process that provides the error
encapsulation boundary. This is the single most important property
that a process has.
Pseudo-concurrency can be achieved in a single sequential process by
writing your own scheduler and interleaving the computations
appropriately (this is not a practice to be recommended unless you are
writing the Erlang run-time kernel); the point is that it *can* be
done.
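As a rough illustration of the kind of pseudo-concurrency described above, here is a minimal sketch in Python rather than Erlang. Each "process" is a generator that yields to hand control back, and a hypothetical round-robin `run` function interleaves them inside one thread. The names `counter` and `run` are invented for this example. Note that nothing here contains errors: an exception raised by one task would propagate out of the scheduler and stop every task.

```python
from collections import deque

def counter(name, n, log):
    """Hypothetical task: records n steps, yielding after each one."""
    for i in range(n):
        log.append(f"{name}{i}")
        yield  # hand control back to the scheduler

def run(tasks):
    """Round-robin scheduler: resume each task in turn until all finish."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)        # run the task up to its next yield
        except StopIteration:
            continue          # task finished; drop it
        queue.append(task)    # otherwise requeue it at the back

log = []
run([counter("a", 3, log), counter("b", 2, log)])
print(log)  # the two tasks' steps appear interleaved
```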
But containing the errors *cannot* be done within a programming
language; to do this you need external control over your resources.
An OS confines some errors to individual processes by using hardware
that prevents one process from writing to the memory used by another
process. In Erlang each process has its own virtual memory space, so
processes cannot overwrite each other's data structures.
Erlang is great for web servers, for example, because we can implement
the equivalent of CGI scripts as Erlang processes. In a regular web
server written in C, nobody would dare run a CGI script, or indeed any
arbitrary C program, in the same memory space as the server, because
an error in the script could crash the entire server. So web servers
always spawn separate OS processes to evaluate scripts in, and this is
*very* inefficient. In Erlang there is no such problem: processes are
very lightweight, and if one crashes no harm is done to the rest of
the server.
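The "one handler per request, a crash kills only that handler" pattern can be sketched as follows. This is a hypothetical Python stand-in, not how yaws works: each request runs in its own thread, and a failing handler is recorded as crashed while the others complete. The names `serve` and `script` are invented for illustration. Unlike Erlang processes, Python threads share one memory space, so this models only the confinement of the crash itself, not Erlang's per-process memory isolation.

```python
import threading

def serve(requests, handler):
    """Run each request in its own thread; a crash affects only that request."""
    results = {}

    def worker(req_id, payload):
        try:
            results[req_id] = ("ok", handler(payload))
        except Exception as exc:  # the crash is caught inside this worker
            results[req_id] = ("crashed", repr(exc))

    threads = [threading.Thread(target=worker, args=(i, p))
               for i, p in enumerate(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

def script(payload):
    """Hypothetical 'CGI script' that fails on bad input (zero)."""
    return 100 // payload

results = serve([5, 0, 20], script)
print(results)  # request 1 crashed; requests 0 and 2 still succeeded
```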
Interestingly, Erlang and Apache perform about equally under light
load. This is hardly surprising, since the heavy lifting in the Erlang
I/O routines is all written in C and the programs are "BIF bound"
(dominated by time spent in built-in functions). Under conditions of
massive overload, however, the story is very different.
To see how things shape up under massive overload, see:
http://www.sics.se/~joe/apachevsyaws.html
In this experiment Apache crashed when subjected to a load of about
4,000 parallel sessions, while the Erlang web server (yaws) was still
happily ticking along at 80,000 parallel sessions.
> * Simulations - why use an object per thread rather than a "classic"
> OO approach?
This depends upon the simulation. If you are modeling the real
world, then mapping each concurrent object in the real world onto
exactly one concurrent process bridges the gap between the model and
the simulation code in a very natural way; the code will almost
"write itself".
The real world is concurrent: there are loads of things going on in
parallel. And yet most of the programming languages we use to model
and interact with the real world are sequential, which makes our
programs unnecessarily complex and difficult to understand.
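The one-object-per-process mapping can be sketched in Python, using a thread per simulated object and a `queue.Queue` as its mailbox as a rough stand-in for Erlang processes and message passing. The scenario (cars moving at fixed speeds, driven by a "world clock") and the names `car` and `simulate` are hypothetical. Each car keeps its position as private state and changes it only in response to messages, so the code mirrors the real-world objects directly.

```python
import queue
import threading

def car(name, speed, inbox, reports):
    """One thread per simulated car; it reacts only to messages in its inbox."""
    position = 0  # private state, touched only by this thread
    while True:
        msg = inbox.get()
        if msg == "tick":
            position += speed
        elif msg == "stop":
            reports.put((name, position))  # report final state, then exit
            return

def simulate(speeds, ticks):
    reports = queue.Queue()
    inboxes, threads = {}, []
    for name, speed in speeds.items():
        inbox = queue.Queue()
        inboxes[name] = inbox
        t = threading.Thread(target=car, args=(name, speed, inbox, reports))
        t.start()
        threads.append(t)
    # The "world clock" drives every car by sending it messages.
    for _ in range(ticks):
        for inbox in inboxes.values():
            inbox.put("tick")
    for inbox in inboxes.values():
        inbox.put("stop")  # FIFO order: each car sees all ticks first
    for t in threads:
        t.join()
    return dict(reports.get() for _ in threads)

final = simulate({"red": 3, "blue": 5}, ticks=4)
print(final)
```

Because each mailbox is FIFO and "stop" is sent after all ticks, the final positions are deterministic even though the threads run concurrently.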
/Joe
>
> One answer is that CSP threads (at least in the first case) lead to a
> simpler design and that Erlang itself has a host of features that
> support highly reliable programs. But I'd appreciate suggestions
> regarding other benefits, especially those arising from CSP itself.
>
> - Jonathan
>
More information about the erlang-questions mailing list