[erlang-questions] Fwd: Chameneos.rednux micro benchmark

Kenneth Lundin kenneth.lundin@REDACTED
Mon Oct 13 20:00:50 CEST 2008


Hi,

We have now looked a bit closer at why this benchmark performs so
badly with Erlang SMP.

There are 2 major things that stand out:

1) The locking of the single run-queue shared between the pthread-based
schedulers is the main thing. There is an enormous amount of locking
and lock conflicts here.

2) One single process (the broker), to which all others are sending
messages, also causes a lot of locking (the message queue and heap
of the receiving process require a lock before the sending process
can complete its send operation).
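
A minimal sketch of the pattern in (2), not the benchmark's actual
code: the module name, function names and message shapes below are
hypothetical and chosen only for illustration. Several worker processes
all send to one broker process, so every send targets the same
receiving process.

```erlang
-module(broker_sketch).
-export([run/2]).

%% Hypothetical sketch: Workers processes each send Msgs messages to a
%% single broker process. Returns the number of messages the broker got.
run(Workers, Msgs) ->
    Parent = self(),
    Ref = make_ref(),
    Broker = spawn(fun() -> broker(Workers, 0, Parent, Ref) end),
    [spawn(fun() -> worker(Broker, Msgs) end) || _ <- lists:seq(1, Workers)],
    receive
        {Ref, Total} -> Total
    end.

%% Every send here goes to the same process. With several schedulers
%% running workers at once, all senders need access to that one
%% process's message queue.
worker(Broker, 0) ->
    Broker ! done;
worker(Broker, N) ->
    Broker ! {meet, self()},
    worker(Broker, N - 1).

%% The broker counts messages until every worker has reported done.
broker(0, Count, Parent, Ref) ->
    Parent ! {Ref, Count};
broker(Left, Count, Parent, Ref) ->
    receive
        {meet, _From} -> broker(Left, Count + 1, Parent, Ref);
        done          -> broker(Left - 1, Count, Parent, Ref)
    end.
```

After `c(broker_sketch).` in the shell, something like
`timer:tc(broker_sketch, run, [4, 100000]).` gives a rough timing that
can be compared across scheduler counts (for instance by starting the
emulator with different `+S` values).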

Fortunately we have ongoing work on optimizations in these areas
which will have a major impact on this benchmark.

On Sun, Oct 12, 2008 at 6:25 PM, Kevin Scaldeferri
<kevin@REDACTED> wrote:
>
> On Oct 12, 2008, at 12:27 AM, Kenneth Lundin wrote:
>
>> Hi,
>>
>> It is not at all surprising that the SMP version runs much slower than
>> the non-SMP version.
>> I looked at the program source and what I find there is an
>> implementation that does not allow very much parallel execution.
>
> I agree with your analysis, but not with your conclusion.
>
> We can expect that the SMP version will not run much faster.  However, we
> might also hope that it does not run orders of magnitude slower.
>
> At the moment, pthread based solutions are slaughtering Erlang, and also
> achieving up to 350% CPU utilization.

A pthread-based solution written in C for this particular problem is
not a fair comparison, since the Erlang VM is designed to support
general execution of thousands of processes on top of a number of
pthread-based schedulers. The VM has to handle all Erlang programs,
whatever execution pattern they have. There is an overhead for being
able to execute Erlang processes and I/O in a soft real-time way.
Normally this overhead is around 20% compared with the non-SMP VM.
With this particular benchmark each process performs so little work
each time it runs, and the number of simultaneously runnable processes
goes up and down like a yo-yo, first between 1 and 4 and later between
1 and 12. The effect of this is that it costs more to schedule in a
process than it costs to run its code.
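
To make the last point concrete, here is a minimal sketch, assuming a
hypothetical module `handoff_sketch` that is not part of the benchmark.
Two processes bounce a token back and forth, and each wake-up does
almost no work, so the elapsed time is mostly message passing and
scheduling rather than user code.

```erlang
-module(handoff_sketch).
-export([run/1]).

%% Hypothetical sketch: bounce a token between the caller and an echo
%% process Rounds times. Returns the number of completed round trips.
run(Rounds) when is_integer(Rounds), Rounds >= 0 ->
    Echo = spawn(fun echo/0),
    Done = bounce(Echo, Rounds, 0),
    Echo ! stop,
    Done.

%% Send one ping, then block until the pong arrives. At most one of the
%% two processes is runnable at any moment.
bounce(_Echo, 0, Done) ->
    Done;
bounce(Echo, N, Done) ->
    Echo ! {ping, self()},
    receive
        pong -> bounce(Echo, N - 1, Done + 1)
    end.

%% The echo process does nothing but reply, so each time it is
%% scheduled in it runs only a handful of instructions.
echo() ->
    receive
        {ping, From} ->
            From ! pong,
            echo();
        stop ->
            ok
    end.
```

Timing it with `timer:tc(handoff_sketch, run, [1000000]).` and dividing
by the number of rounds gives a rough per-hand-off cost for a given
emulator configuration.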

Most probably the benchmark can be written in a more efficient way for
the Erlang VM as well. Maybe I will take the time to do that later.

/Kenneth Erlang/OTP team, Ericsson
>
> -kevin
>


