[erlang-questions] Fwd: Chameneos.rednux micro benchmark

Kenneth Lundin kenneth.lundin@REDACTED
Sun Oct 12 09:27:26 CEST 2008


Hi,

It is not at all surprising that the SMP version runs much slower than
the non-SMP version.
I looked at the program source, and what I find there is an
implementation that does not allow much parallel execution.
The broker process is clearly a bottleneck since it is involved in
everything. Every other process must wait for the broker process
before it can continue its execution.
The other processes also do so little useful work that task switching
and locking around the run queue become the dominating cost.
When you have benchmarks with parallel processes that hardly perform
any work, and all processes are highly dependent on
other processes, you can expect results like this. That does not mean
there is anything wrong or bad with the Erlang SMP implementation.
It is the same with a software project: do you think a 100-line program
will be finished faster if you have 100 programmers instead of one?
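
To illustrate the broker bottleneck described above, here is a minimal
sketch of the pattern. It is not the benchmark's Erlang source; it is a
toy model written in Python, and the names run_meetings, creature,
requests and replies are hypothetical. Each creature sends a request to
the single broker mailbox and then blocks until the broker answers, so
every meeting is serialized through one place and the creatures do almost
no work of their own.

```python
import queue
import threading


def run_meetings(num_creatures, num_meetings):
    """Toy broker-mediated rendezvous (assumes num_creatures >= 2).

    Returns (requests handled by the broker, meetings seen per creature).
    """
    requests = queue.Queue()  # the broker's single mailbox
    replies = [queue.Queue() for _ in range(num_creatures)]
    met = [0] * num_creatures

    def creature(me):
        while True:
            requests.put(me)             # ask the broker for a partner
            partner = replies[me].get()  # blocked until the broker answers
            if partner is None:          # broker says the game is over
                return
            met[me] += 1                 # the only "useful work" per meeting

    workers = [threading.Thread(target=creature, args=(i,))
               for i in range(num_creatures)]
    for w in workers:
        w.start()

    handled = done = finished = 0
    waiting = None  # creature currently waiting for a partner, if any
    while finished < num_creatures:
        who = requests.get()  # every request is serialized here
        handled += 1
        if done == num_meetings:
            replies[who].put(None)  # tell this creature to stop
            finished += 1
        elif waiting is None:
            waiting = who
        else:
            replies[waiting].put(who)  # pair the two creatures up
            replies[who].put(waiting)
            waiting = None
            done += 1

    for w in workers:
        w.join()
    return handled, met


if __name__ == "__main__":
    handled, met = run_meetings(num_creatures=4, num_meetings=1000)
    print(handled, sum(met))
```

In this model the broker handles 2 * num_meetings + num_creatures
requests no matter how many creatures or cores there are, and each
creature spends nearly all of its time blocked waiting for a reply.
Adding more threads adds more waiting and switching, not more throughput.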

/Kenneth Erlang/OTP team, Ericsson


2008/10/11 Kevin Scaldeferri <kevin@REDACTED>:
> [sigh... hit Reply instead of Reply All]
>
> Begin forwarded message:
>
> From: Kevin Scaldeferri <kevin@REDACTED>
> Date: October 10, 2008 6:31:27 PM PDT
> To: "Edwin Fine" <erlang-questions_efine@REDACTED>
> Subject: Re: [erlang-questions] Chameneos.rednux micro benchmark
>
> On Oct 10, 2008, at 4:19 PM, Edwin Fine wrote:
>
> It was run with HiPE. It's mentioned on the page that has the Erlang code.
>
> I was actually asking about when you ran it.  I did know that the benchmark
> site uses HiPE.
>
> A run of vmstat showed a minimum of about 15,000 context switches per second
> and often more. Without the program running, there were only about 500 or so
> per second.
> ...
>
> I ran again with vmstat and -smp disabled. vmstat showed no noticeable
> difference in the cs column when the program was running compared to when it
> was not:
>
> ...
>
> Tentative conclusion: this benchmark makes SMP Erlang do an excessive number
> of context switches. Is that because it is jumping between cores, or because
> of inter-process communication between cores? I can't answer that fully, but
> I can see what happens if we restrict it to one core using one VM.
>
> This is not all that surprising.  Consider the part of the benchmark where
> there are 3 chameneos participating.  Each of them, and the parent, will
> likely end up on their own scheduler (on quad-core).  They all send a
> message then go to sleep.  The parent receives the messages, processes some,
> goes to sleep.  Children wake up, get messages, send message, go to sleep.
> Repeat.  You can see that for much of the time, many of the schedulers have
> nothing to do, and their threads may be switched out.
> Without SMP, all the Erlang processes run in the same scheduler thread, and
> there is always work to be done, so no or few context switches.
> Of course, for the portion with 10 chameneos, there is more often work that
> can be done, but maybe still not enough to saturate all the cores all the
> time.
>
>
> So if it's not CPU-bound (60% idle), and it's not memory capacity bound
> (virtual memory usage only 78MB), and it's not disk or network I/O bound,
> what is it?
>
> a) as explained above, there are synchronization requirements as part of the
> game that may make it difficult to saturate all the CPUs
> b) I also speculated that migrating processes from one thread (core) to
> another may be significant.  I'm not really sure where to look in the OS
> stats to find evidence to support this.  (I guess you'd want to see if the
> memory bus is saturated.)
>
>
> I should also point out that it seems like there is either a significant
> difference between Erlang running on 2 and 4 cores, or between the chip
> architectures themselves.  Running parallel versions of other benchmarks on
> my 2-core hardware, I usually find that the total CPU time used is only
> slightly higher than a single-process version.  However, on the Alioth
> 4-core hardware, the total CPU usage is about double.  (Look at the two
> Erlang versions for binary-trees and mandelbrot).  I am inclined to think
> Erlang is to blame, if only because the Haskell entries don't show the same
> behavior.
>
>
> -kevin