[erlang-questions] Fwd: Chameneos.rednux micro benchmark

Kevin Scaldeferri kevin@REDACTED
Sat Oct 11 18:59:27 CEST 2008

[sigh... hit Reply instead of Reply All]

Begin forwarded message:

> From: Kevin Scaldeferri <kevin@REDACTED>
> Date: October 10, 2008 6:31:27 PM PDT
> To: "Edwin Fine" <erlang-questions_efine@REDACTED>
> Subject: Re: [erlang-questions] Chameneos.rednux micro benchmark
> On Oct 10, 2008, at 4:19 PM, Edwin Fine wrote:
>> It was run with HiPE. It's mentioned on the page that has the  
>> Erlang code.
> I was actually asking about when you ran it.  I did know that the  
> benchmark site uses HiPE.
>> A run of vmstat showed a minimum of about 15,000 context switches  
>> per second and often more. Without the program running, there were  
>> only about 500 or so per second.
>> ...
>> I ran again with vmstat and  -smp disabled. vmstat showed no  
>> noticeable difference in the cs column when the program was running  
>> compared to when it was not:
>> ...
>> Tentative conclusion: this benchmark makes SMP Erlang do an  
>> excessive number of context switches. Is that because it is jumping  
>> between cores, or because of inter-process communication between  
>> cores? I can't answer that fully, but I can see what happens if we  
>> restrict it to one core using one VM.
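
For anyone wanting to reproduce the context-switch comparison without eyeballing vmstat, here is a minimal Python sketch. It assumes a Linux system, where /proc/stat carries a cumulative `ctxt` counter; sampling it twice gives a switches-per-second figure comparable to the cs column. The function names (`parse_ctxt`, `switches_per_second`, `sample`) are made up for illustration and were not part of the original measurements.

```python
import time


def parse_ctxt(stat_text):
    """Return the cumulative context-switch count from /proc/stat contents."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no ctxt line found")


def switches_per_second(before, after, seconds):
    """Turn two cumulative counts into an average rate over the interval."""
    return (after - before) / seconds


def sample(interval=1.0, path="/proc/stat"):
    """Read the counter twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = parse_ctxt(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_ctxt(f.read())
    return switches_per_second(before, after, interval)
```

Calling `sample()` once while the benchmark is idle and once while it is running should show the same kind of gap as the roughly 500 versus 15,000 per second reported above, assuming the machine is otherwise quiet.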
> This is not all that surprising.  Consider the part of the benchmark  
> where there are 3 chameneos participating.  Each of them, and the  
> parent, will likely end up on their own scheduler (on quad-core).   
> They all send a message then go to sleep.  The parent receives the  
> messages, processes some, goes to sleep.  Children wake up, get  
> messages, send message, go to sleep.  Repeat.  You can see that for  
> much of the time, many of the schedulers have nothing to do, and  
> their threads may be switched out.
> Without SMP, all the Erlang processes run in the same scheduler  
> thread, and there is always work to be done, so no or few context  
> switches.
> Of course, for the portion with 10 chameneos, there is more often  
> work that can be done, but maybe still not enough to saturate all  
> the cores all the time.
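
The send-then-sleep pattern described above can be sketched roughly as follows. This is a toy model in Python threads and queues, not the benchmark's Erlang code: `run_meetings`, the broker/creature split, and the use of `None` as a stop signal are all assumptions made for illustration. The point is only that every creature spends most of its time blocked on its inbox, and the broker blocks whenever fewer than two requests are waiting.

```python
import queue
import threading


def run_meetings(num_creatures, num_meetings):
    """Toy rendezvous loop: creatures ask a broker for a partner, then block.

    Requires num_creatures >= 2. Returns how many meetings each creature had.
    """
    broker_inbox = queue.Queue()
    inboxes = [queue.Queue() for _ in range(num_creatures)]
    met = [0] * num_creatures  # each slot is written by exactly one thread

    def creature(idx):
        while True:
            broker_inbox.put(idx)       # send a request to the broker ...
            reply = inboxes[idx].get()  # ... then sleep until it answers
            if reply is None:           # None means the game is over
                return
            met[idx] += 1

    def broker():
        for _ in range(num_meetings):
            a = broker_inbox.get()      # sleep until a first creature arrives
            b = broker_inbox.get()      # sleep until a second one arrives
            inboxes[a].put(b)           # wake both partners
            inboxes[b].put(a)
        # Every creature has exactly one request outstanding at this point
        # or will send one shortly; answer each with the stop signal.
        for _ in range(num_creatures):
            inboxes[broker_inbox.get()].put(None)

    threads = [threading.Thread(target=creature, args=(i,))
               for i in range(num_creatures)]
    threads.append(threading.Thread(target=broker))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return met
```

For example, `run_meetings(3, 100)` returns three counts that sum to 200, since each meeting involves two creatures. With only three creatures, at least one of them is always waiting for a partner, which is the idle-scheduler situation described above.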
>> So if it's not CPU-bound (60% idle), and it's not memory capacity  
>> bound (virtual memory usage only 78MB), and it's not disk or  
>> network I/O bound, what is it?
> a) as explained above, there are synchronization requirements as  
> part of the game that may make it difficult to saturate all the CPUs
> b) I also speculated that migrating processes from one thread (core)  
> to another may be significant.  I'm not really sure where to look in  
> the OS stats to find evidence to support this.  (I guess you'd want  
> to see if the memory bus is saturated.)
> I should also point out that it seems like there is either a  
> significant difference between Erlang running on 2 and 4 cores, or  
> between the chip architectures themselves.  Running parallel  
> versions of other benchmarks on my 2-core hardware, I usually find  
> that the total CPU time used is only slightly higher than a single- 
> process version.  However, on the Alioth 4-core hardware, the total  
> CPU usage is about double.  (Look at the two Erlang versions for  
> binary-trees and mandelbrot).  I am inclined to think Erlang is to  
> blame, if only because the Haskell entries don't show the same  
> behavior.
> -kevin
