[erlang-questions] multicore performance fine grained concurrency

Raimo Niskanen raimo+erlang-questions@REDACTED
Fri May 7 09:27:48 CEST 2010


On Thu, May 06, 2010 at 01:50:23PM -0400, David N Murray wrote:
> On May 6, Johan Montelius scribed:
> 
> >
> >
> >
> > smp 4:4 -> 126 ms
> > smp 2:2 -> 143 ms
> > smp disabled -> 65 ms
> >
> > :-(
> >
> 
> I saw something similar using the Ring benchmark on both AMD (OpenBSD) and
> Intel (Vista) dual cores.  Both cores get utilized in the 40-50% range
> with smp enabled.  It takes 1/4 the time to run the benchmark with smp
> disabled as it does when smp is enabled. It takes advantage of two cores
> just fine if you run two OS processes with SMP disabled.  It doesn't do so
> well with SMP enabled.  The ring benchmark just spawns and sends messages.

There are many different reasons why SMP, and especially SMP benchmarks
(as Kenneth explained in another mail in this thread), can perform poorly.

OpenBSD still does not have native threads, so one OS process only runs on
one CPU at a time. Threads are implemented as old-style (green)
threads within that process. The SMP emulator starts, probably with
as many schedulers as there are CPUs, but both schedulers run
within a single kernel-level thread, and the OS distributes that load
over both CPUs. So the maximum possible utilization is 50% per CPU.
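
To see what the emulator actually started with on a given machine, a
minimal sketch like the following can help. The module name smp_info and
the function summary/0 are made up for illustration; the
erlang:system_info/1 items are standard, though logical_processors may
return the atom unknown on some platforms.

```erlang
-module(smp_info).
-export([summary/0]).

%% Returns a property list describing the scheduler setup of the
%% running emulator. Hypothetical helper, not part of OTP.
summary() ->
    [{smp_support, erlang:system_info(smp_support)},
     {schedulers, erlang:system_info(schedulers)},
     {schedulers_online, erlang:system_info(schedulers_online)},
     %% May be the atom 'unknown' if the OS does not report it.
     {logical_processors, erlang:system_info(logical_processors)}].
```

Note that this only reports how many schedulers the emulator created, not
whether the OS can really run them on separate CPUs at the same time.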

Vista seems to be very eager to distribute the load over the CPUs,
so execution jumps between them like crazy, which throws away the
CPU cache contents on every jump, slowing down execution.

Intel CPUs before i7/i5 have much less memory bandwidth, especially
between cores, so the SMP emulator performs worse on them than
on i7/i5.

Our current best combination, I guess, is Linux (perhaps Solaris, maybe
FreeBSD) on Intel i7.

And as for the ring benchmark (single ring): is it not so that it sends
one message around a ring, so that at every instant only one process
can execute?  Then all the SMP emulator can contribute is overhead, and
this benchmark only measures how much overhead the SMP emulator has
for pure message passing and process scheduling. The SMP emulator
can never beat the non-SMP emulator on a single ring, and it can
load at most one CPU to 100%.
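
The single-ring argument can be illustrated with a minimal sketch. This is
not the benchmark code used for the numbers in this thread; the module name
ring, the function run/2 and the message shapes are assumptions made for
the example. N processes are linked in a ring and one token is passed
around M times, so at any instant only the process holding the token has
work to do.

```erlang
-module(ring).
-export([run/2]).

%% run(N, M) builds a ring of N processes, sends one token around it
%% M times, and returns the elapsed wall-clock time in microseconds.
run(N, M) when is_integer(N), N >= 1, is_integer(M), M >= 1 ->
    Parent = self(),
    %% The first process waits to learn its successor; that closes the ring.
    First = spawn(fun() ->
                      receive {next, Next} -> loop(Next, Parent) end
                  end),
    %% Each new process forwards to the process created just before it.
    Last = lists:foldl(fun(_, Next) ->
                           spawn(fun() -> loop(Next, Parent) end)
                       end, First, lists:seq(2, N)),
    First ! {next, Last},
    {Micros, ok} = timer:tc(fun() ->
                                First ! {token, N * M},
                                receive done -> ok end
                            end),
    Micros.

%% Only the process that currently holds the token is runnable.
loop(Next, Parent) ->
    receive
        {token, 0} ->
            Parent ! done,
            Next ! stop;            % tear the ring down
        {token, Hops} ->
            Next ! {token, Hops - 1},
            loop(Next, Parent);
        stop ->
            Next ! stop             % forward the shutdown and exit
    end.
```

Calling ring:run(1000, 1000) in an emulator started with "erl -smp disable"
(on releases that still include the non-SMP emulator) and again with
"erl -smp enable +S 2:2" should give a comparison of the kind quoted in
this thread, assuming the "2:2" figures refer to the +S setting. Since
there is only ever one runnable process, any difference is scheduling and
message-passing overhead, not parallel speedup.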

> 
> smp 2:2 -> 18003 ms
> smp disabled -> 4867 ms
> 2 os processes -> ~5900 ms
> 
> hth,
> Dave
> 
> 

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

