Multi-core Erlang

Thu Mar 9 21:39:37 CET 2006

Hello all,

I also did some testing with otp_src_P11B_2006-02-26. The platform was a
Power Mac with 2xG5 running Mac OS X Tiger.

Unfortunately the Mnesia bench (in the lib/mnesia/examples/bench/ directory)
didn't scale that well with the MT-EVM. I started a single node with the
+S 2 switch and ran the load generators and the server on this node. That
gave roughly 2700-2800 transactions/s. When running two single-threaded
nodes, with one running the load generators and the other acting as the
server node, I got ~2000 tps. When both the generators and the server ran
on one single-threaded node, the result was ~3600 tps. That was something
of a surprise.

I modified the replica and fragment parameters heavily (I think I tried
1 replica & 1 fragment), but I have lost access to the Mac, so I cannot
repeat the settings here. I was hoping that somebody more knowledgeable
about Mnesia (and with access to an MT-capable platform) could try to find
out whether one can get better results from the MT-EVM.

When running two naive Fibonacci generators simultaneously on a node
started with +S 2, the scalability was near perfect, of course. There was
something interesting when quitting the EVM:

"jabba@REDACTED:~/programming/erlang/koulutus/erlang>  
~/tmp/otp_src_P11B_2006-02-26/bin/erl
Erlang (BEAM) emulator version 5.5 [source] [smp:1] [async-threads:0]  
[hipe] <timer-thread>

Eshell V5.5  (abort with ^G)
1> c(oma).
{ok,oma}
2> oma:fibonacci(40).
102334155
3> q().
ok
4> (no error logger present) error: "#Port<0.1>: io_thr_waker: Input  
driver gone away without deselecting!\n"
jabba@REDACTED:~/programming/erlang/koulutus/erlang>"

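The Fibonacci function is the naive implementation, in case anybody wants
to reproduce the above (saved as oma.erl):
-module(oma).
-export([fibonacci/1]).

%% Naive doubly recursive Fibonacci, defined for N >= 1.
fibonacci(1) ->
         1;
fibonacci(2) ->
         1;
fibonacci(N) ->
         fibonacci(N - 1) + fibonacci(N - 2).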
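
For anyone who wants to repeat the two-generator comparison, here is a minimal sketch of how it could be timed. The module name fib_par and the functions sequential/1, parallel/1 and compare/1 are made up for illustration; this is not the code I actually ran, only one way to measure the same thing with timer:tc.

```erlang
-module(fib_par).
-export([fibonacci/1, sequential/1, parallel/1, compare/1]).

%% Naive Fibonacci, same definition as in the message above.
fibonacci(1) -> 1;
fibonacci(2) -> 1;
fibonacci(N) when N > 2 -> fibonacci(N - 1) + fibonacci(N - 2).

%% Run two computations one after the other in the calling process.
sequential(N) ->
    [fibonacci(N), fibonacci(N)].

%% Run two computations in separate processes and collect both results.
parallel(N) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, fibonacci(N)} end),
                Ref
            end || _ <- [1, 2]],
    [receive {Ref, Result} -> Result end || Ref <- Refs].

%% Wall-clock times in microseconds for both variants, plus their ratio.
%% Matching on Res twice checks that both variants return the same values.
compare(N) ->
    {SeqUs, Res} = timer:tc(fun() -> sequential(N) end),
    {ParUs, Res} = timer:tc(fun() -> parallel(N) end),
    {{sequential_us, SeqUs}, {parallel_us, ParUs},
     {speedup, SeqUs / max(ParUs, 1)}}.
```

Calling fib_par:compare(35) on a node started with +S 2 and again on a node with one scheduler should show whether the second scheduler is actually being used; a ratio close to 2 would correspond to the near perfect scaling mentioned above.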

Cheers, Jani L.

Joe Armstrong (AL/EAB) <joe.armstrong@REDACTED> wrote on Thu, 09 Mar
2006 16:22:33 +0200:

> Hello list,
>
>     Following Rickard's post I have now got my hands on a dual-core,
> dual-processor system (i.e. 4 CPUs) and we have been able to reproduce
> Rickard's results.
>
>     I have posted a longer article (with images) on my newly started blog
>
>     http://www.erlang-stuff.net/blog/
>
>     This shows a 3.6 factor speedup for a message passing benchmark and
> 1.8 for an application program (a SIP stack) - these are good results.
> The second program in particular was not written to avoid sequential
> bottlenecks.
>
>      Despite this it ran 1.8 times faster on a 4 CPU system than on a
> one CPU system.
>
>      The nice thing about these results was that the benchmark ran
> almost 4 times faster - this benchmark just did spawns, message passing
> and computations and had no sequential bottlenecks - pure code made from
> lots of small processes seems to speed up nicely on a multi-core system.
>
>      Well done Rickard
>
>      So how do you make stuff that goes fast? - go parallel
>
> Cheers
>
>
> /Joe
>
>
>
>
>> -----Original Message-----
>> From: owner-erlang-questions@REDACTED
>> [mailto:owner-erlang-questions@REDACTED] On Behalf Of Rickard Green
>> Sent: den 7 mars 2006 17:52
>> To: erlang-questions@REDACTED
>> Subject: [Fwd: Message passing benchmark on smp emulator]
>>
>> Trying again...
>>
>> -------- Original Message --------
>> Subject: Message passing benchmark on smp emulator
>> Date: Tue, 07 Mar 2006 17:30:40 +0100
>> From: Rickard Green <rickard.s.green@REDACTED>
>> Newsgroups: erix.mailing-list.erlang-questions
>>
>> The message passing benchmark used in estone (and bstone)
>> isn't very well suited for the smp emulator since it sends a
>> message in a ring (more or less only 1 process runnable all the time).
>>
>> In order to be able to take advantage of an smp emulator I
>> wrote another message passing benchmark. In this benchmark
>> all participating processes send a message to all processes
>> and wait for replies on the sent messages.
>>
>> I've attached the benchmark. Run like this:
>> big:bang(NoOfParticipatingProcesses).
>>
>> I ran the benchmark on a machine with two hyperthreaded Xeon
>> 2.40GHz processors.
>>
>> big:bang(50):
>> * r10b completed after about 0.014 seconds.
>> * p11b with 4 schedulers completed after about 0.018 seconds.
>>
>> big:bang(100):
>> * r10b completed after about 0.088 seconds.
>> * p11b with 4 schedulers completed after about 0.088 seconds.
>>
>> big:bang(300):
>> * r10b completed after about 2.6 seconds.
>> * p11b with 4 schedulers completed after about 1.0 seconds.
>>
>> big:bang(500):
>> * r10b completed after about 10.7 seconds.
>> * p11b with 4 schedulers completed after about 3.5 seconds.
>>
>> big:bang(600):
>> * r10b completed after about 18.0 seconds.
>> * p11b with 4 schedulers completed after about 5.8 seconds.
>>
>> big:bang(700):
>> * r10b completed after about 27.0 seconds.
>> * p11b with 4 schedulers completed after about 9.3 seconds.
>>
>> Quite a good result I guess.
>>
>> Note that this is a special case and this kind of speedup
>> is not expected for an arbitrary Erlang program.
>>
>> If you want to try it yourself, download a P11B snapshot at:
>> http://www.erlang.org/download/snapshots/
>> and remember to enable smp support:
>> ./configure --enable-smp-support --disable-lock-checking
>>
>> You can change the number of schedulers used by passing the
>> +S<NO_OF_SCHEDULERS> command line argument to erl or by calling:
>> erlang:system_flag(schedulers, NoOfSchedulers) ->
>> {ok|PosixError, CurrentNo, OldNo}
>>
>> /Rickard Green, Erlang/OTP
>>
>>
>>

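Rickard's attachment is not reproduced in the quoted text, so here is a rough sketch of the all-to-all pattern he describes (every process sends a message to every participant and waits for the replies). The module name bang_sketch and everything inside it are my own assumptions, not his big module, and timings from it are not comparable with the numbers quoted above.

```erlang
-module(bang_sketch).
-export([bang/1]).

%% Start N processes; each pings every participant (itself included) and
%% waits for N pongs. Returns the elapsed wall-clock time in seconds.
bang(N) when is_integer(N), N > 0 ->
    Parent = self(),
    {Micros, ok} = timer:tc(fun() ->
        Pids = [spawn_link(fun() -> participant(Parent) end)
                || _ <- lists:seq(1, N)],
        %% Tell every participant who its peers are, then wait for all.
        lists:foreach(fun(Pid) -> Pid ! {peers, Pids} end, Pids),
        lists:foreach(fun(Pid) -> receive {done, Pid} -> ok end end, Pids)
    end),
    Micros / 1000000.

%% Wait for the peer list (pings that arrive early stay in the mailbox),
%% ping everyone, then serve pings and count pongs until both are done.
participant(Parent) ->
    receive
        {peers, Pids} ->
            lists:foreach(fun(Pid) -> Pid ! {ping, self()} end, Pids),
            loop(length(Pids), length(Pids)),
            Parent ! {done, self()}
    end.

%% Each process receives exactly N pings and N pongs, so it can stop
%% once both counters reach zero without leaving a peer unanswered.
loop(0, 0) ->
    ok;
loop(PingsLeft, PongsLeft) ->
    receive
        {ping, From} -> From ! pong, loop(PingsLeft - 1, PongsLeft);
        pong -> loop(PingsLeft, PongsLeft - 1)
    end.
```

Since all N processes are runnable at the same time, unlike in the ring benchmark, bang_sketch:bang(300) run with different +S values should give at least a feel for how the schedulers share the load.
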
-- 
Jani Launonen