[erlang-questions] SMP performance with hackbench

Wed Aug 19 10:17:34 CEST 2009

I ran hackbench with different numbers of groups. The number of Erlang
processes is GroupNumber * 40, that is, 20 senders and 20 receivers per
group.
Group     smp 8:8      smp disable
Number    time (s)     time (s)
     1     0.605369     0.093884
    10     4.951785     1.414963
    50    17.635626     8.722577
   100    25.744995    17.889164
   200    56.963373    36.106899

The smp disable case is always faster than the smp 8:8 case.
OProfile shows that almost 30% of CPU time is spent in pthread_mutex_*.
The migration logic may be responsible for this. I am looking into the Erlang
scheduler code and hope to find the reason there.
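
As a rough check on the table above, the per-row slowdown of smp 8:8
relative to smp disable can be computed directly. This is a minimal sketch in
Python, not part of the benchmark itself: the names `results`, `slowdown`, and
`summarize` are illustrative, and the times are copied from the table on the
assumption that they are in seconds.

```python
# Elapsed times copied from the table above (assumed to be seconds).
# Each entry maps group count -> (smp 8:8 time, smp disable time).
results = {
    1: (0.605369, 0.093884),
    10: (4.951785, 1.414963),
    50: (17.635626, 8.722577),
    100: (25.744995, 17.889164),
    200: (56.963373, 36.106899),
}

PROCESSES_PER_GROUP = 40  # 20 senders + 20 receivers, as stated in the post


def slowdown(smp_time, nosmp_time):
    """How many times slower the smp 8:8 run is than the smp disable run."""
    return smp_time / nosmp_time


def summarize(table):
    """Return (groups, processes, slowdown) rows, sorted by group count."""
    return [
        (groups, groups * PROCESSES_PER_GROUP, slowdown(smp, nosmp))
        for groups, (smp, nosmp) in sorted(table.items())
    ]


if __name__ == "__main__":
    for groups, procs, ratio in summarize(results):
        print(f"{groups:>4} groups  {procs:>5} processes  smp/nosmp = {ratio:.2f}x")
```

By this measure the gap narrows as the group count grows, from roughly 6.4x
at 1 group to roughly 1.4x to 1.6x at 100 and 200 groups, but the smp 8:8 run
never catches up.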

2009/8/19 Sean Cribbs <seancribbs@REDACTED>

> In a dual quad-core setup, consider that there will be different
> message-passing speeds between:
>
> 1) cores on the same chip
> 2) cores on different chips
> 3) in some cases, cores in different combinations on the same chip (e.g.
> Nehalem quad-core processors have paired cores with some shared cache)
>
> If you're crossing boundaries between chips/cores frequently, you have to
> go through a cache or main RAM, which will be slower than running all on the
> same core.  Try increasing the number of processes dramatically in your test
> and see how the SMP vs. non-SMP scenario pans out. 40 processes could be
> considered a small number of processes for an Erlang application.
>
> Sean Cribbs
>
> Jiang Wei wrote:
>
>> Hi, list
>>     I wrote hackbench in Erlang to test performance; it was originally
>> a benchmark for the Linux scheduler.
>>     (Hackbench contains several groups; each group contains 20 pairs of
>> senders and receivers; each sender needs to send some messages to the 20
>> receivers in the same group. The performance is measured by the time taken;
>> less is better.)
>>     The tests were carried out on an Intel server with 2 quad-core
>> processors and 4 GB of memory.
>>     I am surprised by the results I got:
>>     (1) SMP enabled, +S 8
>>     root@REDACTED:~/hackbench# \time ./run_one_erl.sh
>>     Time is 62.260995
>>     295.67user 110.62system 1:14.27elapsed 546%CPU (0avgtext+0avgdata
>> 0maxresident)k
>>     11776inputs+8outputs (27major+90965minor)pagefaults 0swaps
>>     The run takes 62 sec, and oprofile shows 28% of CPU time is spent
>> in pthread_mutex_*.
>>     (2) SMP disabled
>>     root@REDACTED:~/hackbench# \time ./run_one_erl.sh "-smp disable"
>>      Time is 54.14644
>>      54.23user 0.33system 1:05.66elapsed 83%CPU (0avgtext+0avgdata
>> 0maxresident)k
>>      3968inputs+8outputs (22major+36520minor)pagefaults 0swaps
>>     The run takes 54 sec and uses only 83% CPU.
>>     So it seems Erlang has trouble using all the SMP resources because of
>> serious lock contention in the SMP scheduler. Am I right?
>>     Because I am new to Erlang, hackbench.erl may be poorly written,
>> which would hurt performance. Can anyone help me review my code?
>>     I attach both the original C version of hackbench and my Erlang
>> version.
>>     Thanks a lot!
>>     (I am sorry if this is the wrong place to post this letter.)
>> --
>> Best Regards,
>> Jiang, Wei
>> ------------------------------------------------------------------------
>>
>>
>> ________________________________________________________________
>> erlang-questions mailing list. See http://www.erlang.org/faq.html
>> erlang-questions (at) erlang.org
>>
>
>
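
The %CPU figures in the quoted `\time` output can be cross-checked from the
user, system, and elapsed fields, on the assumption that the percentage is
(user + system) / elapsed. The sketch below is Python with illustrative helper
names (`parse_elapsed`, `cpu_percent`); the numbers are copied from the two
runs quoted above.

```python
def parse_elapsed(text):
    """Convert an elapsed field such as "1:14.27" (m:ss.ss) to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def cpu_percent(user, system, elapsed):
    """CPU utilisation in percent, assuming (user + system) / elapsed."""
    return 100.0 * (user + system) / parse_elapsed(elapsed)


# Figures copied from the quoted runs.
smp_run = {"user": 295.67, "system": 110.62, "elapsed": "1:14.27"}
nosmp_run = {"user": 54.23, "system": 0.33, "elapsed": "1:05.66"}

if __name__ == "__main__":
    for name, run in (("smp +S 8", smp_run), ("smp disable", nosmp_run)):
        cpu_seconds = run["user"] + run["system"]
        pct = cpu_percent(run["user"], run["system"], run["elapsed"])
        print(f"{name:<12} cpu-seconds = {cpu_seconds:.2f}  cpu = {pct:.0f}%")
```

Both results land within a percentage point of the reported 546% and 83%. The
same fields also show that the SMP run consumed about 406 CPU-seconds against
about 55 for the non-SMP run, and that system time rose from 0.33 s to
110.62 s.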

-- 
Best Regards,
Jiang, Wei