[erlang-questions] SMP performance with hackbench

Ulf Wiger ulf.wiger@REDACTED
Wed Aug 19 09:59:27 CEST 2009


One thing you could try is to eliminate the shared binary
and send a simple message instead, e.g.

-define(DATA, 1).

I don't know if it will make a big difference. Ideally,
passing a shared binary will be as efficient, but this
is at least a logical exclusion step.
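
A minimal sketch of that exclusion step, assuming a single sender and
receiver rather than the full benchmark. The module name payload_sketch,
its functions, and the 100-byte binary are hypothetical, since
hackbench.erl itself is not shown here. Binaries larger than 64 bytes are
reference-counted and shared between processes, while a small integer is
simply copied, so swapping the define separates the two cases.

```erlang
-module(payload_sketch).
-export([time_run/1, run/1]).

%% Hypothetical payloads; the real hackbench.erl may use something else.
%% Uncomment the binary define (and comment out the integer one) to
%% compare a shared, reference-counted binary with a small integer.
%% -define(DATA, <<0:(100*8)>>).
-define(DATA, 1).

%% Returns the elapsed time in microseconds for N sends to one receiver.
time_run(N) ->
    {Micros, ok} = timer:tc(?MODULE, run, [N]),
    Micros.

%% Spawns one receiver, sends it N messages, and waits for its 'done'.
run(N) ->
    Parent = self(),
    Receiver = spawn(fun() -> recv_loop(N, Parent) end),
    send_loop(Receiver, N),
    receive done -> ok end.

send_loop(_Receiver, 0) ->
    ok;
send_loop(Receiver, N) ->
    Receiver ! ?DATA,
    send_loop(Receiver, N - 1).

%% Consumes exactly N messages, then notifies the parent.
recv_loop(0, Parent) ->
    Parent ! done;
recv_loop(N, Parent) ->
    receive _Msg -> recv_loop(N - 1, Parent) end.
```

Compile it once with each define and compare
payload_sketch:time_run(1000000) under the same emulator flags.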

BR,
Ulf W

Jiang Wei wrote:
> The test machine topology is [(0,1,4,5), (2,3,6,7)], and 
> erlang:system_info(cpu_topology) outputs:
> 
>     1> erlang:system_info(cpu_topology).
>     [{processor,[{core,{logical,0}},
>                  {core,{logical,4}},
>                  {core,{logical,1}},
>                  {core,{logical,5}}]},
>      {processor,[{core,{logical,2}},
>                  {core,{logical,6}},
>                  {core,{logical,3}},
>                  {core,{logical,7}}]}]
> 
> So the detected topology is correct.
> Then I bound the schedulers to CPU cores:
> 
>      2> erlang:system_flag(scheduler_bind_type,default_bind).
>     unbound
>      3> erlang:system_info(scheduler_bindings).
>     {0,2,4,6,1,3,5,7}
> 
> Re-run the hackbench:
> 
>     4> c(hackbench).
>     ./hackbench.erl:56: Warning: variable 'Msg' is unused
>     {ok,hackbench}
>     5> hackbench:main(300,1000).   
>     71.174117
>     // 300 groups, each with 20 pairs of processes, for a total of
>     // 300*(20*2) = 12000 processes; each message is sent 1000 times
>     6> hackbench:main(300,1000).
>     75.165799
> 
> Without binding, with everything at its default setting:
> 
>     3> hackbench:main(300,1000).
>     67.151053
>     4> hackbench:main(300,1000).
>     72.056573
> 
> It doesn't change much.
>  
> With SMP disabled:
> 
>     2> hackbench:main(300,1000).
>     53.942253
> 
> More info is in the attachment (including uname -a, /etc/issue,
> /proc/cpuinfo, Erlang version, gcc version).
>   
> 2009/8/19 Zoltan Lajos Kis <kiszl@REDACTED>
>  
> 
>     Hi,
>      
>     First check if the cpu topology is properly identified:
>     erlang:system_info(cpu_topology). If not, set it manually:
>     erlang:system_flag(cpu_topology, Topo). (see slide* 27 for Topo).
>     Then bind the schedulers to cpu cores:
>     erlang:system_flag(scheduler_bind_type,default_bind). Check that the
>     binding succeeded: erlang:system_info(scheduler_bindings).
>     Try the SMP test again with these settings, and please tell us the
>     new results.
>      
>     *see slides 22-28 in Kenneth's talk on multicore:
>     http://www.erlang-factory.com/upload/presentations/105/KennethLundin-ErlangFactory2009London-AboutErlangOTPandMulti-coreperformanceinparticular.pdf
>      
>     Regards,
>     Zoltan.
>      
>     Jiang Wei wrote:
>      
> 
>         Hi, list
>             I wrote an Erlang version of hackbench, which is originally
>         a benchmark for the Linux scheduler, to test performance.
>             (Hackbench contains several groups; each group contains 20
>         pairs of senders and receivers; each sender needs to send some
>         messages to the 20 receivers in the same group. Performance
>         is measured by the time taken; less is better.)
>               The tests were carried out on an Intel server with 2
>         quad-core processors and 4 GB of memory.
>             I am surprised by the results I got:
>             (1) SMP enabled, +S 8
>             root@REDACTED:~/hackbench# \time ./run_one_erl.sh
>             Time is 62.260995
>             295.67user 110.62system 1:14.27elapsed 546%CPU
>         (0avgtext+0avgdata 0maxresident)k
>             11776inputs+8outputs (27major+90965minor)pagefaults 0swaps
>                The run takes 62 sec, and oprofile shows that 28% of CPU
>         time is spent in pthread_mutex_*.
>                (2) SMP disabled
>          
>              root@REDACTED:~/hackbench# \time ./run_one_erl.sh "-smp disable"
>              Time is 54.14644
>              54.23user 0.33system 1:05.66elapsed 83%CPU
>         (0avgtext+0avgdata 0maxresident)k
>              3968inputs+8outputs (22major+36520minor)pagefaults 0swaps
>                The run takes 54 sec and uses only 83% CPU.
>                So it seems that Erlang has trouble using all the SMP
>         resources because of serious lock contention in the SMP
>         scheduler. Am I right?
>              And because I am new to Erlang, hackbench.erl may be
>         badly written, which would hurt performance. Can anyone help
>         me review my code?
>                I attach both the original C version of hackbench and my
>         Erlang version.
>                Thanks a lot!
>              (I am sorry if this is the wrong place to post this message.)
>         -- 
>         Best Regards,
>         Jiang, Wei
>          
>         ------------------------------------------------------------------------
> 
>          
>          
>         ________________________________________________________________
>         erlang-questions mailing list. See http://www.erlang.org/faq.html
>         erlang-questions (at) erlang.org
>          
> 
>      
> 
>  
>  
> -- 
> Best Regards,
> Jiang, Wei
>  
> 
> 


-- 
Ulf Wiger
CTO, Erlang Training & Consulting Ltd
http://www.erlang-consulting.com

