[erlang-questions] SMP performance with hackbench
Jiang Wei
jwhust@REDACTED
Wed Aug 19 09:50:11 CEST 2009
The test machine topology is [(0,1,4,5), (2,3,6,7)], and
erlang:system_info(cpu_topology) outputs:
1> erlang:system_info(cpu_topology).
[{processor,[{core,{logical,0}},
{core,{logical,4}},
{core,{logical,1}},
{core,{logical,5}}]},
{processor,[{core,{logical,2}},
{core,{logical,6}},
{core,{logical,3}},
{core,{logical,7}}]}]
So the topology is detected correctly.
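The comparison between the expected grouping and the reported term can also be done mechanically. Below is a minimal sketch, assuming the topology term has the two-level processor/core shape printed above; the module name topo_check and the function groups/1 are hypothetical and not part of hackbench or OTP.

```erlang
-module(topo_check).
-export([groups/1]).

%% Reduce a cpu_topology term to sorted lists of logical CPU ids,
%% one list per processor. Assumption: the term has the two-level
%% {processor, [{core, {logical, Id}}]} shape; entries of any other
%% shape are silently skipped by the comprehension filters.
groups(Topology) ->
    [lists:sort([Id || {core, {logical, Id}} <- Cores])
     || {processor, Cores} <- Topology].
```

Calling topo_check:groups(erlang:system_info(cpu_topology)) on this machine should give [[0,1,4,5],[2,3,6,7]], which is the grouping stated at the top of this message.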
Then I bind schedulers to cpu cores:
2> erlang:system_flag(scheduler_bind_type,default_bind).
unbound
3> erlang:system_info(scheduler_bindings).
{0,2,4,6,1,3,5,7}
Re-run the hackbench:
4> c(hackbench).
./hackbench.erl:56: Warning: variable 'Msg' is unused
{ok,hackbench}
5> hackbench:main(300,1000).
71.174117
// 300 groups, each group has 20 pairs of processes, for a total of
// 300*(20*2)=12000 processes; the message is sent 1000 times
6> hackbench:main(300,1000).
75.165799
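For readers without the attachment, the workload shape described in the comment above can be sketched as follows. This is not the attached hackbench.erl: the module name hb_sketch, its function names, and the reading of "sent 1000 times" as "each sender sends Loops messages to each of the 20 receivers in its group" are all assumptions made for illustration.

```erlang
-module(hb_sketch).
-export([run/2, do_run/2]).

-define(PAIRS, 20).  %% 20 senders and 20 receivers per group

%% Returns elapsed wall-clock seconds for Groups groups and Loops rounds.
run(Groups, Loops) ->
    {Micros, ok} = timer:tc(?MODULE, do_run, [Groups, Loops]),
    Micros / 1.0e6.

do_run(Groups, Loops) ->
    Parent = self(),
    lists:foreach(fun(_) -> spawn_group(Parent, Loops) end,
                  lists:seq(1, Groups)),
    %% Every receiver reports exactly once, after its last message.
    lists:foreach(fun(_) -> receive done -> ok end end,
                  lists:seq(1, Groups * ?PAIRS)).

spawn_group(Parent, Loops) ->
    %% Each receiver gets Loops messages from each of the 20 senders.
    Expected = ?PAIRS * Loops,
    Receivers = [spawn(fun() -> receiver(Parent, Expected) end)
                 || _ <- lists:seq(1, ?PAIRS)],
    lists:foreach(fun(_) -> spawn(fun() -> sender(Receivers, Loops) end) end,
                  lists:seq(1, ?PAIRS)).

receiver(Parent, 0) ->
    Parent ! done;
receiver(Parent, N) ->
    receive _Msg -> receiver(Parent, N - 1) end.

sender(_Receivers, 0) ->
    ok;
sender(Receivers, N) ->
    lists:foreach(fun(R) -> R ! msg end, Receivers),
    sender(Receivers, N - 1).
```

With this sketch, hb_sketch:run(300, 1000) spawns 300*(20*2) = 12000 processes, the same count as above. Its timings are not comparable to the numbers in this message, since the real benchmark code may differ.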
Without binding, with everything at its default:
3> hackbench:main(300,1000).
67.151053
4> hackbench:main(300,1000).
72.056573
It doesn't change much.
With SMP disabled:
2> hackbench:main(300,1000).
53.942253
*More info is in the attachment. (including uname -a, /etc/issue,
/proc/cpuinfo, erlang version, gcc version)
2009/8/19 Zoltan Lajos Kis <kiszl@REDACTED>
> Hi,
> First check if the cpu topology is properly identified:
> erlang:system_info(cpu_topology). If not, set it manually:
> erlang:system_flag(cpu_topology, Topo). (see slide* 27 for Topo).
> Then bind the schedulers to cpu cores:
> erlang:system_flag(scheduler_bind_type,default_bind). Check that the binding
> succeeded: erlang:system_info(scheduler_bindings).
> Try the SMP test again with these settings, and please tell us the new
> results.
>
> *see slides 22-28 in Kenneth's talk on multicore:
>
> http://www.erlang-factory.com/upload/presentations/105/KennethLundin-ErlangFactory2009London-AboutErlangOTPandMulti-coreperformanceinparticular.pdf
>
> Regards,
> Zoltan.
>
> Jiang Wei wrote:
>
>
>> Hi list, I wrote hackbench, originally a benchmark for the Linux
>> scheduler, in Erlang to test its performance.
>> (Hackbench contains several groups; each group contains 20 pairs of
>> senders and receivers; each sender needs to send some messages to the 20
>> receivers in the same group. The performance is measured by the time taken;
>> lower is better.)
>> The tests are carried out on an Intel server with 2 quad-core
>> processors and 4G memory.
>> I am surprised by the results I got:
>> (1) SMP enabled, +S 8
>> root@REDACTED:~/hackbench# \time ./run_one_erl.sh
>> Time is 62.260995
>> 295.67user 110.62system 1:14.27elapsed 546%CPU (0avgtext+0avgdata
>> 0maxresident)k
>> 11776inputs+8outputs (27major+90965minor)pagefaults 0swaps
>> The run takes 62 sec, and oprofile shows that 28% of the CPU time is
>> spent in pthread_mutex_*.
>> (2) SMP disabled
>>
>> root@REDACTED:~/hackbench# \time ./run_one_erl.sh "-smp disable"
>> Time is 54.14644
>> 54.23user 0.33system 1:05.66elapsed 83%CPU (0avgtext+0avgdata
>> 0maxresident)k
>> 3968inputs+8outputs (22major+36520minor)pagefaults 0swaps
>> The run takes 54 sec and uses only 83% CPU.
>> So it seems Erlang has trouble using all the SMP resources because of
>> serious lock contention in the SMP scheduler. Am I right?
>> Also, because I am new to Erlang, hackbench.erl may be badly written,
>> which would harm the performance. Can anyone help me review my
>> code?
>> I attach both the original C version of hackbench and my erlang
>> version one.
>> Thanks a lot!
>> (I am sorry if this is the wrong place to post this letter.)
>> --
>> Best Regards,
>> Jiang, Wei
--
Best Regards,
Jiang, Wei
-------------- next part --------------
root@REDACTED:~# uname -a
Linux test 2.6.30.4 #1 SMP Tue Aug 18 17:14:45 CST 2009 x86_64 GNU/Linux
root@REDACTED:~# cat /etc/issue
Ubuntu 8.04.2 \n \l
root@REDACTED:~# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5310 @ 1.60GHz
stepping : 11
cpu MHz : 1596.027
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow vnmi flexpriority
bogomips : 3192.05
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5310 @ 1.60GHz
stepping : 11
cpu MHz : 1596.027
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow vnmi flexpriority
bogomips : 3191.90
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5310 @ 1.60GHz
stepping : 11
cpu MHz : 1596.027
cache size : 4096 KB
physical id : 1
siblings : 4
core id : 0
cpu cores : 4
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow vnmi flexpriority
bogomips : 3191.92
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5310 @ 1.60GHz
stepping : 11
cpu MHz : 1596.027
cache size : 4096 KB
physical id : 1
siblings : 4
core id : 2
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow vnmi flexpriority
bogomips : 3191.93
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
processor : 4
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5310 @ 1.60GHz
stepping : 11
cpu MHz : 1596.027
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow vnmi flexpriority
bogomips : 3191.91
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
processor : 5
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5310 @ 1.60GHz
stepping : 11
cpu MHz : 1596.027
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow vnmi flexpriority
bogomips : 3191.90
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
processor : 6
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5310 @ 1.60GHz
stepping : 11
cpu MHz : 1596.027
cache size : 4096 KB
physical id : 1
siblings : 4
core id : 1
cpu cores : 4
apicid : 5
initial apicid : 5
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow vnmi flexpriority
bogomips : 3191.91
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5310 @ 1.60GHz
stepping : 11
cpu MHz : 1596.027
cache size : 4096 KB
physical id : 1
siblings : 4
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow vnmi flexpriority
bogomips : 3191.93
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
root@REDACTED:~# erl -V
Erlang R13B01 (erts-5.7.2) [source] [64-bit] [smp:8:8] [rq:8] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.7.2 (abort with ^G)
root@REDACTED:~/hackbench# gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu3)