<div dir="ltr">It was run with HiPE. It's mentioned on the page that has the Erlang code.<br><br>I was curious so I ran it on my configuration (Ubuntu 8.04 x86_64, 2.4GHz Q6600, 8GB RAM, Erlang R12B-4).<br><br>Here's the bottom line. <br>
When I ran it with -smp disable, there was a monumentally huge (91.8x) performance boost over running with SMP enabled (684.2 s vs. 7.45 s wall-clock). That was without HiPE. Adding HiPE sped the SMP version up by about 20% or so; between the SMP and non-SMP HiPE versions the difference was still about 80x (559.1 s vs. 6.97 s).
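(A note on the runs below: the erl command lines for the HiPE and non-HiPE runs are identical, because HiPE is applied when the module is compiled, not when it is run. Going by the shootout page, the build amounts to something like

$ erlc +native '+{hipe, [o3]}' chameneosredux.erl

where +native turns on HiPE compilation; the o3 optimization level is my guess, so check the page for the real flags.)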
# SMP without HiPE
$ time /usr/local/bin/erl +K true -noshell -run chameneosredux main 6000000
real    11m24.231s
user    20m25.933s
sys     1m49.315s

# SMP with HiPE
$ time /usr/local/bin/erl +K true -noshell -run chameneosredux main 6000000
real    9m19.138s
user    16m28.374s
sys     1m49.899s

# SMP disabled, without HiPE
$ time /usr/local/bin/erl -smp disable +K true -noshell -run chameneosredux main 6000000
real    0m7.451s
user    0m7.404s
sys     0m0.048s

# SMP disabled, with HiPE
$ time /usr/local/bin/erl -smp disable +K true -noshell -run chameneosredux main 6000000
real    0m6.970s
user    0m6.864s
sys     0m0.104s

So if it's not CPU-bound (60% idle), it's not memory-capacity bound (virtual memory usage was only 78 MB), and it's not disk or network I/O bound, what is it?
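To see what the OS was doing, I watched context switches with vmstat; something like

$ vmstat 1

is all it takes, with the cs column showing context switches per second (the 1 is the sampling interval in seconds).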
While the benchmark was running, vmstat showed a minimum of about 15,000 context switches per second, and often more. With the program not running, there were only about 500 per second.

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 6  0 371656 651900 646988 967940    0    0     0     0   67 15781 44  1 55  0
 3  0 371656 651900 646992 967936    0    0     0     4  179 18291 44  2 54  0
 1  0 371656 651900 646992 967940    0    0     0     0   61 28605 44  2 54  0
 2  0 371656 651892 646992 967940    0    0     0     0  156 15751 42  1 57  0
 3  0 371656 651900 646992 967940    0    0     0     0   61 78329 47  6 48  0
 3  0 371656 651776 646992 967940    0    0     0     0  156 15081 43  1 56  0
 3  0 371656 651776 646992 967940    0    0     0     0   61 15800 44  1 55  0
 2  0 371656 651776 646996 967936    0    0     0     4  157 15393 44  1 55  0
 2  0 371656 651760 646996 967940    0    0     0     0   62 16251 46  1 53  0
 2  0 371656 651776 646996 967940    0    0     0     0  156 35213 43  4 53  0

I ran again with vmstat and -smp disable. This time vmstat showed no noticeable difference in the cs column between the program running and not running:
<br><span style="font-family: courier new,monospace;"><program not running></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> 0 0 371656 707156 624840 947328 0 0 0 28 110 453 0 0 100 0</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> 0 0 371656 707148 624840 947328 0 0 0 0 285 775 0 0 99 0</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> 0 0 371656 707140 624840 947328 0 0 0 0 102 537 0 0 100 0</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> 0 0 371656 707148 624840 947328 0 0 0 0 172 597 0 0 100 0</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> 0 0 371656 707148 624840 947328 0 0 0 0 113 532 0 0 100 0</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> 1 0 371656 700460 624840 947328 32 0 32 0 204 1290 24 1 75 0</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"><program started here></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> 1 0 371656 700436 624840 947328 0 0 0 0 71 655 28 0 72 0</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> 1 0 371656 700412 624844 947324 0 0 0 44 302 994 26 0 74 0</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> 1 0 371656 700436 624844 947328 0 0 0 0 98 520 26 0 74 0</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> 1 0 371656 700436 624844 947328 0 0 0 0 156 558 25 0 75 0</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> 1 0 371656 700428 624844 947328 0 0 0 0 76 431 25 0 74 0</span><br><br>Tentative conclusion: this benchmark makes SMP Erlang do an excessive number of context switches. Is that because it is jumping between cores, or because of inter-process communication between cores? I can't answer that fully, but I can see what happens if we retrict it to one core using one VM.<br>
And the answer is: with taskset and +S 1, vmstat shows the context switches dropping back to about the same level as when not running SMP at all. It's still about 3x slower than -smp disable, but orders of magnitude faster than using all four cores with SMP.
# SMP without HiPE, one scheduler, pinned to CPU 2
$ time taskset -c 2 /usr/local/bin/erl +S 1 -noshell -noinput -run chameneosredux main 6000000
real    0m24.296s
user    0m24.270s
sys     0m0.004s
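As a sanity check that +S 1 really does leave just one scheduler, erlang:system_info/1 can be asked directly (smp_support stays true because this is still the SMP emulator, just with a single scheduler; with -smp disable it reports false):

$ taskset -c 2 /usr/local/bin/erl +S 1
1> erlang:system_info(schedulers).
1
2> erlang:system_info(smp_support).
true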
Last one: what about adding +K true to use kernel poll? No significant difference.

real    0m24.006s
user    0m23.998s
sys     0m0.012s

I tried capturing the output of various runs using strace, but it's going to take me a while to interpret the results (if I can even do that) and rerun until they make sense. I don't know if it even makes sense to use strace with Erlang. I'll have to do some Googling.
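If anyone wants a head start on the strace angle, a per-syscall summary is probably more readable than a raw trace; something like

$ strace -c -f /usr/local/bin/erl +K true -noshell -run chameneosredux main 6000000

where -c counts syscalls instead of printing each one, and -f follows the emulator's threads, which is where the SMP activity will be.

Also, for anyone who wants to poke at this without the full shootout sources: the communication pattern at the heart of chameneos-redux boils down to a broker process that pairs creatures off, so every meeting costs a handful of messages among three processes. Here is a simplified sketch of my own (not the shootout entry; the module and all names are made up, and it needs at least two creatures):

-module(meet_sketch).
-export([run/2]).

%% run(Creatures, Meetings) -> list of per-creature meeting counts.
%% meet_sketch:run(10, 6000000) is roughly the shape of the shootout run.
run(Creatures, Meetings) ->
    Broker = spawn(fun() -> broker(Meetings) end),
    Self = self(),
    lists:foreach(
      fun(_) -> spawn(fun() -> creature(Broker, Self, 0) end) end,
      lists:seq(1, Creatures)),
    [receive {done, Count} -> Count end || _ <- lists:seq(1, Creatures)].

%% The broker pairs creatures off until the meeting budget runs out,
%% then answers every further hello with stop. (It never exits itself;
%% good enough for a sketch.)
broker(0) ->
    receive {hello, Pid} -> Pid ! stop end,
    broker(0);
broker(N) ->
    receive {hello, A} ->
        receive {hello, B} ->
            A ! {meet, B},
            B ! {meet, A},
            broker(N - 1)
        end
    end.

%% Each meeting is one send and one receive per creature plus two sends
%% from the broker -- millions of tiny cross-process messages in a full
%% run, exactly the kind of traffic that suffers when the processes land
%% on different schedulers.
creature(Broker, Parent, Met) ->
    Broker ! {hello, self()},
    receive
        {meet, _Other} -> creature(Broker, Parent, Met + 1);
        stop           -> Parent ! {done, Met}
    end.

If the context-switch theory is right, running that under the SMP emulator while watching vmstat should reproduce the pattern above.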
Regards,
Edwin Fine

2008/10/10 Kevin Scaldeferri <kevin@scaldeferri.com>:
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div><div><div></div><div><br><div><div>On Oct 10, 2008, at 12:54 PM, Greg Burri wrote:</div>
<br><blockquote type="cite"><div dir="ltr">Hi,<br>I'm very surprise to see the differences between these two same benchmarks on <a href="http://shootout.alioth.debian.org" target="_blank">shootout.alioth.debian.org</a> :<br>
1) Quad core : <a href="http://shootout.alioth.debian.org/u64q/benchmark.php?test=chameneosredux&lang=hipe" target="_blank">http://shootout.alioth.debian.org/u64q/benchmark.php?test=chameneosredux&lang=hipe</a><br>
2) Mono core : <a href="http://shootout.alioth.debian.org/u64/benchmark.php?test=chameneosredux&lang=hipe" target="_blank">http://shootout.alioth.debian.org/u64/benchmark.php?test=chameneosredux&lang=hipe</a><br>
<br>Here are the CPU times :<br> 1) 2095.18 s<br>2) 37.03 s<br><br>I try on my machine[1] with a) "-smp enable" and b) "-smp disable" :<br>a) 47.863 s<br>b) 18.285<br><br>Maybe It's not strange to see a such difference because of inter-cpu message passing. But the difference on <a href="http://shootout.alioth.debian.org" target="_blank">shootout.alioth.debian.org</a> is too large<br>
What should we do ?<br></div></blockquote></div><br></div></div><div>Are you using HiPE? There's some chance that could explain some of the relative difference.</div><div><br></div><div>I don't think message passing is the issue. I suspect it's lack of process affinity. The chameneos processes are likely getting bounced around between schedulers constantly.</div>
<div><br></div><div>In the short term, I'm not sure what can be done other than requesting the benchmark be run with '-S 1', which really kinda defeats the purpose. It would be nice to have a different solution, as I agree that this situation is pretty embarrassing. This task is such a natural for the actor-model and Erlang; it's a shame the performance ends up being so poor.</div>
<div><br></div><font color="#888888"><div>-k</div></font></div><br>_______________________________________________<br>