[erlang-questions] Chameneos.redux micro benchmark

Edwin Fine erlang-questions_efine@REDACTED
Sat Oct 11 01:19:14 CEST 2008


It was run with HiPE; that's mentioned on the page that has the Erlang code.

I was curious so I ran it on my configuration (Ubuntu 8.04 x86_64, 2.4GHz
Q6600, 8GB RAM, Erlang R12B-4).

Here's the bottom line.

When I ran it with -smp disable, there was a *monumentally huge (91.8x)
performance boost* over running with SMP enabled. That was without HiPE.
Adding HiPE sped up the SMP version by about 20%. Even with HiPE, the SMP
version was still about 80x slower than the non-SMP HiPE version.
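
Note that the erl command lines below are identical for the HiPE and non-HiPE
runs; the difference is in how the module was compiled, presumably something
like this (the compile step isn't shown in my transcript):

$ erlc +native chameneosredux.erl   # native-code (HiPE) compilation; assumed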

# SMP without HiPE
$ time /usr/local/bin/erl +K true -noshell -run chameneosredux main 6000000
real    11m24.231s
user    20m25.933s
sys     1m49.315s

# SMP with HiPE
$ time /usr/local/bin/erl +K true -noshell -run chameneosredux main 6000000
real    9m19.138s
user    16m28.374s
sys     1m49.899s

# SMP disabled, without HiPE
$ time /usr/local/bin/erl -smp disable +K true -noshell -run chameneosredux main 6000000
real    0m7.451s
user    0m7.404s
sys     0m0.048s

# SMP disabled, with HiPE
$ time /usr/local/bin/erl -smp disable +K true -noshell -run chameneosredux main 6000000
real    0m6.970s
user    0m6.864s
sys     0m0.104s

So if it's not CPU-bound (60% idle), not memory-bound (virtual memory usage
was only 78MB), and not disk or network I/O bound, what is it?

A run of vmstat showed a minimum of about *15,000 context switches per
second*, and often more. Without the program running, there were only about
500 per second.
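
The output below is from something like the following invocation (sampling
once a second; the "cs" column is context switches per second):

$ vmstat 1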

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 6  0 371656 651900 646988 967940    0    0     0     0   67 15781 44  1 55  0
 3  0 371656 651900 646992 967936    0    0     0     4  179 18291 44  2 54  0
 1  0 371656 651900 646992 967940    0    0     0     0   61 28605 44  2 54  0
 2  0 371656 651892 646992 967940    0    0     0     0  156 15751 42  1 57  0
 3  0 371656 651900 646992 967940    0    0     0     0   61 78329 47  6 48  0
 3  0 371656 651776 646992 967940    0    0     0     0  156 15081 43  1 56  0
 3  0 371656 651776 646992 967940    0    0     0     0   61 15800 44  1 55  0
 2  0 371656 651776 646996 967936    0    0     0     4  157 15393 44  1 55  0
 2  0 371656 651760 646996 967940    0    0     0     0   62 16251 46  1 53  0
 2  0 371656 651776 646996 967940    0    0     0     0  156 35213 43  4 53  0

I ran vmstat again with *-smp disable*. It showed *no noticeable difference*
in the cs column between when the program was running and when it was not:

<program not running>
 0  0 371656 707156 624840 947328    0    0     0    28  110  453  0  0 100  0
 0  0 371656 707148 624840 947328    0    0     0     0  285  775  0  0  99  0
 0  0 371656 707140 624840 947328    0    0     0     0  102  537  0  0 100  0
 0  0 371656 707148 624840 947328    0    0     0     0  172  597  0  0 100  0
 0  0 371656 707148 624840 947328    0    0     0     0  113  532  0  0 100  0
 1  0 371656 700460 624840 947328   32    0    32     0  204 1290 24  1 75  0
<program started here>
 1  0 371656 700436 624840 947328    0    0     0     0   71  655 28  0 72  0
 1  0 371656 700412 624844 947324    0    0     0    44  302  994 26  0 74  0
 1  0 371656 700436 624844 947328    0    0     0     0   98  520 26  0 74  0
 1  0 371656 700436 624844 947328    0    0     0     0  156  558 25  0 75  0
 1  0 371656 700428 624844 947328    0    0     0     0   76  431 25  0 74  0

Tentative conclusion: this benchmark makes SMP Erlang do an excessive number
of context switches. Is that because processes are jumping between cores, or
because of inter-process communication between cores? I can't answer that
fully, but I can see what happens if we restrict the VM to one scheduler on
one core.
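
To illustrate the kind of message traffic involved, here is a minimal sketch
of the rendezvous pattern chameneos-redux exercises (this is not the shootout
program; the module and all names are mine). Every meeting is a burst of small
messages, and with multiple schedulers each send can wake a process sitting on
another core:

-module(meet_sketch).
-export([run/2]).

%% Sketch only: Workers processes repeatedly check in with a broker,
%% which pairs them off for a total of Meetings rendezvous.
run(Workers, Meetings) ->
    Broker = spawn(fun() -> broker(Meetings) end),
    [spawn(fun() -> worker(Broker) end) || _ <- lists:seq(1, Workers)],
    ok.

broker(0) ->
    %% Tell any remaining workers to stop, then time out and exit.
    receive {hello, W} -> W ! stop, broker(0)
    after 1000 -> ok
    end;
broker(N) ->
    receive {hello, A} ->
        receive {hello, B} ->
            A ! {partner, B},   % two sends per meeting; each send can
            B ! {partner, A},   % wake a process on a different scheduler
            broker(N - 1)
        end
    end.

worker(Broker) ->
    Broker ! {hello, self()},
    receive
        {partner, _Other} -> worker(Broker);
        stop -> ok
    end.

Something like meet_sketch:run(4, 1000000) should produce roughly the same
messaging shape as the benchmark, without any of the color bookkeeping.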

And the answer is: *vmstat shows that with taskset and +S 1, the context
switches go down to about the same level as when not running with SMP*. It's
still about 3x slower than -smp disable, but orders of magnitude faster than
using all four cores with SMP.

# SMP, without HiPE, one scheduler, pinned to CPU 2
$ time taskset -c 2 /usr/local/bin/erl +S 1 -noshell -noinput -run chameneosredux main 6000000
real    0m24.296s
user    0m24.270s
sys     0m0.004s

Last one. What about adding +K true to use kernel poll? No significant
difference.
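
Presumably the run looked like this (the command is reconstructed; only the
timings survive from my transcript):

$ time taskset -c 2 /usr/local/bin/erl +S 1 +K true -noshell -noinput -run chameneosredux main 6000000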

real    0m24.006s
user    0m23.998s
sys     0m0.012s

I tried capturing the output of various runs using strace, but it's going to
take me a while to interpret the results (if I can even do that) and rerun
until they make sense. I don't know whether it even makes sense to use strace
with Erlang. I'll have to do some Googling.
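
For anyone who wants to try the same thing, something along these lines seems
a reasonable starting point: -f follows the scheduler threads and -c prints a
per-syscall summary instead of a raw trace (this exact set of options is a
guess, not something I have verified):

$ strace -c -f -o strace_smp.txt /usr/local/bin/erl +K true -noshell -run chameneosredux main 6000000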

Regards,
Edwin Fine

2008/10/10 Kevin Scaldeferri <kevin@REDACTED>

>
> On Oct 10, 2008, at 12:54 PM, Greg Burri wrote:
>
> Hi,
> I'm very surprised to see the difference between the same benchmark on two
> configurations on shootout.alioth.debian.org:
> 1) Quad core:
> http://shootout.alioth.debian.org/u64q/benchmark.php?test=chameneosredux&lang=hipe
> 2) Mono core:
> http://shootout.alioth.debian.org/u64/benchmark.php?test=chameneosredux&lang=hipe
>
> Here are the CPU times:
> 1) 2095.18 s
> 2) 37.03 s
>
> I tried on my machine[1] with a) "-smp enable" and b) "-smp disable":
> a) 47.863 s
> b) 18.285 s
>
> Maybe it's not strange to see such a difference, because of inter-CPU
> message passing. But the difference on shootout.alioth.debian.org is too
> large. What should we do?
>
>
> Are you using HiPE?  There's some chance that could explain some of the
> relative difference.
>
> I don't think message passing is the issue.  I suspect it's lack of process
> affinity.  The chameneos processes are likely getting bounced around between
> schedulers constantly.
>
> In the short term, I'm not sure what can be done other than requesting that
> the benchmark be run with '+S 1', which really kinda defeats the purpose.  It
> would be nice to have a different solution, as I agree that this situation
> is pretty embarrassing.  This task is such a natural fit for the actor model
> and Erlang; it's a shame the performance ends up being so poor.
>
> -k
>