[erlang-questions] performance drop in R15A?

Nico Kruber kruber@REDACTED
Wed Dec 14 17:41:33 CET 2011


ok, I did some profiling and made some optimisations but the differences
between R15B (git checkout from master just now) and R14B04 are still the same:

First the results of the calls to bench:quorum_read(10, 50000) (best of three)
------------------------------------------------------------
                   |   R14B04   |   R15B     | time surplus
------------------------------------------------------------
timer:tc, smp8:8   |    22484619|    26641737| +18%
timer:tc, smp4:4   |    34009785|    35963489| + 6%
timer:tc, smp2:2   |    32863428|    34102038| + 4%
timer:tc, smpoff   |    53805419|    54214795| + 1%
------------------------------------------------------------

It seems there is some lock-contention or some other problem with smp in our
case which hasn't been there before.
I also observed that the results using R15B vary quite much, especially with
smp8:8 (see below for an example). R14B04 is more stable among different test
runs.
But still, even without smp, R15B is slower than R14B04 :(

Lukas suggested to run fprof to a bit deeper and maybe find the function taking
longer in R15 than in R14B04.
Although it was helpful in finding some hot spots, it did however not help with
this problem. Following are the totals reported by fprof
(if you like, I can send the complete logs - didn't want to send too much to
the mailing list though).

------------------------------------------------------------
                   |   R14B04   |   R15B     | time surplus
------------------------------------------------------------
fprof, CNT, smp8:8 |     7723796|     7722635| n/a
fprof, ACC, smp8:8 |       98239|       85477| -13%
fprof, OWN, smp8:8 |      300534|      264018| -12%
------------------------------------------------------------

As you can see, according to fprof, R15B should be faster (but it isn't!).
I assume that some function calls are not reported by fprof
(e.g. I did not see erlang:length/1).
Is there some way to include them again in order to analyse further?
Maybe suspend or garbage_collect are not in ACC or OWN?
But even here, R15B wastes less time as you can see in the table below.
----------------------------------------------------------
                               |   R14B04   |   R15B     |
----------------------------------------------------------
fprof, suspend, smp8:8         |  9.8421e+06| 8.51855e+06|
fprof, garbage_collect, smp8:8 |        1315|         794|
----------------------------------------------------------

We do rely quite much on the process dictionary (erlang:put and erlang:get)
which you might see as an "non-typical usage pattern" for erlang.
However, if I switch to ets instead, performance of bench:quorum_read(10, 50000)
for R15B does not change and for R14B04 decreases by about 7%.

Everything else I would consider "typical use" - or at least I'm not aware
of anything untypical.

I hope, someone can help to further analyse the problem. I did everything I
could think of and are stuck now.
Is other real-world software also affected? E.g. testing couchdb would be nice.


Regards
Nico Kruber


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
how to re-produce the values
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

I'm using scalaris from http://code.google.com/p/scalaris/
./configure && make
================ R15B ===============
--------------------------------
./bin/firstnode.sh
Erlang R15B (erts-5.9) [source] [64-bit] [smp:8:8] [async-threads:0] [hipe] [kernel-poll:false]

timer:tc(bench, quorum_read, [10, 50000]).
{33764932,ok} % 1st try
{26641737,ok} % 2nd try
{27575975,ok} % 3rd try
--------------------------------
./bin/firstnode.sh "+S4:4"
Erlang R15B (erts-5.9) [source] [64-bit] [smp:4:4] [async-threads:0] [hipe] [kernel-poll:false]

timer:tc(bench, quorum_read, [10, 50000]).
{35963489,ok} % 1st try
{36332156,ok} % 2nd try
{37295819,ok} % 3rd try
--------------------------------
./bin/firstnode.sh "+S2:2"
Erlang R15B (erts-5.9) [source] [64-bit] [smp:2:2] [async-threads:0] [hipe] [kernel-poll:false]

timer:tc(bench, quorum_read, [10, 50000]).
{34102038,ok} % 1st try
{34379052,ok} % 2nd try
{34416173,ok} % 3rd try
--------------------------------
./bin/firstnode.sh "-smp disable"
Erlang R15B (erts-5.9) [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]

timer:tc(bench, quorum_read, [10, 50000]).
{54214795,ok} % 1st try
{55210947,ok} % 2nd try
{54537194,ok} % 3rd try

================ R14B04 ===============
--------------------------------
./bin/firstnode.sh
Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:8:8] [rq:8] [async-threads:0] [hipe] [kernel-poll:false]

timer:tc(bench, quorum_read, [10, 50000]).
{22598819,ok} % 1st try
{22706185,ok} % 2nd try
{22484619,ok} % 3rd try
--------------------------------
./bin/firstnode.sh "+S4:4"
Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

timer:tc(bench, quorum_read, [10, 50000]).
{35482368,ok} % 1st try
{34009785,ok} % 2nd try
{34527203,ok} % 3rd try
--------------------------------
./bin/firstnode.sh "+S2:2"
Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]

timer:tc(bench, quorum_read, [10, 50000]).
{33590356,ok} % 1st try
{33192033,ok} % 2nd try
{32863428,ok} % 3rd try
--------------------------------
./bin/firstnode.sh "-smp disable"
Erlang R14B04 (erts-5.8.5) [source] [64-bit] [rq:1] [async-threads:0] [hipe] [kernel-poll:false]

timer:tc(bench, quorum_read, [10, 50000]).
{53805419,ok} % 1st try
{54054538,ok} % 2nd try
{54066521,ok} % 3rd try


fprof runs:
./bin/firstnode.sh

fprof:trace([start, {procs, processes()}]).
bench:quorum_read(10, 1000).
fprof:trace(stop).
fprof:profile().
fprof:analyse([{dest, "test.R15B.fprof.log"}]). % or R14B04 respectively

> cat test.R14B04.fprof.log |grep suspend|cut -d',' -f3 | awk '{s+=$1} END {print s}'
9.8421e+06
> cat test.R15B.fprof.log |grep suspend|cut -d',' -f3 | awk '{s+=$1} END {print s}'
8.51855e+06
> cat test.R14B04.fprof.log |grep garbage_collect|cut -d',' -f3 | awk '{s+=$1} END {print s}'
1315.33
> cat test.R15B.fprof.log |grep garbage_collect|cut -d',' -f3 | awk '{s+=$1} END {print s}'
794.956

On Friday 09 December 2011 12:09:56 Kenneth Lundin wrote:
> On 12/5/11, Nico Kruber <kruber@REDACTED> wrote:
> > Is there any news on this?
> > 
> > The performance drop of the recent R15 checkout compared to R14B04 in
> > our
> > application is now at _30%_ compared to a drop of "only" 15% in R15A!
> 
> When working with R15B we are of course performing a number of
> bencmarks to keep track of the performance.
> 
> Whe always strive to improve general performance in each release AND
> OUR BENCHMARKS INDICATE THAT R15B IS FASTER IN GENERAL THAN R14B04.
> 
> It is however always possible that some specific operations or
> combinations of operations get slower while others are optimized.
> 
> Regarding the findings for this particular application we think the
> reason for slowdown is a very frequent but non typical usage pattern
> that now becomes visible as a performance bottleneck.
> 
> 
> / Kenneth, Erlang/OTP Ericsson
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part.
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20111214/9cb50c4f/attachment.bin>


More information about the erlang-questions mailing list