[erlang-questions] performance drop in R15A?

Thu Apr 12 15:36:31 CEST 2012

Hi Nico,

According to erts documetation, since OTP R15B, the virtual machine
doesn't bind the schedulers to logical processors.
I don't know if it's the reason of your performance drop but it might
worst looking around the +sbt option.

Regards,
Nicolas

> is there any news on this?
> Our internal test server has a new additional configuration which compiles
> the
> latest otp from github once per day and tests its performance with the
> mentioned benchmarks.
> The resulting graphs for R14B04, R15B and latest git checkout are
> attached. As
> you can see - nothing has changed since then so I suppose it is still
> present
> in R15B01 :(
>
> I don't know how to debug this any further. If there is anything I could
> do to
> help you identify the issue, please tell me.
>
>
> Regards
> Nico Kruber
>
> On Wednesday 14 December 2011 17:41:33 Nico Kruber wrote:
>> ok, I did some profiling and made some optimisations but the differences
>> between R15B (git checkout from master just now) and R14B04 are still
>> the
>> same:
>>
>> First the results of the calls to bench:quorum_read(10, 50000) (best of
>> three) ------------------------------------------------------------
>>
>>                    |   R14B04   |   R15B     | time surplus
>>
>> ------------------------------------------------------------
>> timer:tc, smp8:8   |    22484619|    26641737| +18%
>> timer:tc, smp4:4   |    34009785|    35963489| + 6%
>> timer:tc, smp2:2   |    32863428|    34102038| + 4%
>> timer:tc, smpoff   |    53805419|    54214795| + 1%
>> ------------------------------------------------------------
>>
>> It seems there is some lock-contention or some other problem with smp in
>> our
>> case which hasn't been there before.
>> I also observed that the results using R15B vary quite much, especially
>> with
>> smp8:8 (see below for an example). R14B04 is more stable among different
>> test runs.
>> But still, even without smp, R15B is slower than R14B04 :(
>>
>> Lukas suggested to run fprof to a bit deeper and maybe find the function
>> taking longer in R15 than in R14B04.
>> Although it was helpful in finding some hot spots, it did however not
>> help
>> with this problem. Following are the totals reported by fprof
>> (if you like, I can send the complete logs - didn't want to send too
>> much to
>> the mailing list though).
>>
>> ------------------------------------------------------------
>>
>>                    |   R14B04   |   R15B     | time surplus
>>
>> ------------------------------------------------------------
>> fprof, CNT, smp8:8 |     7723796|     7722635| n/a
>> fprof, ACC, smp8:8 |       98239|       85477| -13%
>> fprof, OWN, smp8:8 |      300534|      264018| -12%
>> ------------------------------------------------------------
>>
>> As you can see, according to fprof, R15B should be faster (but it
>> isn't!).
>> I assume that some function calls are not reported by fprof
>> (e.g. I did not see erlang:length/1).
>> Is there some way to include them again in order to analyse further?
>> Maybe suspend or garbage_collect are not in ACC or OWN?
>> But even here, R15B wastes less time as you can see in the table below.
>> ----------------------------------------------------------
>>
>>                                |   R14B04   |   R15B     |
>>
>> ----------------------------------------------------------
>> fprof, suspend, smp8:8         |  9.8421e+06| 8.51855e+06|
>> fprof, garbage_collect, smp8:8 |        1315|         794|
>> ----------------------------------------------------------
>>
>> We do rely quite much on the process dictionary (erlang:put and
>> erlang:get)
>> which you might see as an "non-typical usage pattern" for erlang.
>> However, if I switch to ets instead, performance of
>> bench:quorum_read(10,
>> 50000) for R15B does not change and for R14B04 decreases by about 7%.
>>
>> Everything else I would consider "typical use" - or at least I'm not
>> aware
>> of anything untypical.
>>
>> I hope, someone can help to further analyse the problem. I did
>> everything I
>> could think of and are stuck now.
>> Is other real-world software also affected? E.g. testing couchdb would
>> be
>> nice.
>>
>>
>> Regards
>> Nico Kruber
>>
>>
>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>> how to re-produce the values
>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>
>> I'm using scalaris from http://code.google.com/p/scalaris/
>> ./configure && make
>> ================ R15B ===============
>> --------------------------------
>> ./bin/firstnode.sh
>> Erlang R15B (erts-5.9) [source] [64-bit] [smp:8:8] [async-threads:0]
>> [hipe]
>> [kernel-poll:false]
>>
>> timer:tc(bench, quorum_read, [10, 50000]).
>> {33764932,ok} % 1st try
>> {26641737,ok} % 2nd try
>> {27575975,ok} % 3rd try
>> --------------------------------
>> ./bin/firstnode.sh "+S4:4"
>> Erlang R15B (erts-5.9) [source] [64-bit] [smp:4:4] [async-threads:0]
>> [hipe]
>> [kernel-poll:false]
>>
>> timer:tc(bench, quorum_read, [10, 50000]).
>> {35963489,ok} % 1st try
>> {36332156,ok} % 2nd try
>> {37295819,ok} % 3rd try
>> --------------------------------
>> ./bin/firstnode.sh "+S2:2"
>> Erlang R15B (erts-5.9) [source] [64-bit] [smp:2:2] [async-threads:0]
>> [hipe]
>> [kernel-poll:false]
>>
>> timer:tc(bench, quorum_read, [10, 50000]).
>> {34102038,ok} % 1st try
>> {34379052,ok} % 2nd try
>> {34416173,ok} % 3rd try
>> --------------------------------
>> ./bin/firstnode.sh "-smp disable"
>> Erlang R15B (erts-5.9) [source] [64-bit] [async-threads:0] [hipe]
>> [kernel-poll:false]
>>
>> timer:tc(bench, quorum_read, [10, 50000]).
>> {54214795,ok} % 1st try
>> {55210947,ok} % 2nd try
>> {54537194,ok} % 3rd try
>>
>> ================ R14B04 ===============
>> --------------------------------
>> ./bin/firstnode.sh
>> Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:8:8] [rq:8]
>> [async-threads:0] [hipe] [kernel-poll:false]
>>
>> timer:tc(bench, quorum_read, [10, 50000]).
>> {22598819,ok} % 1st try
>> {22706185,ok} % 2nd try
>> {22484619,ok} % 3rd try
>> --------------------------------
>> ./bin/firstnode.sh "+S4:4"
>> Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:4:4] [rq:4]
>> [async-threads:0] [hipe] [kernel-poll:false]
>>
>> timer:tc(bench, quorum_read, [10, 50000]).
>> {35482368,ok} % 1st try
>> {34009785,ok} % 2nd try
>> {34527203,ok} % 3rd try
>> --------------------------------
>> ./bin/firstnode.sh "+S2:2"
>> Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2]
>> [async-threads:0] [hipe] [kernel-poll:false]
>>
>> timer:tc(bench, quorum_read, [10, 50000]).
>> {33590356,ok} % 1st try
>> {33192033,ok} % 2nd try
>> {32863428,ok} % 3rd try
>> --------------------------------
>> ./bin/firstnode.sh "-smp disable"
>> Erlang R14B04 (erts-5.8.5) [source] [64-bit] [rq:1] [async-threads:0]
>> [hipe]
>> [kernel-poll:false]
>>
>> timer:tc(bench, quorum_read, [10, 50000]).
>> {53805419,ok} % 1st try
>> {54054538,ok} % 2nd try
>> {54066521,ok} % 3rd try
>>
>>
>> fprof runs:
>> ./bin/firstnode.sh
>>
>> fprof:trace([start, {procs, processes()}]).
>> bench:quorum_read(10, 1000).
>> fprof:trace(stop).
>> fprof:profile().
>> fprof:analyse([{dest, "test.R15B.fprof.log"}]). % or R14B04 respectively
>>
>> > cat test.R14B04.fprof.log |grep suspend|cut -d',' -f3 | awk '{s+=$1}
>> END
>> > {print s}'
>> 9.8421e+06
>>
>> > cat test.R15B.fprof.log |grep suspend|cut -d',' -f3 | awk '{s+=$1} END
>> > {print s}'
>> 8.51855e+06
>>
>> > cat test.R14B04.fprof.log |grep garbage_collect|cut -d',' -f3 | awk
>> > '{s+=$1} END {print s}'
>> 1315.33
>>
>> > cat test.R15B.fprof.log |grep garbage_collect|cut -d',' -f3 | awk
>> '{s+=$1}
>> > END {print s}'
>> 794.956
>>
>> On Friday 09 December 2011 12:09:56 Kenneth Lundin wrote:
>> > On 12/5/11, Nico Kruber <kruber@REDACTED> wrote:
>> > > Is there any news on this?
>> > >
>> > > The performance drop of the recent R15 checkout compared to R14B04
>> in
>> > > our
>> > > application is now at _30%_ compared to a drop of "only" 15% in
>> R15A!
>> >
>> > When working with R15B we are of course performing a number of
>> > bencmarks to keep track of the performance.
>> >
>> > Whe always strive to improve general performance in each release AND
>> > OUR BENCHMARKS INDICATE THAT R15B IS FASTER IN GENERAL THAN R14B04.
>> >
>> > It is however always possible that some specific operations or
>> > combinations of operations get slower while others are optimized.
>> >
>> > Regarding the findings for this particular application we think the
>> > reason for slowdown is a very frequent but non typical usage pattern
>> > that now becomes visible as a performance bottleneck.
>> >
>> >
>> > / Kenneth, Erlang/OTP
>> Ericsson_______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>