[erlang-questions] performance drop in R15A?

Nico Kruber lists.nico.k@REDACTED
Thu Jun 28 15:53:39 CEST 2012


I ran the git bisect with R14B04 as last good and R15A as last bad.
Find the test script attached. It basically compiles and installs Erlang and 
then compiles Scalaris running its benchmarks and evaluating the quorum_read 
bench. On our machine, R14B04 is around 7000 reqs/s, R15A around 5200, R15B01 
around 5000 and current master around 5500.

What you need in order to run this on your machine:
1) checkout otp to $HOME/otp
2) checkout current Scalaris from http://scalaris.googlecode.com/svn/trunk/ to 
$HOME/scalaris-otp
3) store the attached otp-find_sched_bug.sh in $HOME/
4) run the git bisect as follows:

git bisect start OTP_R15A OTP_R14B04
git bisect run $HOME/otp-find_sched_bug.sh
...
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
55358c54778ead444e51f565d00175ba887ef182
a67e91e658bdbba24fcc3c79b06fdf10ff830bc9
c12befbdc957f7f166598c6d5143ce27a0d10fa8
We cannot bisect more!
bisect run cannot continue any more

Results: still good up until 11/11/2011
# good: [4ed347dcaeeb8e8844be532414a11817da3920f5] Merge branch 
'anders/diameter/capabilities_cb/OTP-9654'
bad starting at 13/11/2011
# bad: [c12befbdc957f7f166598c6d5143ce27a0d10fa8] Merge branch 'rickard/alloc-
opt/OTP-7775'

the two commits in between don't seem to compile

to give you an impression of the differences, here are the bench results of 
some of the tested commits:

4ed347dcaeeb8e8844be532414a11817da3920f5 (good)
quorum_read: "wall clock throughput  : 6979.394732929971"

c12befbdc957f7f166598c6d5143ce27a0d10fa8 (bad)
quorum_read: "wall clock throughput  : 5754.395660564197"

ca58731b5df58aa2a8b42c583d1ba7bb929e72b2 (bad)
quorum_read: "wall clock throughput  : 5372.322931190965"

6ff77abac7d54f23e5a5fc465c3746c9cd7ee57b (bad)
quorum_read: "wall clock throughput  : 5271.737230890335"

2ac3f65eeeac88cf2a9a8b339eab1fee3c94e42a (bad)
quorum_read: "wall clock throughput  : 5199.73995060559"

OTP_R15A (bad)
quorum_read: "wall clock throughput  : 5265.768884666444"

now as you can see, most of the performance is lost with commit c12befbdc9 
though further degradations can be seen...
Unfortunately, c12befbdc9 is a very big patch working on the core of otp and 
probably not easy to debug.

So how can we continue from here?


Nico

On Thursday 12 Apr 2012 22:42:05 Gustav Simonsson wrote:
> Hi Nico,
> 
> Have you tried running the benchmarks between git bisects,
> using R14B04 and R15A as endpoints?
> 
> // Gustav
> 
> Sent from my PC
> 
> ----- Original Message -----
> 
> > From: "Nico Kruber" <lists.nico.k@REDACTED>
> > To: erlang-questions@REDACTED
> > Cc: "Lukas Larsson" <lukas@REDACTED>
> > Sent: Thursday, 12 April, 2012 2:47:50 PM
> > Subject: Re: [erlang-questions] performance drop in R15A?
> > 
> > is there any news on this?
> > Our internal test server has a new additional configuration which
> > compiles the
> > latest otp from github once per day and tests its performance with
> > the
> > mentioned benchmarks.
> > The resulting graphs for R14B04, R15B and latest git checkout are
> > attached. As
> > you can see - nothing has changed since then so I suppose it is still
> > present
> > in R15B01 :(
> > 
> > I don't know how to debug this any further. If there is anything I
> > could do to
> > help you identify the issue, please tell me.
> > 
> > 
> > Regards
> > Nico Kruber
> > 
> > On Wednesday 14 December 2011 17:41:33 Nico Kruber wrote:
> > > ok, I did some profiling and made some optimisations but the
> > > differences
> > > between R15B (git checkout from master just now) and R14B04 are
> > > still the
> > > same:
> > > 
> > > First the results of the calls to bench:quorum_read(10, 50000)
> > > (best of
> > > three) ------------------------------------------------------------
> > > 
> > >                    |   R14B04   |   R15B     | time surplus
> > > 
> > > ------------------------------------------------------------
> > > timer:tc, smp8:8   |    22484619|    26641737| +18%
> > > timer:tc, smp4:4   |    34009785|    35963489| + 6%
> > > timer:tc, smp2:2   |    32863428|    34102038| + 4%
> > > timer:tc, smpoff   |    53805419|    54214795| + 1%
> > > ------------------------------------------------------------
> > > 
> > > It seems there is some lock-contention or some other problem with
> > > smp in our
> > > case which hasn't been there before.
> > > I also observed that the results using R15B vary quite much,
> > > especially with
> > > smp8:8 (see below for an example). R14B04 is more stable among
> > > different
> > > test runs.
> > > But still, even without smp, R15B is slower than R14B04 :(
> > > 
> > > Lukas suggested to run fprof to a bit deeper and maybe find the
> > > function
> > > taking longer in R15 than in R14B04.
> > > Although it was helpful in finding some hot spots, it did however
> > > not help
> > > with this problem. Following are the totals reported by fprof
> > > (if you like, I can send the complete logs - didn't want to send
> > > too much to
> > > the mailing list though).
> > > 
> > > ------------------------------------------------------------
> > > 
> > >                    |   R14B04   |   R15B     | time surplus
> > > 
> > > ------------------------------------------------------------
> > > fprof, CNT, smp8:8 |     7723796|     7722635| n/a
> > > fprof, ACC, smp8:8 |       98239|       85477| -13%
> > > fprof, OWN, smp8:8 |      300534|      264018| -12%
> > > ------------------------------------------------------------
> > > 
> > > As you can see, according to fprof, R15B should be faster (but it
> > > isn't!).
> > > I assume that some function calls are not reported by fprof
> > > (e.g. I did not see erlang:length/1).
> > > Is there some way to include them again in order to analyse
> > > further?
> > > Maybe suspend or garbage_collect are not in ACC or OWN?
> > > But even here, R15B wastes less time as you can see in the table
> > > below.
> > > ----------------------------------------------------------
> > > 
> > >                                |   R14B04   |   R15B     |
> > > 
> > > ----------------------------------------------------------
> > > fprof, suspend, smp8:8         |  9.8421e+06| 8.51855e+06|
> > > fprof, garbage_collect, smp8:8 |        1315|         794|
> > > ----------------------------------------------------------
> > > 
> > > We do rely quite much on the process dictionary (erlang:put and
> > > erlang:get)
> > > which you might see as an "non-typical usage pattern" for erlang.
> > > However, if I switch to ets instead, performance of
> > > bench:quorum_read(10,
> > > 50000) for R15B does not change and for R14B04 decreases by about
> > > 7%.
> > > 
> > > Everything else I would consider "typical use" - or at least I'm
> > > not aware
> > > of anything untypical.
> > > 
> > > I hope, someone can help to further analyse the problem. I did
> > > everything I
> > > could think of and are stuck now.
> > > Is other real-world software also affected? E.g. testing couchdb
> > > would be
> > > nice.
> > > 
> > > 
> > > Regards
> > > Nico Kruber
> > > 
> > > 
> > > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> > > how to re-produce the values
> > > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> > > 
> > > I'm using scalaris from http://code.google.com/p/scalaris/
> > > ./configure && make
> > > ================ R15B ===============
> > > --------------------------------
> > > ./bin/firstnode.sh
> > > Erlang R15B (erts-5.9) [source] [64-bit] [smp:8:8]
> > > [async-threads:0] [hipe]
> > > [kernel-poll:false]
> > > 
> > > timer:tc(bench, quorum_read, [10, 50000]).
> > > {33764932,ok} % 1st try
> > > {26641737,ok} % 2nd try
> > > {27575975,ok} % 3rd try
> > > --------------------------------
> > > ./bin/firstnode.sh "+S4:4"
> > > Erlang R15B (erts-5.9) [source] [64-bit] [smp:4:4]
> > > [async-threads:0] [hipe]
> > > [kernel-poll:false]
> > > 
> > > timer:tc(bench, quorum_read, [10, 50000]).
> > > {35963489,ok} % 1st try
> > > {36332156,ok} % 2nd try
> > > {37295819,ok} % 3rd try
> > > --------------------------------
> > > ./bin/firstnode.sh "+S2:2"
> > > Erlang R15B (erts-5.9) [source] [64-bit] [smp:2:2]
> > > [async-threads:0] [hipe]
> > > [kernel-poll:false]
> > > 
> > > timer:tc(bench, quorum_read, [10, 50000]).
> > > {34102038,ok} % 1st try
> > > {34379052,ok} % 2nd try
> > > {34416173,ok} % 3rd try
> > > --------------------------------
> > > ./bin/firstnode.sh "-smp disable"
> > > Erlang R15B (erts-5.9) [source] [64-bit] [async-threads:0] [hipe]
> > > [kernel-poll:false]
> > > 
> > > timer:tc(bench, quorum_read, [10, 50000]).
> > > {54214795,ok} % 1st try
> > > {55210947,ok} % 2nd try
> > > {54537194,ok} % 3rd try
> > > 
> > > ================ R14B04 ===============
> > > --------------------------------
> > > ./bin/firstnode.sh
> > > Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:8:8] [rq:8]
> > > [async-threads:0] [hipe] [kernel-poll:false]
> > > 
> > > timer:tc(bench, quorum_read, [10, 50000]).
> > > {22598819,ok} % 1st try
> > > {22706185,ok} % 2nd try
> > > {22484619,ok} % 3rd try
> > > --------------------------------
> > > ./bin/firstnode.sh "+S4:4"
> > > Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:4:4] [rq:4]
> > > [async-threads:0] [hipe] [kernel-poll:false]
> > > 
> > > timer:tc(bench, quorum_read, [10, 50000]).
> > > {35482368,ok} % 1st try
> > > {34009785,ok} % 2nd try
> > > {34527203,ok} % 3rd try
> > > --------------------------------
> > > ./bin/firstnode.sh "+S2:2"
> > > Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2]
> > > [async-threads:0] [hipe] [kernel-poll:false]
> > > 
> > > timer:tc(bench, quorum_read, [10, 50000]).
> > > {33590356,ok} % 1st try
> > > {33192033,ok} % 2nd try
> > > {32863428,ok} % 3rd try
> > > --------------------------------
> > > ./bin/firstnode.sh "-smp disable"
> > > Erlang R14B04 (erts-5.8.5) [source] [64-bit] [rq:1]
> > > [async-threads:0] [hipe]
> > > [kernel-poll:false]
> > > 
> > > timer:tc(bench, quorum_read, [10, 50000]).
> > > {53805419,ok} % 1st try
> > > {54054538,ok} % 2nd try
> > > {54066521,ok} % 3rd try
> > > 
> > > 
> > > fprof runs:
> > > ./bin/firstnode.sh
> > > 
> > > fprof:trace([start, {procs, processes()}]).
> > > bench:quorum_read(10, 1000).
> > > fprof:trace(stop).
> > > fprof:profile().
> > > fprof:analyse([{dest, "test.R15B.fprof.log"}]). % or R14B04
> > > respectively
> > > 
> > > > cat test.R14B04.fprof.log |grep suspend|cut -d',' -f3 | awk
> > > > '{s+=$1} END
> > > > {print s}'
> > > 
> > > 9.8421e+06
> > > 
> > > > cat test.R15B.fprof.log |grep suspend|cut -d',' -f3 | awk
> > > > '{s+=$1} END
> > > > {print s}'
> > > 
> > > 8.51855e+06
> > > 
> > > > cat test.R14B04.fprof.log |grep garbage_collect|cut -d',' -f3 |
> > > > awk
> > > > '{s+=$1} END {print s}'
> > > 
> > > 1315.33
> > > 
> > > > cat test.R15B.fprof.log |grep garbage_collect|cut -d',' -f3 | awk
> > > > '{s+=$1}
> > > > END {print s}'
> > > 
> > > 794.956
> > > 
> > > On Friday 09 December 2011 12:09:56 Kenneth Lundin wrote:
> > > > On 12/5/11, Nico Kruber <kruber@REDACTED> wrote:
> > > > > Is there any news on this?
> > > > > 
> > > > > The performance drop of the recent R15 checkout compared to
> > > > > R14B04 in
> > > > > our
> > > > > application is now at _30%_ compared to a drop of "only" 15% in
> > > > > R15A!
> > > > 
> > > > When working with R15B we are of course performing a number of
> > > > bencmarks to keep track of the performance.
> > > > 
> > > > Whe always strive to improve general performance in each release
> > > > AND
> > > > OUR BENCHMARKS INDICATE THAT R15B IS FASTER IN GENERAL THAN
> > > > R14B04.
> > > > 
> > > > It is however always possible that some specific operations or
> > > > combinations of operations get slower while others are optimized.
> > > > 
> > > > Regarding the findings for this particular application we think
> > > > the
> > > > reason for slowdown is a very frequent but non typical usage
> > > > pattern
> > > > that now becomes visible as a performance bottleneck.
> > > > 
> > > > 
> > > > / Kenneth, Erlang/OTP Ericsson
> > 
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://erlang.org/mailman/listinfo/erlang-questions
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bisect.log
Type: text/x-log
Size: 2662 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20120628/c3f3754d/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: otp-find_sched_bug.sh
Type: application/x-shellscript
Size: 1398 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20120628/c3f3754d/attachment-0001.bin>


More information about the erlang-questions mailing list