[erlang-questions] Improve performance of I/O-bound servers written in Erlang by having a pollset for each scheduler and binding ports to schedulers together with processes

Wei Cao cyg.cao@REDACTED
Thu Jul 12 11:58:37 CEST 2012


2012/7/12 Zabrane Mickael <zabrane3@REDACTED>:
> Hi Wei,
>
>>> We already surpassed 100k rps on an 8-core machine with our HTTP server
>>> (~150K rps).
>>
>> Which Erlang version did you use to get ~150k rps on an 8-core machine,
>> patched or unpatched?
>
> We reached 150K rps on the unpatched version.
>
>
>> If it was measured on an unpatched Erlang version, would you mind
>> measuring it on the patched version and letting me know the result?
>
> I haven't yet adapted our code to use the VM with your patch.
> I'll keep you informed.
>
>> Today I found a lock bottleneck through SystemTap, trace-cmd and lcnt;
>> after fixing it, ehttpd on my 16-core machine can reach 325k rps.
>>
>> RX packets: 326117 TX packets: 326122
>> RX packets: 326845 TX packets: 326859
>> RX packets: 327983 TX packets: 327996
>> RX packets: 326651 TX packets: 326624
>>
>> This is the upper limit of our Gigabit network card. I ran ab on three
>> standalone machines to generate enough load. I posted the fix to
>> GitHub, have a try ~
>
> That's simply fantastic. Could you share your bottleneck tracking method?
> Any new VM patch to provide?

Through perf top, I saw that a big percentage of time was wasted in the
kernel's _spin_lock:

  1894.00  16.0%  _spin_lock    /usr/lib/debug/lib/modules/2.6.32-131.21.1.tb477.el6.x86_64/vmlinux
   566.00   4.8%  process_main  /home/mingsong.cw/erlangpps/lib/erlang/erts-5.10/bin/beam.smp

After dumping _spin_lock's call stacks and collecting statistics on them
via trace-cmd, I found that most _spin_lock calls come from futex_wake,
which is invoked by pthread mutexes.

Finally, I used lcnt to locate all lock collisions in the Erlang VM and
found that the timeofday mutex is the bottleneck.

       lock                   location   #tries  #collisions  collisions [%]  time [us]  duration [%]
      -----  -------------------------  -------  -----------  --------------  ---------  ------------
  timeofday  'beam/erl_time_sup.c':939   895234       551957         61.6551    3185159       23.5296
  timeofday  'beam/erl_time_sup.c':971   408006       264498         64.8270    1473816       10.8874


The timeofday mutex is locked each time erts_check_io is invoked, in order
to "sync the machine's idea of time". Since erts_check_io runs hundreds of
thousands of times per second, the mutex is locked far too often, which
hurts performance.
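
In other words, the hot path looked roughly like the sketch below. This is
only an illustration with hypothetical names (sync_time, check_io_pass,
timeofday_mtx, cached_now), not the actual erl_time_sup.c code: every poll
pass serializes on one global mutex just to refresh the cached time, and the
futex behind that mutex is what showed up as _spin_lock in perf.

    /* Illustrative sketch of the contended pattern (hypothetical names,
     * not the real erl_time_sup.c code). */
    #include <pthread.h>
    #include <stddef.h>
    #include <sys/time.h>

    static pthread_mutex_t timeofday_mtx = PTHREAD_MUTEX_INITIALIZER;
    static struct timeval cached_now;        /* protected by timeofday_mtx */

    static void sync_time(void)              /* called on every poll pass */
    {
        pthread_mutex_lock(&timeofday_mtx);  /* all schedulers collide here */
        gettimeofday(&cached_now, NULL);     /* "sync the machine's idea of time" */
        pthread_mutex_unlock(&timeofday_mtx);
    }

    static void check_io_pass(void)
    {
        sync_time();   /* hundreds of thousands of calls per second in total */
        /* ... poll file descriptors and dispatch I/O events ... */
    }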

I solved this problem by moving the sync operation into a standalone
thread that runs once per millisecond.
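
Below is a minimal sketch of that idea, again with hypothetical names
(time_sync_thread, start_time_sync_thread) rather than the actual patch:
the sync moves out of the check-I/O path entirely, into a thread that takes
the timeofday mutex roughly 1000 times per second instead of hundreds of
thousands of times per second. Millisecond granularity for the cached time
is an assumption here; callers that need exact time can still lock the mutex
themselves, but they no longer collide on every poll pass.

    /* Sketch of the fix (hypothetical names, not the actual patch):
     * a standalone thread performs the time sync once per millisecond,
     * so the check-I/O hot path no longer touches the timeofday mutex. */
    #include <pthread.h>
    #include <stddef.h>
    #include <sys/time.h>
    #include <unistd.h>

    static pthread_mutex_t timeofday_mtx = PTHREAD_MUTEX_INITIALIZER;
    static struct timeval cached_now;         /* still protected by the mutex */

    static void *time_sync_thread(void *arg)  /* the standalone sync thread */
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&timeofday_mtx);
            gettimeofday(&cached_now, NULL);  /* the moved sync operation */
            pthread_mutex_unlock(&timeofday_mtx);
            usleep(1000);                     /* run once per millisecond */
        }
        return NULL;
    }

    static void check_io_pass(void)
    {
        /* sync_time() removed: the poll loop just consumes whatever time
         * the sync thread last published, no mutex on the hot path. */
        /* ... poll file descriptors and dispatch I/O events ... */
    }

    static void start_time_sync_thread(void)
    {
        pthread_t tid;
        pthread_create(&tid, NULL, time_sync_thread, NULL);
        pthread_detach(tid);
    }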



>
> Regards,
> Zabrane
>



-- 

Best,

Wei Cao


