[erlang-questions] Improve performance of IO bounded server written in Erlang via having pollset for each scheduler and bind port to scheduler together with process
Wei Cao
cyg.cao@REDACTED
Thu Jul 12 11:58:37 CEST 2012
2012/7/12 Zabrane Mickael <zabrane3@REDACTED>:
> Hi Wei,
>
>>> We already surpassed the 100krps on an 8-cores machine with our HTTP server
>>> (~150K rps).
>>
>> Which erlang version did you use to get ~150k rps on 8-cores machine,
>> patched or unpatched?
>
> We reach the 150K on the unpatched version.
>
>
>> If it was measured on an unpatched Erlang
>> version, would you mind measuring it on the patched version and letting me
>> know the result?
>
> I haven't yet adapted our code to use the VM with your patch.
> I'll keep you informed.
>
>> Today I found a lock bottleneck using SystemTap, trace-cmd and lcnt.
>> After fixing it, ehttpd on my 16-core machine can reach 325k rps.
>>
>> RX packets: 326117 TX packets: 326122
>> RX packets: 326845 TX packets: 326859
>> RX packets: 327983 TX packets: 327996
>> RX packets: 326651 TX packets: 326624
>>
>> This is the upper limit of our Gigabit network card. I ran ab on three
>> standalone machines to generate enough load. I posted the fix to
>> GitHub, give it a try ~
>
> That's simply fantastic. Could you share your bottleneck tracking method?
> Any new VM patch to provide?
Using perf top, I saw that a large percentage of time was being spent in
the kernel's _spin_lock:
    1894.00 16.0% _spin_lock    /usr/lib/debug/lib/modules/2.6.32-131.21.1.tb477.el6.x86_64/vmlinux
     566.00  4.8% process_main  /home/mingsong.cw/erlangpps/lib/erlang/erts-5.10/bin/beam.smp
After dumping _spin_lock's call stacks with trace-cmd and computing
statistics over them, I found that most _spin_lock calls come from
futex_wake, which is invoked by pthread mutexes.
Finally, I used lcnt to list all lock collisions in the Erlang VM and found
that the timeofday mutex is the bottleneck:
    lock       location                   #tries  #collisions  collisions [%]  time [us]  duration [%]
    -----      ---------                  ------  -----------  --------------  ---------  ------------
    timeofday  'beam/erl_time_sup.c':939  895234       551957         61.6551    3185159       23.5296
    timeofday  'beam/erl_time_sup.c':971  408006       264498         64.8270    1473816       10.8874
The timeofday mutex is locked each time erts_check_io is invoked, in order
to "sync the machine's idea of time". erts_check_io is executed hundreds
of thousands of times per second, so the mutex is taken far too often and
the resulting contention reduces performance.
I solved this problem by moving the sync operation into a standalone
thread that runs once per millisecond.
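
To illustrate the idea (this is not the actual patch, only a minimal sketch
under assumptions): a dedicated thread refreshes a cached timestamp about
once per millisecond, and the hot path reads that value atomically instead
of taking a mutex. All names below (cached_time_us, ticker_start,
cached_now_us, ...) are hypothetical and do not correspond to identifiers in
erl_time_sup.c, and the real sync operation does more than read the clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical names for illustration; not the ERTS identifiers. */
static _Atomic int64_t cached_time_us;  /* last synced wall-clock time, in us */
static atomic_bool ticker_running;
static pthread_t ticker_thread;

/* Read the wall clock in microseconds. */
static int64_t read_clock_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Standalone thread: refresh the cached time roughly once per millisecond. */
static void *ticker_main(void *arg)
{
    (void)arg;
    const struct timespec period = { .tv_sec = 0, .tv_nsec = 1000000 };
    while (atomic_load(&ticker_running)) {
        atomic_store(&cached_time_us, read_clock_us());
        nanosleep(&period, NULL);
    }
    return NULL;
}

/* Start the ticker; returns 0 on success (pthread_create's result). */
int ticker_start(void)
{
    atomic_store(&cached_time_us, read_clock_us());
    atomic_store(&ticker_running, true);
    return pthread_create(&ticker_thread, NULL, ticker_main, NULL);
}

/* Stop the ticker and wait for the thread to exit. */
void ticker_stop(void)
{
    atomic_store(&ticker_running, false);
    pthread_join(ticker_thread, NULL);
}

/* Hot path (e.g. an I/O polling loop): no mutex, just an atomic load. */
int64_t cached_now_us(void)
{
    return atomic_load(&cached_time_us);
}
```

The trade-off in this sketch is that callers see a time that may be up to
about one millisecond stale (more if the ticker thread is delayed by the OS
scheduler), in exchange for removing the lock from the frequently executed
path.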
>
> Regards,
> Zabrane
>
--
Best,
Wei Cao