[erlang-questions] Improve performance of IO bounded server written in Erlang via having pollset for each scheduler and bind port to scheduler together with process
Thu Jul 12 13:01:47 CEST 2012
Good news. With the new (today) patch:
old bench: ~70K rps
new bench: ~85K rps
That's about 15K more rps handled now!
We're not far from the 100K rps ;-)
Well done Wei.
On Jul 12, 2012, at 11:58 AM, Wei Cao wrote:
> 2012/7/12 Zabrane Mickael <>:
>> Hi Wei,
>>>> We already surpassed the 100krps on an 8-cores machine with our HTTP server
>>>> (~150K rps).
>>> Which erlang version did you use to get ~150k rps on 8-cores machine,
>>> patched or unpatched?
>> We reached 150K on the unpatched version.
>>> If it was measured on an unpatched Erlang
>>> version, would you mind measuring it on the patched version and letting
>>> me know the result?
>> I haven't yet adapted our code to use the VM with your patch.
>> I'll keep you informed.
>>> Today I found a lock bottleneck through SystemTap, trace-cmd and lcnt.
>>> After fixing it, ehttpd on my 16-core machine can reach 325k rps.
>>> RX packets: 326117 TX packets: 326122
>>> RX packets: 326845 TX packets: 326859
>>> RX packets: 327983 TX packets: 327996
>>> RX packets: 326651 TX packets: 326624
>>> This is the upper limit of our Gigabit network card. I ran ab on three
>>> standalone machines to generate enough load. I posted the fix to
>>> github, have a try ~
>> That's simply fantastic. Could you share your bottleneck tracking method?
>> Any new VM patch to provide?
> Through perf top, I saw that a large percentage of time is spent in
> the kernel's _spin_lock:
> 1894.00 16.0% _spin_lock
> 566.00 4.8% process_main
> After dumping the call stacks of _spin_lock via trace-cmd and
> computing statistics on them, I found that most _spin_lock calls come
> from futex_wake, which is invoked by pthread mutexes.
> Finally, I used lcnt to locate all lock collisions in the Erlang VM and
> found that the mutex timeofday is the bottleneck.
>      lock  location                    #tries  #collisions  collisions [%]  time [us]  duration [%]
>     -----  --------                   -------  -----------  --------------  ---------  ------------
> timeofday  'beam/erl_time_sup.c':939   895234       551957         61.6551    3185159       23.5296
> timeofday  'beam/erl_time_sup.c':971   408006       264498         64.8270    1473816       10.8874
> The mutex timeofday is locked each time erts_check_io is invoked to
> "sync the machine's idea of time". erts_check_io is executed hundreds
> of thousands of times per second, so the mutex is locked far too often,
> which reduces performance.
> I solved this problem by moving the sync operation into a standalone
> thread that runs once per millisecond.
> Wei Cao
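
The idea Wei describes (take the time sync out of the hot path and let one dedicated thread do it once per millisecond) can be sketched in C roughly as follows. This is not the actual patch and not ERTS code: every name here (cached_time_us, time_sync_start, cached_now_us, and so on) is hypothetical, and it assumes a POSIX system with pthreads and C11 atomics. Hot-path callers read a cached timestamp without taking any mutex, at the cost of the value being up to about 1 ms stale.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* All identifiers below are hypothetical; none are taken from ERTS. */
static _Atomic int64_t cached_time_us;  /* last timestamp published by the sync thread */
static atomic_bool sync_running;        /* cleared to ask the sync thread to exit */
static pthread_t sync_thread;

/* Read the OS wall clock in microseconds. Only the sync thread
 * (and the one-time initialisation) pays for this call. */
static int64_t read_os_clock_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Body of the standalone thread: refresh the cached time about once per ms. */
static void *time_sync_loop(void *arg)
{
    const struct timespec period = { 0, 1000000 }; /* 1 ms */
    (void)arg;
    while (atomic_load(&sync_running)) {
        atomic_store(&cached_time_us, read_os_clock_us());
        nanosleep(&period, NULL);
    }
    return NULL;
}

/* Start the sync thread. Returns 0 on success, like pthread_create. */
int time_sync_start(void)
{
    int rc;
    atomic_store(&cached_time_us, read_os_clock_us());
    atomic_store(&sync_running, true);
    rc = pthread_create(&sync_thread, NULL, time_sync_loop, NULL);
    if (rc != 0)
        atomic_store(&sync_running, false);
    return rc;
}

/* Stop the sync thread and wait for it to finish. */
void time_sync_stop(void)
{
    atomic_store(&sync_running, false);
    pthread_join(sync_thread, NULL);
}

/* Hot-path read: a single atomic load, no mutex, so callers that run
 * hundreds of thousands of times per second never contend on a lock. */
int64_t cached_now_us(void)
{
    return atomic_load(&cached_time_us);
}
```

A caller would run time_sync_start() once at startup, use cached_now_us() wherever it previously locked a mutex to read the time, and call time_sync_stop() at shutdown. Build with -pthread.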