[erlang-questions] Improve performance of IO bounded server written in Erlang via having pollset for each scheduler and bind port to scheduler together with process
Zabrane Mickael
zabrane3@REDACTED
Thu Jul 12 13:01:47 CEST 2012
Hi,
Good news! With today's new patch:
old bench: ~70K rps
new bench: ~85K rps
That's 15K more rps handled now!
We're not far from the 100K rps ;-)
Well done Wei.
Regards,
Zabrane
On Jul 12, 2012, at 11:58 AM, Wei Cao wrote:
> 2012/7/12 Zabrane Mickael <zabrane3@REDACTED>:
>> Hi Wei,
>>
>>>> We already surpassed 100K rps on an 8-core machine with our HTTP server
>>>> (~150K rps).
>>>
>>> Which Erlang version did you use to get ~150K rps on the 8-core machine,
>>> patched or unpatched?
>>
>> We reach the 150K on the unpatched version.
>>
>>
>>> If it was measured on an unpatched Erlang
>>> version, would you mind measuring it on the patched version and letting me
>>> know the result?
>>
>> I haven't yet adapted our code to run on the patched VM.
>> I'll keep you informed.
>>
>>> Today I found a lock bottleneck through SystemTap, trace-cmd and lcnt;
>>> after fixing it, ehttpd on my 16-core machine can reach 325K rps.
>>>
>>> RX packets: 326117 TX packets: 326122
>>> RX packets: 326845 TX packets: 326859
>>> RX packets: 327983 TX packets: 327996
>>> RX packets: 326651 TX packets: 326624
>>>
>>> This is the upper limit of our Gigabit network card; I ran ab on three
>>> standalone machines to generate enough load. I posted the fix to
>>> GitHub, have a try!
>>
>> That's simply fantastic. Could you share your bottleneck tracking method?
>> Any new VM patch to provide?
>
> Through perf top, I saw that a large percentage of time was wasted in
> the kernel's _spin_lock:
>
> 1894.00  16.0%  _spin_lock    /usr/lib/debug/lib/modules/2.6.32-131.21.1.tb477.el6.x86_64/vmlinux
>  566.00   4.8%  process_main  /home/mingsong.cw/erlangpps/lib/erlang/erts-5.10/bin/beam.smp
>
> After dumping _spin_lock's call stacks via trace-cmd and tallying them,
> I found that most _spin_lock calls come from futex_wake, which is
> invoked by the pthread mutex implementation.
>
> Finally, I used lcnt to locate lock collisions inside the Erlang VM and
> found that the timeofday mutex is the bottleneck.
>
> lock       location                    #tries   #collisions  collisions [%]  time [us]  duration [%]
> ---------  --------------------------  -------  -----------  --------------  ---------  ------------
> timeofday  'beam/erl_time_sup.c':939   895234   551957       61.6551         3185159    23.5296
> timeofday  'beam/erl_time_sup.c':971   408006   264498       64.8270         1473816    10.8874
>
>
> The timeofday mutex is taken each time erts_check_io is invoked to
> "sync the machine's idea of time". Since erts_check_io runs hundreds of
> thousands of times per second, the mutex is acquired far too often,
> which hurts performance.
>
> I solved this by moving the sync operation into a standalone thread
> that runs once per millisecond.
>
>
>
>>
>> Regards,
>> Zabrane
>>
>
>
>
> --
>
> Best,
>
> Wei Cao