[erlang-questions] Inter-node communication bottleneck

Thu Aug 21 05:28:06 CEST 2014

Hi,
I think this is a performance problem with the Ethernet card, not the Erlang VM.
Which Ethernet card are you using? The loopback and a real interface behave differently.
Handling 1000k packets per second requires a very high-spec Ethernet card.
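
As a rough sanity check on that number, here is a small Python sketch of the
arithmetic (my own back-of-the-envelope estimate, not a measurement). It
assumes every packet goes out as its own minimum-size Ethernet frame; Erlang
distribution runs over TCP and may pack several messages into one frame, so
real traffic could look quite different. The function names are just
illustrative.

```python
# Wire bandwidth needed for a given packet rate on Ethernet.
# Assumption: one Ethernet frame per packet (worst case for small messages).

PREAMBLE_AND_SFD = 8   # bytes transmitted before each frame
INTER_FRAME_GAP = 12   # mandatory idle time after each frame, in byte times
MIN_FRAME = 64         # minimum Ethernet frame size in bytes, including FCS


def bytes_on_wire(frame_bytes=MIN_FRAME):
    """Bytes of link time one frame occupies, including per-frame overhead."""
    return max(frame_bytes, MIN_FRAME) + PREAMBLE_AND_SFD + INTER_FRAME_GAP


def wire_mbps(packets_per_sec, frame_bytes=MIN_FRAME):
    """Line rate in Mbit/s consumed by the given frame rate."""
    return packets_per_sec * bytes_on_wire(frame_bytes) * 8 / 1e6


def max_pps(link_mbps, frame_bytes=MIN_FRAME):
    """Upper bound on frames per second for a link of the given speed."""
    return link_mbps * 1e6 / (bytes_on_wire(frame_bytes) * 8)


if __name__ == "__main__":
    print("Mbit/s needed for 1000k pps:", wire_mbps(1_000_000))
    print("max pps on a 1 Gbit/s link:", int(max_pps(1000)))
```

Under these assumptions, 1000k minimum-size packets per second already uses
about two thirds of a 1 Gbit/s link, and a 100 Mbit/s link cannot carry that
rate at all, so the card and driver would need to sustain close to line rate
for small frames.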

2014-08-21 1:01 GMT+09:00 Jihyun Yu <yjh0502@REDACTED>:
> First, I assume that the Erlang VM itself is scalable, because there was no
> problem when I ran the benchmark on a single Erlang instance. During the
> benchmark, the Erlang instance could not saturate the CPU even though it had
> enough threads and Erlang processes. Bandwidth cannot be the problem either,
> because the result is reproducible on a single machine over the loopback
> interface, which is not a bottleneck, at least at ~100Mbps
> throughput.
>
> So I thought that a lock was the problem. I profiled the Erlang VM with
> systemtap, collecting userspace stack traces whenever a lock was contended.
> It showed two frequently contended locks: one in scheduler_wait(), the other
> in erts_dsig_send_msg(). scheduler_wait() cannot be the problem,
> because the benchmark works as expected without inter-node communication.
>
>
> Here's an example stack trace I got during the experiment.
>
> __lll_lock_wait+0x1d/0x30 [/usr/lib64/libpthread-2.17.so]
> _L_lock_790+0xf/0x1b [/usr/lib64/libpthread-2.17.so]
> __pthread_mutex_lock+0x37/0x122 [/usr/lib64/libpthread-2.17.so]
> erts_dsig_send_msg+0x3c5/0x640 [erts-6.1/bin/beam.smp]
> remote_send+0xc7/0x220 [erts-6.1/bin/beam.smp]
> erl_send+0x581/0xa60 [erts-6.1/bin/beam.smp]
> process_main+0x80f2/0xba80 [erts-6.1/bin/beam.smp]
> sched_thread_func+0xe2/0x1d0 [erts-6.1/bin/beam.smp]
> thr_wrapper+0x65/0xb0 [erts-6.1/bin/beam.smp]
> start_thread+0xc3/0x310 [/usr/lib64/libpthread-2.17.so]
> clone+0x6d/0x90 [/usr/lib64/libc-2.17.so]
>
>
> The experiment was done on a 2-CPU, 12-core HP machine running CentOS 7.
> I attached the systemtap script and a result.
>
>
> On Wed, Aug 20, 2014 at 04:29:19PM +0200, Sverker Eriksson wrote:
>> On 08/19/2014 02:59 PM, Jihyun Yu wrote:
>> >Hi,
>> >
>> >There is a prior discussion[1] about an inter-node communication bottleneck,
>> >and I experienced the same issue with inter-node messaging. I dug into the
>> >issue and found that there is a lock on inter-node messaging[2] which
>> >causes a bottleneck when sending messages to a single node.
>>
>> How did you conclude that the distribution lock queue is a
>> bottleneck?
>> Did you use lock counting[1], and if so, what were the profiling stats?
>>
>>
>> /Sverker, Erlang/OTP
>>
>>
>> [1] http://www.erlang.org/doc/apps/tools/lcnt_chapter.html
>>