[erlang-questions] Inter-node communication bottleneck

Wed Aug 20 18:01:35 CEST 2014

First, I assume that the Erlang VM itself is scalable, because there is no
problem when I run the benchmark on a single Erlang instance. During the
benchmark, the Erlang instance could not saturate the CPU even though it
had enough threads and Erlang processes. Bandwidth cannot be the problem
either, because the result is reproducible on a single machine over the
loopback interface, which is not a bottleneck, at least not at ~100Mbps
throughput.

So I suspected lock contention. I profiled the Erlang VM with SystemTap,
collecting userspace stack traces whenever a lock was contended. It showed
two frequently contended locks: one taken in scheduler_wait(), the other
in erts_dsig_send_msg(). scheduler_wait() cannot be the problem, because
the benchmark works as expected without inter-node communication.

Here is an example stack trace I got during the experiment:

__lll_lock_wait+0x1d/0x30 [/usr/lib64/libpthread-2.17.so]
_L_lock_790+0xf/0x1b [/usr/lib64/libpthread-2.17.so]
__pthread_mutex_lock+0x37/0x122 [/usr/lib64/libpthread-2.17.so]
erts_dsig_send_msg+0x3c5/0x640 [erts-6.1/bin/beam.smp]
remote_send+0xc7/0x220 [erts-6.1/bin/beam.smp]
erl_send+0x581/0xa60 [erts-6.1/bin/beam.smp]
process_main+0x80f2/0xba80 [erts-6.1/bin/beam.smp]
sched_thread_func+0xe2/0x1d0 [erts-6.1/bin/beam.smp]
thr_wrapper+0x65/0xb0 [erts-6.1/bin/beam.smp]
start_thread+0xc3/0x310 [/usr/lib64/libpthread-2.17.so]
clone+0x6d/0x90 [/usr/lib64/libc-2.17.so]
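
To see which VM function the contended waits come from, the backtraces can be tallied by their innermost beam.smp frame. Below is a minimal sketch of such a tally, not part of the attached script. It assumes frames look like the ones above (symbol+0xOFF/0xSIZE [module], optionally prefixed by an address) and that stacks are separated by blank lines; the helper names parse_stacks and tally_lock_callers are made up for illustration.

```python
import re
from collections import Counter

# Assumed frame format, taken from the trace above:
#   symbol+0xOFF/0xSIZE [module]
# An optional leading "0xADDR : " prefix is tolerated.
FRAME_RE = re.compile(
    r"^\s*(?:0x[0-9a-f]+\s*:\s*)?(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+\s+\[(.+)\]\s*$"
)


def parse_stacks(text):
    """Split profiler output into stacks, each a list of (symbol, module)."""
    stacks, current = [], []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            current.append((match.group(1), match.group(2)))
        elif current:
            # Any non-frame line (blank line, summary line) ends the stack.
            stacks.append(current)
            current = []
    if current:
        stacks.append(current)
    return stacks


def tally_lock_callers(text, module_suffix="beam.smp"):
    """Count stacks by the innermost frame that belongs to the given module."""
    counts = Counter()
    for stack in parse_stacks(text):
        for symbol, module in stack:
            if module.endswith(module_suffix):
                counts[symbol] += 1
                break
    return counts
```

Calling tally_lock_callers() on the decompressed sample text and printing counts.most_common() lists the functions that most often sit directly above the pthread lock call.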

The experiment was done on a 2-CPU, 12-core HP machine running CentOS 7.
I have attached the SystemTap script and a result.

On Wed, Aug 20, 2014 at 04:29:19PM +0200, Sverker Eriksson wrote:
> On 08/19/2014 02:59 PM, Jihyun Yu wrote:
> >Hi,
> >
> >There is a prior discussion[1] about an inter-node communication bottleneck,
> >and I experienced the same issue with inter-node messaging. I dug into the
> >issue and found that there is a lock on inter-node messaging[2] which
> >causes a bottleneck when sending messages to a single node.
> 
> How did you come to the conclusion that the distribution lock queue is a
> bottleneck?
> Did you use lock counting[1], and if so, what were the profiling stats?
> 
> 
> /Sverker, Erlang/OTP
> 
> 
> [1] http://www.erlang.org/doc/apps/tools/lcnt_chapter.html
> 
-------------- next part --------------
#! /usr/bin/env stap
# This script tries to identify contended user-space locks by hooking
# into the futex system call.
global FUTEX_WAIT = 0 /*, FUTEX_WAKE = 1 */
global FUTEX_PRIVATE_FLAG = 128 /* linux 2.6.22+ */
global FUTEX_CLOCK_REALTIME = 256 /* linux 2.6.29+ */
global lock_waits # long-lived stats on (tid,lock) blockage elapsed time

probe syscall.futex.return {
    if (($op & ~(FUTEX_PRIVATE_FLAG|FUTEX_CLOCK_REALTIME)) != FUTEX_WAIT) next
    if (pid() != target()) next

    elapsed = gettimeofday_us() - @entry(gettimeofday_us())
    lock_waits[tid(), $uaddr] <<< elapsed

    print_usyms(ubacktrace());
    printf("\n");
}

probe end {
    foreach ([tid+, lock] in lock_waits)
        printf ("[%d] lock %p contended %d times, %d avg us\n",
                tid, lock, @count(lock_waits[tid,lock]),
                @avg(lock_waits[tid,lock]))
}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: samples.xz
Type: application/octet-stream
Size: 6380 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20140821/106765d8/attachment.obj>