[erlang-questions] Inter-node communication bottleneck
Jihyun Yu
yjh0502@REDACTED
Thu Aug 21 15:58:41 CEST 2014
A spinlock works well when contention is light, but it starts to fail
when tens of processes are sending messages to the same node.
Here are some updates:
- Tuning inet_default_connect_options and inet_default_listen_options
improves performance. I tested with the following options for both, and
throughput increased from ~10MB/s to ~25MB/s:
[{delay_send,true},{high_watermark,1024000}]
I also tried the nodelay, sndbuf, recbuf, and buffer options, but it
seems that these options do not affect throughput.
- The receiving side is also a bottleneck. I ran a benchmark with
different message sizes, and found that the receiving node cannot
utilize more than 100% CPU (one core). It seems that message decoding
and distribution are done in a single thread, which cannot handle lots
of small messages fast enough to saturate the network bandwidth.
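For reference, here is how the options above can be applied to both the
connect and listen sides at node startup (a sketch; the node name is
illustrative):

```
erl -name bench@127.0.0.1 \
    -kernel inet_default_connect_options '[{delay_send,true},{high_watermark,1024000}]' \
    -kernel inet_default_listen_options '[{delay_send,true},{high_watermark,1024000}]'
```

The same pairs can go in a sys.config under the kernel application
instead of on the command line.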
Several thoughts:
- I haven't tested it yet, but the ~25MB/s limit is per node, so with 4
peer nodes I could saturate 1Gbps even with the smallest possible
messages. The performance might not be as bad as I thought.
- I can simulate messaging with term_to_binary/1, binary_to_term/1, and
a TCP socket with the {packet, 2} or {packet, 4} option. It lacks some
optimizations such as the atom cache, but it scales.
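A minimal sketch of that simulation, assuming a {packet, 4} framing so
gen_tcp handles message boundaries (module and function names are
illustrative, not from a real library):

```erlang
%% dist_sim: simulate inter-node messaging over a plain TCP socket.
%% {packet, 4} prefixes every send with a 4-byte length, so each
%% gen_tcp:recv/2 returns exactly one encoded term.
-module(dist_sim).
-export([listen/1, connect/2, send_term/2, recv_term/1]).

listen(Port) ->
    {ok, LSock} = gen_tcp:listen(Port, [binary, {packet, 4},
                                        {active, false},
                                        {reuseaddr, true}]),
    gen_tcp:accept(LSock).

connect(Host, Port) ->
    gen_tcp:connect(Host, Port, [binary, {packet, 4},
                                 {active, false}]).

send_term(Sock, Term) ->
    %% Unlike the real distribution protocol, there is no atom cache:
    %% every atom is re-encoded in full on each send.
    gen_tcp:send(Sock, term_to_binary(Term)).

recv_term(Sock) ->
    {ok, Bin} = gen_tcp:recv(Sock, 0),
    binary_to_term(Bin).
```

Because decoding happens in whatever process calls recv_term/1, this
scheme can spread the binary_to_term/1 work over many schedulers,
which is exactly the part the single distribution receiver cannot do.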
I'll spend some time trying to fix the issue, but I'm not sure what
more I can do...
On Thu, Aug 21, 2014 at 01:15:32PM +0300, Dmitry Kolesnikov wrote:
> Hello,
>
> I am not sure why you are saying that the spin-lock does not impact the result.
> I saw 25% gain on peak (on both loopback and en0 interfaces).
>
> - Dmitry
>
> On 21 Aug 2014, at 07:05, Jihyun Yu <yjh0502@REDACTED> wrote:
>
> > The problem is reproducible *with the loopback interface*. I tested with
> > two Erlang instances on the same machine, communicating via the loopback
> > interface, and the results are the same.
> >
> > I changed a contended lock - qlock in DistEntry - to a spinlock, but it
> > does not affect throughput (messages per second). The 'perf' profiling
> > tool [1] shows that the CPU cycles moved from kernel to userspace.
> >
> > I attach a patch and 10-second sampling results for the mutex and spinlock.
> >
> > [1] https://perf.wiki.kernel.org/
> > <spinlock.patch> <perf_spinlock.report> <perf_mutex.report>
>