[erlang-questions] TCP buffering

Fri Dec 5 09:22:58 CET 2014

2014-12-04 22:31 GMT+01:00 sean mcevoy <sean.mcevoy@REDACTED>:

> Hi List,
>
> I need some TCP help and advice on how to manage buffer sizes from the
> gen_tcp api.
>
> We have a system made up of 4 basic node types, let's call them A, B, C & DB
> (all running R15B), each of which can have multiple instances. We also have
> a communications protocol that runs over tcp links between the different
> node types and works fine on the connections between A & B and B & C, but
> on the connections between B & DB we've been getting some strange behaviour.
>
> DB is a node that basically just runs mnesia and is the data store for the
> system, if that's relevant, and connections to it also work fine for a few
> days after it restarts. But after a few days we seem to get "chokes" in the
> TCP communications at very regular 7 minute intervals. The rest of the VM
> stays working but messages in the TCP link take up to 8 seconds to reach
> their destination, causing timeouts on the higher level protocol.
> These "chokes" are regular across peak & quiet times and cause a similar
> proportion of timeouts regardless of the traffic level. (Traffic consists
> of simple non-blocking requests and responses.)
>
> I've been investigating and have become focussed on the tcp buffer sizing,
> though I've no concrete evidence that this is actually the problem and my
> TCP knowledge before this investigation was more or less restricted to
> what's exposed through gen_tcp. So please advise if you think there may be
> another source.
>

Since you are running mnesia on the node, I would look for a correlation
between mnesia table dumps and the chokes. The dumps might not be the root
cause, but they might be the trigger. Seven minutes sounds about right
for mnesia dumps, and depending on the data in your disc_copies tables, a
dump can cause pretty bad behavior on schedulers and I/O, affecting
seemingly unrelated processes. I've seen nodes behave very badly because
of this.

If you find the correlation to mnesia dumps, you could try setting the
scheduler wake up threshold (+swt) to low or very_low and the scheduler
forced wake up interval (+sfwi) to some nice number (1000 ms has worked for
me), to make sure you are not starving the processes receiving the tcp
communication.
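
As a rough sketch, those two settings could go in the node's vm.args file
(or be passed as the same flags on the erl command line). The values are
just the ones mentioned above, not tuned recommendations, and whether +sfwi
is available depends on your ERTS version, so check the erl documentation
for your exact release first.

```
## Lower the scheduler wake up threshold so sleeping schedulers
## are woken sooner when work piles up (low is the milder option)
+swt very_low

## Scheduler forced wake up interval, in milliseconds
+sfwi 1000
```

The node has to be restarted for the flags to take effect, so it is worth
confirming the mnesia correlation before changing anything.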

I can't comment on the tcp buffers, but at least you have something else to
look for as well.

> What I've found is that on initial connection both sndbuf & recbuf are set
> to 10MB, and after a few days when we see these problems TCP has resized
> them down to 49KB. On the other links where there are no problems the
> buffers still have their original sizes. But for some reason inet:setopts
> won't resize these 49KB buffers in the live site the way it will in my test
> environment.
>
> And just now I've discovered the separate buffer parameter that I didn't
> know about before, from the OTP docs this one should be larger than the
> larger of sndbuf & recbuf but on my problematic link I have these values:
> [{buffer,1460},{sndbuf,49152},{recbuf,49640}].
> In my "good" links this is set to 10MB, just like sndbuf & recbuf, even
> though we didn't explicitly set it.
>
> So my questions are:
> - What governs this TCP resizing, I know it's in the protocol but what
> traffic patterns might cause this?
> - How can I resize my buffers once I'm in this state?
> - Are the buffer sizes the likely cause of the "chokes" I'm observing?
>
> Thanks in advance!
> //Sean.