[erlang-questions] High lock contention on dist_tables

Tue Apr 23 21:01:27 CEST 2013

Brian Picciano <mediocregopher@REDACTED> wrote:

bp> We have a pool of 3 erlang nodes, all on different servers. Every
bp> afternoon, without fail, we start seeing lots of messages between
bp> the nodes start having really high latency, on the order of tens of
bp> seconds. [...]

Brian, it's probably worthwhile to continue chasing the 'lcnt' avenue
as you've been corresponding with Lukas...

... but at the same time, I also wonder about "tens of seconds".  My gut
says that such delays would require amazingly high lock contention
rates.  Something that can cause messaging delays that long much more
easily is network congestion or packet loss that triggers TCP
retransmission timeouts and slow start.  Many Linux kernels have the
RTO_min value at one second, which is the minimum time a sender waits
for an acknowledgement before it retransmits and falls back into the
slow start state.
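A rough back-of-envelope sketch of how a one-second timeout can grow
into "tens of seconds" follows.  It assumes the one-second RTO above and
the standard doubling of the retransmission timer after each
consecutive timeout (the backoff described in RFC 6298).  The 120 s cap
is an assumption matching Linux's usual TCP_RTO_MAX.  It deliberately
ignores RTT estimation and fast retransmit, so treat it as an
illustration, not a model of any particular kernel.

```python
def rto_backoff_delays(initial_rto_s: float, losses: int,
                       max_rto_s: float = 120.0) -> list[float]:
    """Timeout waited before each retransmission of the same segment.

    Assumption: the timer doubles after every consecutive timeout,
    capped at max_rto_s (120 s is an assumed Linux-like cap).
    """
    delays = []
    rto = initial_rto_s
    for _ in range(losses):
        delays.append(rto)
        # Exponential backoff: double the timer, but never past the cap.
        rto = min(rto * 2, max_rto_s)
    return delays


def cumulative_stall(initial_rto_s: float, losses: int,
                     max_rto_s: float = 120.0) -> float:
    """Total time the connection sits idle across all those timeouts."""
    return sum(rto_backoff_delays(initial_rto_s, losses, max_rto_s))


if __name__ == "__main__":
    # Stall after 1..5 consecutive losses of the same segment,
    # starting from a one-second RTO.
    for n in range(1, 6):
        print(n, cumulative_stall(1.0, n))
```

Five consecutive losses already add up to 1 + 2 + 4 + 8 + 16 = 31
seconds.  Because Erlang distribution traffic between two nodes shares
a single TCP connection by default, a stall like that delays every
message between that pair of nodes, not just one.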

If network packet loss is the problem, this blog post explains one
common reason why it happens:
http://www.snookles.com/slf-blog/2012/01/05/tcp-incast-what-is-it/

-Scott