Scott Lystig Fritchie
Tue Dec 18 07:02:49 CET 2012
Paul Davis <paul.joseph.davis@REDACTED> wrote:
pd> For background, I'm running R14B01 on three nodes with one of the
pd> three nodes in a remote data center that's about 40ms away from the
pd> other two which are <1ms apart.
Hrm, well, if the link truly is 1 Gbit/sec (with ~8% fudge for TCP/IP
overhead), 2x the bandwidth-delay product is about 79 Mbits or 9647
Kbytes. IIRC, you need kernel TCP settings that allow sliding windows at
least that big in order to utilize all that bandwidth with a single TCP
connection.
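The window figure quoted above can be reproduced with a quick calculation. This is only a sketch in Python, and it rests on assumptions not spelled out in the message: that "1 Gbit" is taken as 2**30 bits per second (the reading that yields 79 Mbits and 9647 Kbytes), that the 40 ms figure is the delay fed into the product, and that a Kbyte is 1024 bytes. The variable and function names are illustrative only.

```python
# Rough bandwidth-delay product estimate for the link described above.
# Assumptions (not stated explicitly in the message):
#   - "1 Gbit" is read as 2**30 bits per second
#   - ~8% of the raw bandwidth is lost to TCP/IP overhead
#   - the 40 ms between data centers is the delay used in the product
#   - 1 Kbyte = 1024 bytes
LINK_BITS_PER_SEC = 2**30
OVERHEAD = 0.08
DELAY_SEC = 0.040


def bdp_bits(link_bps, overhead, delay_sec):
    """Usable bandwidth multiplied by delay, in bits."""
    return link_bps * (1.0 - overhead) * delay_sec


# The message sizes the window at twice the bandwidth-delay product.
window_bits = 2 * bdp_bits(LINK_BITS_PER_SEC, OVERHEAD, DELAY_SEC)
window_kbytes = window_bits / 8 / 1024

print(f"{window_bits / 1e6:.0f} Mbits, {window_kbytes:.0f} Kbytes")
```

Under those assumptions this prints "79 Mbits, 9647 Kbytes". With a decimal gigabit (10**9 bits/sec) the result comes out somewhat smaller, so treat the number as an order-of-magnitude target for window and buffer sizing rather than an exact limit.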
Ditto for the buffering inside the VM ... except that the +zdbbl flag to
"erl" wasn't added until well after R14B01's release. I don't have the
R14B01 release date handy, but R14B02 may have been released near March
2011? My patch for +zdbbl wasn't done for several more months:
Author: Scott Lystig Fritchie <slfritchie@REDACTED>
Date: Fri Oct 22 15:25:10 2010 -0500
Add flag-based setting for the distribution buffer busy limit
IIRC, if you hit that one (which was also fixed after R14B01?), all
distributed Erlang communication freezes. Attempting a new connection
via "erl [-name foo@REDACTED | -sname foo] -remsh frozen@REDACTED" won't work.
pd> What I'm observing is that the remote node ends up accumulating
pd> processes stuck in erlang:bif_return_trap/1 which eventually
pd> accumulate to the point where the node exhausts RAM and the node
pd> reboots (if I let it go that long). Each process stuck in
pd> bif_return_trap is related to distributed message passing.
Are you seeing busy_dist_port messages sent to the system monitor
process defined by erlang:system_monitor/2?