[erlang-questions] long sender side delays when sending to an external node

Mon Mar 1 11:19:04 CET 2010

Hi Scott,

On Fri, Feb 26, 2010 at 10:20 PM, Scott Lystig Fritchie <fritchie@REDACTED> wrote:

> Robert Raschke <rtrlists@REDACTED> wrote:
>
> rr> I have a situation where {mbox, other_node} ! {self(), ok} is taking
> rr> a very long time (8-12 seconds!). That is, the sender is actually
> rr> blocked in the ! for that time.
>
> Robby, have you seen my post to the list earlier this month about
> 'busy_dist_port' system events?  If the Erlang port representing the TCP
> port used for inter-node communication becomes "busy", then a process
> (any process) attempting to send a message through that port will be
> unscheduled by the scheduler and won't be rescheduled until the busy
> port is itself unblocked.
>
> That's one way it could happen, at least.  There may be others.
>
>
Yup, I printed it off immediately. Thing is, in the case I'm seeing, there's
really just the pids and ok getting sent per message (with one second sleeps
in between). And unless the VM underneath my feet is sending/receiving lots
of stuff I don't know about, then I can't really see how the busy_dist_port
behaviour can kick in. But I'll try and have a deeper look, you never know.
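
One way to check whether busy_dist_port is involved is to log those events next to the measured send time. What follows is only a minimal sketch, assuming a release whose erlang:system_monitor/2 supports the busy_dist_port option. The module name dist_watch and both function names are made up for illustration.

```erlang
-module(dist_watch).
-export([start/0, timed_send/2]).

%% Start a process that logs every busy_dist_port event.
%% The VM has a single system monitor, so this replaces any existing one.
start() ->
    Pid = spawn(fun loop/0),
    erlang:system_monitor(Pid, [busy_dist_port]),
    Pid.

loop() ->
    receive
        {monitor, SuspendedPid, busy_dist_port, Port} ->
            io:format("busy_dist_port: ~p suspended on ~p~n",
                      [SuspendedPid, Port]),
            loop();
        _Other ->
            loop()
    end.

%% Time a single send (hypothetical helper).
%% Returns the elapsed time in microseconds.
timed_send(Dest, Msg) ->
    {Micros, _Result} = timer:tc(erlang, send, [Dest, Msg]),
    Micros.
```

After calling dist_watch:start(), something like dist_watch:timed_send({mbox, other_node}, {self(), ok}) would report how long the send blocked. If the long sends line up with logged events, the busy port is a likely cause. If nothing is logged, the delay is probably elsewhere.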

I am currently trying to set myself up with epmd and a restricted range of
distribution ports, so I can use Wireshark to see where the traffic is
actually going. Since this is in the field, don't hold your breath.
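
For reference, the distribution listen ports can be pinned with the kernel application's inet_dist_listen_min and inet_dist_listen_max parameters. The fragment below is a sketch of a sys.config. The 9100-9105 range is an arbitrary example, not the range used in the setup described here.

```erlang
%% sys.config (sketch): restrict the ports a node listens on for distribution.
%% The range below is an assumed example; pick one the firewall allows.
[
 {kernel, [
   {inet_dist_listen_min, 9100},
   {inet_dist_listen_max, 9105}
 ]}
].
```

epmd itself listens on port 4369 by default, so with this example range a capture filter along the lines of "tcp port 4369 or tcp portrange 9100-9105" should cover both the epmd lookups and the inter-node traffic.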

My hunch is that a low-level socket send is blocking due to OS or security
interference. I also had a bunch of "patches" take out my app in a different
location, with a similar issue, but unfortunately not exactly the same (a
second machine was implicated as well). Removing the patches solved the
problem. Figuring out what and why is really rather hard.

Robby