[erlang-questions] Erlang message passing delay after abnormal network disconnection

Kenneth Lundin kenneth.lundin@REDACTED
Tue Mar 4 17:58:44 CET 2008


When connectivity is broken abnormally, the sending node will detect
this within 45-75 seconds by default. This can be changed with the
net_ticktime configuration parameter of the kernel application.
Before the detection, the sending node will try to send the message,
and if that is not possible the message will be queued in the inet
driver. If the queue grows beyond a certain maximum, a so-called
"busy port" condition occurs, which blocks the sending Erlang process.
This happens when the receiving side of the distribution socket does
not read what is sent to it, which is the case when you have no
connectivity.
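
As a rough illustration of the tick time setting mentioned above, a
config file for the sending node could look like the sketch below. The
file name sys.config and the value 20 are only assumptions for the
example, not recommendations.

```erlang
%% sys.config (hypothetical file name), loaded with: erl -config sys
[
 {kernel, [
   %% Tick time in seconds. 20 is an illustrative value only;
   %% the documented default is 60.
   {net_ticktime, 20}
 ]}
].
```

The same value can be given on the command line as
"erl -kernel net_ticktime 20". All nodes that talk to each other
should use the same tick time, and a very low value makes false
"node down" detections more likely on a slow or lossy link.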

Another scenario is that the receiving node has already been detected
as down and an auto-connect (including handshake) is performed for the
first message sent after the broken connection. This takes on the
order of 10 seconds before it times out.
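
If the connection attempt itself is what takes too long, the kernel
parameter net_setuptime may be worth a look. The fragment below is
only a sketch: it assumes that this parameter is what bounds the
handshake in your setup, and the value 3 is made up for illustration.

```erlang
%% sys.config (hypothetical file name), loaded with: erl -config sys
[
 {kernel, [
   %% Maximum time in seconds allowed for each network operation
   %% during connection setup to another node. 3 is an illustrative
   %% value only.
   {net_setuptime, 3}
 ]}
].
```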

If you want to avoid this for a very crucial process (i.e. avoid
blocking of that particular Erlang process), you can send the message
with erlang:send_nosuspend/2 or erlang:send_nosuspend/3. Warning:
these functions should be used with extreme care. Read the manual!
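
A minimal sketch of such a non-blocking send is shown below. The
module name nosuspend_example and the function notify/2 are made up
for this example. Note that a false return means the message was
dropped, so the caller has to decide what to do about it.

```erlang
-module(nosuspend_example).
-export([notify/2]).

%% Try to deliver Msg to Dest without suspending the calling process.
%% The noconnect option also prevents an auto-connect attempt when
%% the remote node is not currently connected.
notify(Dest, Msg) ->
    case erlang:send_nosuspend(Dest, Msg, [noconnect]) of
        true ->
            %% The message was handed over to the runtime for delivery.
            ok;
        false ->
            %% The message was NOT sent: the sender would have been
            %% suspended (busy port) or the node is not connected.
            {error, not_sent}
    end.
```

For example, nosuspend_example:notify({my_handler, 'b@host2'}, Event)
returns immediately, where my_handler and 'b@host2' are placeholder
names. A true return is no guarantee that the receiver ever gets the
message, which is one reason these functions need care.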

Note that this has nothing to do with HiPE (i.e. native code).
An abnormal termination of the connectivity, for example by unplugging
the network cable, will have this effect.

/Kenneth Erlang/OTP team Ericsson

On 3/4/08, Eranga Udesh <eranga.erl@REDACTED> wrote:
> The problem occurs when the network connectivity is broken (abnormally). The
> receiving node is not receiving messages. The sending processes are
> blocked, since the message delivery calls (gen_event:notify/2, etc.) wait
> for about 10 seconds before returning. We checked the implementation of
> such calls and noticed that the functions wait until the messages are
> delivered to the receiving node. Is there a way (a system flag, maybe) to
> avoid such blocking and return immediately?
>
> BRgds,
> - Eranga
>
> On Mon, Mar 3, 2008 at 6:51 PM, Chandru
> <chandrashekhar.mullaparthi@REDACTED> wrote:
> >
> > On 03/03/2008, Eranga Udesh <eranga.erl@REDACTED> wrote:
> > > Hi,
> > >
> > > I am experiencing a high message passing delay between 2 Erlang nodes
> > > after an abnormal network disconnection. The 2 nodes are in a WAN and
> > > there are multiple hubs, switches, routers, etc. in between them. If
> > > the message receiving Erlang node is stopped gracefully, the delay
> > > doesn't arise. Doing net_adm:ping/1 to that node returns "pang"
> > > without delay. However, gen_event:notify/2, gen_server:cast/2, etc.
> > > wait for about 10 seconds before returning.
> > >
> > > What's the issue and how can this be avoided?
> >
> > Have you tried putting a snoop to see whether the delay is on the
> > sending/receiving side?
> >
> > This might be useful: http://www.erlang.org/contrib/erlsnoop-1.0.tgz
> >
> > cheers
> > Chandru
> >


