[erlang-questions] erlang nodes over a wifi network

Ulf Wiger ulf@REDACTED
Sat May 19 17:09:28 CEST 2007


The sync_nodes_timeout setting is only relevant to the
distributed application controller, and tells it how long to wait
for the other nodes to appear before proceeding as if they were
dead.

The parameter that ought to be of interest in this case is
-kernel net_ticktime

It is by default set to 60 seconds, which means that a node
will send a tick if it hasn't sent anything else to the other
node in 60/4 = 15 seconds. If nothing has been received from the
other node after 4 tick intervals, the connection is considered dead.
Since the other node is expected to send ticks at the same interval
(that is, net_ticktime must be the same on both sides), something
should always be received.
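
As a rough sketch of how the parameter could be set in sys.config
(the value 120 is only an illustrative assumption for a lossy link,
not something measured or recommended here):

```erlang
%% sys.config sketch. The same value must be used on every node.
%% 120 is an arbitrary example value, not a tested recommendation.
[{kernel,
  [{net_ticktime, 120}]}].
```

The command-line equivalent is `erl -kernel net_ticktime 120`. With
this value a tick goes out after roughly 30 seconds of silence
instead of 15.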

The heartbeat code is brilliant. It's very concise and obviously flawless,
but I'm convinced that it has a hole in it. (:  It's just that we've
experienced nodedowns during extreme situations where our other - less
elegant - heartbeat algorithms have been able to recover.

One situation that was weird enough to remember was when we had
reason to learn about the tcp rexmit settings. We found that with an
aggressive net_ticktime (10 seconds, which really shouldn't be
considered that aggressive), we had to set tcp.rexmit_max so that
the TCP retransmission logic didn't wait too long before resending a
packet. Not only that, we had to set the rexmit_init value (this was
Solaris 8, and I don't recall the exact syntax) to be low - otherwise
the _first_ retransmission wait would trigger the nodedown.

You could perhaps run Wireshark to try to find out what's actually
going on when the nodedown happens. Again, I have a feeling that
the erlang tick algorithm is sometimes a bit easily offended, but
we've not been able to pin down the exact circumstances, nor find
an actual flaw in the algorithm.
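
To line up a packet capture with what the node itself saw, something
like the following could log each nodedown with a timestamp and the
reason reported by net_kernel. This is only a minimal sketch; the
module name nodedown_logger and the log format are made up for
illustration.

```erlang
-module(nodedown_logger).
-export([start/0]).

%% Spawn a process that subscribes to node status messages.
%% The nodedown_reason option makes net_kernel include the reason
%% (for example net_tick_timeout) in each nodedown message.
start() ->
    spawn(fun() ->
                  ok = net_kernel:monitor_nodes(true, [nodedown_reason]),
                  loop()
          end).

%% Print every nodeup/nodedown event with a local timestamp.
loop() ->
    receive
        {nodeup, Node, _Info} ->
            io:format("~p nodeup ~p~n", [calendar:local_time(), Node]),
            loop();
        {nodedown, Node, Info} ->
            Reason = proplists:get_value(nodedown_reason, Info),
            io:format("~p nodedown ~p reason ~p~n",
                      [calendar:local_time(), Node, Reason]),
            loop()
    end.
```

Calling nodedown_logger:start() on each node before reproducing the
problem would give timestamps to compare against the capture.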

BR,
Ulf W

2007/5/19, Enrique Marcote <enrique.marcote@REDACTED>:
>
> Hi all,
>
> I'm trying to connect several erlang nodes over a wifi network. The
> same application works perfectly over ethernet but when I go
> wireless, approximately every 5 minutes I get a noconnection error in
> some of the nodes (nodes reconnect by themselves after a period of
> time that goes from 5 secs to 1min).
>
> I'm setting the following kernel parameters in the sys.config:
>
> [{kernel, [{sync_nodes_mandatory, []}, {sync_nodes_optional,
> ['sun@REDACTED']}, {sync_nodes_timeout, 5000}]},
>
> Nodes communicate with each other issuing rpc calls.
>
> Are there any recommendations you could point out in order to connect
> several erlang nodes over a medium quality wifi network? (quality is
> in average 50%). The network quality is not perfect but seems good
> enough for other applications (ssh, http...).
>
> Any help would be greatly appreciated. Thanks in advance.
>
> Quique
>
> ---------- Forwarded message ----------
> From: Enrique Marcote <enrique.marcote@REDACTED>
> Date: Sat, 19 May 2007 13:42:43 +0200
> Subject: erlang nodes over a wifi network
> To: erlang-questions@REDACTED
>
> Hi all,
>
> I'm doing some tests connecting several erlang nodes over a wifi
> network.  The same application works perfectly over ethernet but when
> I go wireless, approximately every 5 minutes I get a noconnection
> error in some of the nodes (nodes reconnect by themselves after a
> period of time that goes from 5 secs to 1min).
>
> I'm setting the following kernel parameters in the sys.config:
>
> [{kernel, [{sync_nodes_mandatory, []},
>            {sync_nodes_optional, ['sun@REDACTED']},
>            {sync_nodes_timeout, 5000}]},
>
> Nodes communicate with each other issuing rpc calls.
>
> Are there any recommendations you could point out in order to connect
> several erlang nodes over a medium quality wifi network? (quality is
> in average 50%).  The network quality is not perfect but seems good
> enough for other applications (ssh, http...).
>
> Any help would be greatly appreciated.  Thanks in advance.
>
> Quique