distributed Erlang connect fails

Wed Jun 1 11:08:15 CEST 2005

On Tuesday 31 May 2005 16.47, Gerd Flaig wrote:
> Fredrik Thulin <ft@REDACTED> writes:
> > I'm writing a command-line control tool for my application. I get
> > into problems if I execute my control-tool rapidly (like pressing
> > up-arrow and then enter in the UNIX shell).
>
> you could try to assign a unique name to each control tool instance,
> like in
>
> $ erl -name control$$ -hidden -remsh incomingproxy@`hostname -f`

Yes, sure. I think this is a bigger problem though. I have seen this 
problem many times before, when I stop one of my nodes and want to 
restart it immediately.

I've given some more thought to what the real problem is, and I think 
it is that the node that keeps running is not made aware that the 
other node has stopped. 

Sometimes this is noticed fairly quickly (so the restart works), but 
sometimes it seems to take the full 75 seconds of unanswered ticks 
(cf. the comment in dist_util.erl: "The detection time interval is 
thus, by default, 45s < DT < 75s") before the running node discovers 
that the other node is gone.
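The 45s < DT < 75s bounds follow from the rule documented for the kernel parameter `net_ticktime` (default 60 s): connections are checked every TickTime/4 seconds, and a peer from which nothing has been received during the last four intervals is considered down. Since an idle peer itself only ticks every TickTime/4, its last message may predate its death by up to one interval. A minimal Python sketch of that arithmetic; `detection_window` is a hypothetical name, and this models the documented rule, not the actual code in dist_util.erl:

```python
# Hypothetical model of the documented tick rule (not dist_util's code):
# the running node checks the connection every tick_time/4 seconds and
# declares the peer down once nothing has arrived for four intervals.
# An idle peer only sends a tick every tick_time/4, so its last message
# may predate its death by up to one interval.


def detection_window(tick_time: float) -> tuple[float, float]:
    """Return (min, max) seconds from peer death until it is detected."""
    interval = tick_time / 4
    return tick_time - interval, tick_time + interval


if __name__ == "__main__":
    low, high = detection_window(60)  # 60 s is the default net_ticktime
    print(f"{low:.0f}s < DT < {high:.0f}s")  # 45s < DT < 75s
```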

With a seven-second timeout for the node that is starting up, this 
obviously has a (high) chance of failing.
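To put numbers on the race, here is a rough Python sketch. It assumes (this is an interpretation, not something taken from Erlang/OTP) that in the slow case the restarted node's connection attempt only succeeds if the surviving node has already detected the old connection as dead before the attempt times out; the seven seconds presumably corresponds to the default of the kernel parameter `net_setuptime`. The function name is hypothetical.

```python
# Hypothetical model (an assumption, not taken from Erlang/OTP): in the slow
# case, the restarted node's connect attempt only succeeds if the surviving
# node has detected the old connection as dead before the attempt times out.


def restart_delay_bounds(tick_time: float = 60.0,
                         connect_timeout: float = 7.0) -> tuple[float, float]:
    """Return (earliest, always_ok) restart delays in seconds after the stop.

    Restarting earlier than `earliest` cannot succeed in the slow case;
    restarting at `always_ok` or later always succeeds in this model.
    """
    interval = tick_time / 4
    min_detect = tick_time - interval  # 45 s with the defaults
    max_detect = tick_time + interval  # 75 s with the defaults
    earliest = max(0.0, min_detect - connect_timeout)
    always_ok = max(0.0, max_detect - connect_timeout)
    return earliest, always_ok


if __name__ == "__main__":
    print(restart_delay_bounds())  # (38.0, 68.0)
```

Under these assumptions, a restart issued right after the stop (as with up-arrow and enter) falls far below the 38 s lower bound, so in the slow case it fails every time.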

/Fredrik