[erlang-questions] Internode connections

Serge Aleynikov serge@REDACTED
Tue Mar 13 14:16:33 CET 2007


t ty wrote:
> If two nodes are interconnected (net_adm:ping/1 succeeds) and the
> network connection between the two nodes goes down, what is the longest
> time period which can pass before each node says the other is dead?
> Hints to code to look at would be fine.

Somewhere between 45 and 75 seconds with the default net_ticktime of 60 
seconds.  Look into the docs on the kernel application's net_ticktime 
option:

http://www.erlang.org/doc/doc-5.5.3/lib/kernel-2.11.3/doc/html/kernel_app.html
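The 45 to 75 second figure follows from the documented rule that a 
non-responding node is detected somewhere between T - T/4 and T + T/4 
seconds, where T is net_ticktime.  A minimal sketch of that arithmetic 
(the module and function names are made up for illustration; they are 
not part of OTP):

```erlang
-module(tick_window).
-export([window/1]).

%% Approximate bounds, in seconds, on how long it can take to notice
%% that a peer node has stopped responding, for a given net_ticktime.
%% Based on the documented T - T/4 .. T + T/4 rule; integer division
%% is used, so the result is rounded for values not divisible by 4.
window(TickTime) when is_integer(TickTime), TickTime > 0 ->
    Quarter = TickTime div 4,
    {TickTime - Quarter, TickTime + Quarter}.
```

tick_window:window(60) gives {45,75}.  To tighten the window, the node 
can be started with a smaller value, e.g. erl -kernel net_ticktime 20, 
at the cost of more tick traffic and a higher chance of false 
positives on a loaded network.  The value should be the same on all 
connected nodes.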

> Reason: I had two interconnected nodes lose connections to each other
> over the weekend. The servers are on the same LAN but different subnets.
> The servers were running the entire time, however over this weekend we
> had a Daylight Saving Time change. I suspect one server might have had the
> time change occur before the other and the time to live got all weird.

If you are not starting the emulator with the +c switch, it normally 
compensates for sudden changes in the system time by gradually adjusting 
time to catch up with the wall clock.  I don't know for sure whether 
net_ticktime is driven by a timer that's based on the internal clock 
values (i.e. erlang:now/0) rather than the wall clock, but I have no 
reason to believe it wouldn't be.

> End result, mnesia dropped its peer and I'm left with an inconsistent
> dbase.

If you are running a replicated database, you need to design the system 
so that it monitors for network partitioning and automatically restarts 
the replicated nodes.  Please read this thread:

http://www.erlang.org/ml-archive/erlang-questions/200304/msg00418.html

We also use this technique and it is indeed quite effective.  I wish 
that mnesia had this feature built-in.
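As a rough illustration of the monitoring half of such a design (not 
necessarily what the linked thread does), a process can subscribe to 
mnesia system events and to node up/down messages.  The module name 
and the OnPartition callback are hypothetical; what to do on a 
partition (for example, restarting mnesia on one side) is a policy 
decision left to the caller.  The sketch assumes mnesia is already 
running on the local node:

```erlang
-module(partition_watch).
-export([start/1, init/1]).

%% OnPartition is a caller-supplied fun(Context, Node) that is run
%% when mnesia reports that it has been running partitioned.
start(OnPartition) when is_function(OnPartition, 2) ->
    spawn(?MODULE, init, [OnPartition]).

init(OnPartition) ->
    %% Fails with badmatch if mnesia is not running on this node.
    {ok, _} = mnesia:subscribe(system),
    %% Ask net_kernel to send us {nodeup, Node} / {nodedown, Node}.
    ok = net_kernel:monitor_nodes(true),
    loop(OnPartition).

loop(OnPartition) ->
    receive
        {mnesia_system_event, {inconsistent_database, Context, Node}} ->
            %% Context is running_partitioned_network or
            %% starting_partitioned_network.
            error_logger:error_msg("mnesia partitioned (~p) from ~p~n",
                                   [Context, Node]),
            OnPartition(Context, Node),
            loop(OnPartition);
        {nodedown, Node} ->
            error_logger:info_msg("lost connection to ~p~n", [Node]),
            loop(OnPartition);
        {nodeup, Node} ->
            error_logger:info_msg("connected to ~p~n", [Node]),
            loop(OnPartition);
        _Other ->
            %% Ignore other mnesia system events.
            loop(OnPartition)
    end.
```

It could be started with something like 
partition_watch:start(fun(_Ctx, Node) -> my_recovery:handle(Node) end), 
where my_recovery:handle/1 stands in for whatever restart policy the 
system uses.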

Regards,

Serge

> Thanks
> 
> t
> 


