[erlang-questions] can nodes fail/recover too fast to be seen?
Fri Jul 5 21:33:45 CEST 2013
Thanks for the clarifications Per - that's cleared up a few things that I was unaware of.
On 5 Jul 2013, at 20:23, Per Hedeland wrote:
>>> On Jul 5, 2013, at 5:22 PM, Tim Watson wrote:
>>>> As I understand it, this can and does happen, because Erlang does automatic reconnect in order to provide reliable communications.
So is the Svensson and Fredlund paper (viz  from my earlier post) incorrect in its assertion that messages between nodes can be dropped in the face of rapid node reconnects?
>>>>> In Erlang, is it possible for a monitored node to fail and recover so quickly that nodes monitoring it won't detect the failure?
> No. The TCP connection to the old node instance cannot be used for
> communication with the new node instance, i.e. there is no way that
> communication with the new node instance can be established without the
> local VM generating nodedown/'DOWN'/exit messages for the old instance.
Just out of interest, is this enforced by epmd or internally? Also, it would be worth making this explicit in the documentation somewhere, since this question comes up frequently.
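For reference, a minimal sketch of what observing this looks like from the monitoring side. It assumes the local node is itself distributed (started with -sname or -name), since erlang:monitor_node/2 fails with notalive otherwise. The module name node_watch and the function watch/2 are made up for illustration.

```erlang
-module(node_watch).
-export([watch/2]).

%% Ask the local VM to notify this process when the connection to
%% Node is lost, then wait up to Timeout milliseconds for that event.
%% Returns nodedown if the notification arrived, still_up otherwise.
watch(Node, Timeout) ->
    true = erlang:monitor_node(Node, true),
    receive
        {nodedown, Node} ->
            %% Delivered when the connection to that node instance
            %% goes away, or immediately if no connection can be made.
            nodedown
    after Timeout ->
        %% Nothing happened in time; stop monitoring.
        true = erlang:monitor_node(Node, false),
        still_up
    end.
```

From a shell on a distributed node this could be called as node_watch:watch('b@somehost', 5000), where 'b@somehost' is a placeholder node name. If Per's description holds, a restart of the remote node within that window should still produce the nodedown result.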
>>>>> Or, is there some kind of internal persistent state that prevents this?
> This is where it potentially gets interesting - i.e. assuming *no*
> monitoring or linking - and that's where the "creation" part of a node
> identifier comes into play. If a distributed node restarts, it will get
> a new "creation" value courtesy of epmd, and any pid() values
> referring to the old node instance will be invalid.
Does this depend on epmd having stayed up and running the whole time, or does epmd now have some local persistent state?
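As a rough illustration of the "creation" value mentioned above, the sketch below reads it from a node with erlang:system_info(creation) via rpc:call/4 and compares it with an earlier reading. The module name creation_check and both function names are made up for illustration. A changed value suggests a restart, but an unchanged value is not proof of the opposite: the range of creation values is small in some releases, so a value can repeat after several restarts.

```erlang
-module(creation_check).
-export([creation/1, restarted/2]).

%% Fetch the creation value of Node. Returns an integer on success
%% (0 if that node is not distributed) or {badrpc, Reason} on failure.
creation(Node) ->
    rpc:call(Node, erlang, system_info, [creation]).

%% Compare the current creation of Node with a previously saved one.
%% true  -> the value changed, so the node was most likely restarted
%% false -> the value is the same as before
restarted(Node, OldCreation) ->
    case creation(Node) of
        {badrpc, Reason} -> {error, Reason};
        OldCreation -> false;
        _Other -> true
    end.
```

A typical use would be to save C = creation_check:creation(Node) at some point and call creation_check:restarted(Node, C) later.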