[erlang-questions] can nodes fail/recover too fast to be seen?
Tim Watson
watson.timothy@REDACTED
Fri Jul 5 21:33:45 CEST 2013
Thanks for the clarifications Per - that's cleared up a few things that I was unaware of.
On 5 Jul 2013, at 20:23, Per Hedeland wrote:
>>> On Jul 5, 2013, at 5:22 PM, Tim Watson wrote:
>>>
>>>> As I understand it, this can and does happen, because Erlang does automatic reconnect in order to provide reliable communications.
>
> No.
So is the Svensson and Fredlund paper (viz [2] from my earlier post) incorrect in its assertion that messages between nodes can be dropped in the face of rapid node reconnects?
>>>>> In Erlang, is it possible for a monitored node to fail and recover so quickly that nodes monitoring it won't detect the failure?
>
> No. The TCP connection to the old node instance cannot be used for
> communication with the new node instance, i.e. there is no way that
> communication with the new node instance can be established without the
> local VM generating nodedown/'DOWN'/exit messages for the old instance.
>
Just out of interest, is this enforced by epmd or internally? Also, it would be worth making this explicit in the documentation somewhere, since this question comes up frequently.
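For anyone following along, here is a minimal sketch of how the notification Per describes could be observed from a monitoring node. It assumes the standard net_kernel:monitor_nodes/1 subscription, which delivers {nodedown, Node} and {nodeup, Node} messages to the caller. The module name node_watch and the function watch/1 are made up for illustration and do not come from this thread.

```erlang
-module(node_watch).
-export([watch/1]).

%% Hypothetical helper: block until Node has been seen going down
%% and then coming back up, in that order.
watch(Node) ->
    %% Subscribe the calling process to node status messages.
    %% The return value is ignored because this call may return
    %% 'ignored' when the local node is not distributed.
    _ = net_kernel:monitor_nodes(true),
    %% If Per's description holds, this message arrives before any
    %% connection to a restarted instance of Node is established.
    receive {nodedown, Node} -> ok end,
    receive {nodeup, Node} -> restarted end.
```

Calling node_watch:watch('other@host') from a shell on a distributed node and then restarting the other node should return restarted, if the behaviour is as described above.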
>>>>> Or, is there some kind of internal persistent state that prevents this?
>
> This is where it potentially gets interesting - i.e. assuming *no*
> monitoring or linking - and that's where the "creation" part of a node
> identifier comes into play. If a distributed node restarts, it will get
> a new "creation" value courtesy of epmd, and any pid() values
> referring to the old node instance will be invalid.
>
Does this depend on epmd having stayed up and running the whole time, or does epmd now have some local persistent state?
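As a rough sketch of how the "creation" value could be inspected: erlang:system_info(creation) returns the creation of the local node (0 if the node is not alive), and the same call can be made on a remote node via rpc:call/4. The module creation_check and its function names are hypothetical. Note the assumption in restarted_between/2: two different samples indicate a restart, but two equal samples do not prove there was none, since the range of creation values is small and may wrap.

```erlang
-module(creation_check).
-export([local_creation/0, remote_creation/1, restarted_between/2]).

%% Creation of the local node; 0 if the node is not distributed.
local_creation() ->
    erlang:system_info(creation).

%% Ask a remote node for its creation value.
%% Returns {badrpc, Reason} if the node cannot be reached.
remote_creation(Node) ->
    rpc:call(Node, erlang, system_info, [creation]).

%% Hypothetical helper: true if two creation samples differ, which
%% means the node was restarted between the two samples. Equal
%% samples are inconclusive because creation values can repeat.
restarted_between(C1, C2) when is_integer(C1), is_integer(C2) ->
    C1 =/= C2.
```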
Cheers,
Tim