[erlang-questions] can nodes fail/recover too fast to be seen?

Sun Jul 7 15:36:10 CEST 2013

Tim Watson <watson.timothy@REDACTED> wrote:
>
>On 5 Jul 2013, at 21:07, Per Hedeland <per@REDACTED> wrote:
>
>> TCP enforces "cannot be used for ...". The
>> VM/net_kernel will not make a new connection until it has decided that
>> the old one isn't working any more, and at that point it will generate
>> the nodedown/'DOWN'/exit messages.
>
>So given that intermediate kit could be proxying the connection, it is /possible/ that the node was reset but the network layer didn't notice? I mention that because if it is the case, people should bear it in mind when designing their infrastructure.

I'm actually not sure what you're talking about here - AFAIK it's not
common or even easily doable to "proxy" Erlang distribution connections.
But if you somehow manage to do it for some reason, you need to make
sure that the resulting connection has the same end-to-end semantics as
TCP at least in the reliability department. I.e. if your proxy "hides"
the fact that a remote node restarted, it is basically broken.
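
For illustration, here is a minimal sketch of how a process might observe
the messages mentioned in the quoted text once the VM decides a connection
is gone. The module name node_watch and the RemotePid argument are made up
for the example; net_kernel:monitor_nodes/1 and erlang:monitor/2 are the
standard calls, and the node notification arrives as {nodedown, Node}.

```erlang
-module(node_watch).
-export([watch/1]).

%% Subscribe to node status changes and monitor one remote process.
%% RemotePid is a hypothetical pid living on another node.
watch(RemotePid) ->
    ok = net_kernel:monitor_nodes(true),
    Ref = erlang:monitor(process, RemotePid),
    loop(Ref).

loop(Ref) ->
    receive
        {nodedown, Node} ->
            %% Sent when the distribution connection to Node is taken down.
            io:format("connection to ~p lost~n", [Node]),
            loop(Ref);
        {nodeup, Node} ->
            %% Sent when a (new) connection to Node is established.
            io:format("connection to ~p established~n", [Node]),
            loop(Ref);
        {'DOWN', Ref, process, Pid, Reason} ->
            %% Reason is noconnection if the connection to the node was lost.
            io:format("~p down: ~p~n", [Pid, Reason]),
            loop(Ref)
    end.
```

A proxy that silently reconnects to a restarted node would keep the
connection on this side up, so none of these clauses would fire, which is
the "basically broken" case described above.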

--Per