[erlang-questions] Intermittent failures *reconnecting* C hidden nodes

David Hopwood david.hopwood@REDACTED
Mon Jul 9 18:11:00 CEST 2007


Andy Sloane wrote:-
> Now... In my reading of the code, the only way the 'nok' can be sent
> is if handle_info({...,{accept_pending,...}},...) in net_kernel.erl
> returns 'nok_pending' to mark_pending/1 in dist_util.erl, like so:
> 
> handle_info({AcceptPid, {accept_pending,MyNode,Node,Address,Type}}, State) ->
>     case ets:lookup(sys_dist, Node) of
>         [#connection{state=pending}=Conn] ->
>             if
>                 MyNode > Node ->
>                     AcceptPid ! {self(),{accept_pending,nok_pending}},
>                     {noreply,State};
>                 true ->
>                     [...snip]
> 
> If I'm reading that right, it's doing a *lexical comparison* of the
> local node name atom with the connecting one.  I cannot fathom why you
> would want to do that...

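It's quite a common technique in distribution protocols to impose a
priority order on nodes, so that when a given pair of nodes communicate,
one consistently has the higher priority and the other the lower. This
lets both sides resolve a conflict, such as two nodes trying to connect
to each other at the same time, without further negotiation. Here it
seems that node names, compared as atoms, are being used for this ordering.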
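For illustration, here is a minimal sketch of that kind of tie-break. It
assumes, as the quoted net_kernel.erl fragment suggests, that the side
whose name compares higher rejects the incoming attempt (the 'nok_pending'
reply) and keeps its own. The behaviour of the snipped 'true' branch is
assumed here, not taken from the source. The module and function names
are hypothetical and are not part of OTP.

```erlang
-module(tiebreak).
-export([decide/2, demo/0]).

%% Hypothetical sketch, not OTP code. When two nodes dial each other at
%% the same moment, each sees an incoming connection while its own
%% outgoing attempt is still pending. Both apply the same rule to the
%% same pair of atoms, so exactly one of the two attempts survives.
-spec decide(MyNode :: atom(), Peer :: atom()) ->
          keep_outgoing | accept_incoming.
decide(MyNode, Peer) when MyNode > Peer ->
    %% Higher-priority side: reject the incoming attempt
    %% (analogous to replying nok_pending) and keep its own.
    keep_outgoing;
decide(_MyNode, _Peer) ->
    %% Lower-priority side (assumed behaviour of the snipped branch):
    %% abandon its own pending attempt and accept the peer's.
    accept_incoming.

%% Both ends of a simultaneous connect reach complementary decisions.
%% Atoms compare by their text, so 'a@host1' < 'b@host2'.
demo() ->
    A = 'a@host1',
    B = 'b@host2',
    {decide(A, B), decide(B, A)}.
```

Calling tiebreak:demo() gives {accept_incoming, keep_outgoing}: the two
nodes never both keep, or both drop, their own attempt. Note that the
outcome depends only on the names, so a node that comes back under a
different name may land on the other side of the comparison.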

Perhaps when the node comes back up with a different name, some
assumption made by the distribution protocol is being violated. This
is just speculation, though; I don't know the protocol in detail.

-- 
David Hopwood <david.hopwood@REDACTED>



