distributed Erlang connect fails

Fredrik Thulin ft@REDACTED
Mon May 30 15:39:29 CEST 2005


I'm writing a command-line control tool for my application. I run into 
problems if I execute the control tool in rapid succession (for example, 
pressing up-arrow and then Enter in the UNIX shell). 

I've traced this down to the dist_util module: the running node does 
not know that the control-tool node has terminated, so when the 
control-tool node connects a second time, the running node enters 
wait_pending/1 but most often never returns from it, because the 
"Connection setup timeout timer" (started by dist_util:start_timer/1) 
fires first. 

What is the problem here?

  a) The running node is not notified when the control-tool node 
     exits. Should I do something to make this happen? I've tried
     calling erlang:disconnect_node(RunningNode) before terminating,
     as well as stopping the control-tool node through init:stop()
     instead of erlang:halt/1. Neither helps.

  b) The problem is that the timer fires. When I first enabled tracing
     by defining dist_trace and dist_debug in dist_util.hrl, connecting
     started working every time (although it could take quite a few
     seconds for the nodes to connect). This turned out to be because
     the timer's timeout is multiplied by eight when dist_trace is
     defined.
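
For reference, the shutdown attempts described in (a) look roughly like
the sketch below. It is a minimal illustration, not the actual tool: the
module name ctl_sketch, the function run/1, and the use of
net_adm:ping/1 to set up the connection are assumptions, and the two
variants, which were tried separately, are combined here for brevity.
Neither makes the second connect succeed reliably.

```erlang
%% ctl_sketch.erl
%% Hypothetical control tool; module and function names are illustrative
%% and do not come from the real application.
-module(ctl_sketch).
-export([run/1]).

%% RunningNode is the node name of the long-running application
%% (an atom such as 'app@host'; the concrete name is an assumption).
run(RunningNode) ->
    case net_adm:ping(RunningNode) of
        pong ->
            %% The actual control command would be issued here,
            %% for example with rpc:call/4.

            %% Variant 1: explicitly drop the connection before exiting.
            _ = erlang:disconnect_node(RunningNode),

            %% Variant 2: orderly shutdown instead of erlang:halt/1.
            init:stop();
        pang ->
            %% Connection setup failed or timed out.
            io:format("could not connect to ~p~n", [RunningNode]),
            erlang:halt(1)
    end.
```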

I'm running R10B-5 with -proto_dist inet_ssl on Linux 2.4.

