[erlang-questions] net_kernel:start/1 and net_kernel:connect_node/1 errors

Sun Mar 6 10:28:25 CET 2016

Hi,

I'm having some problems with distributed Erlang on Linux.

I have two Ubuntu hosts, and on each host the ERL_EPMD_PORT environment 
variable is set to 5779.

Hostnames are properly set in /etc/hosts on both hosts.  Erlang is 
started on the two hosts by

$ erl -sname er1@REDACTED

(er1@REDACTED)1> node().
er1@REDACTED

$ erl -sname er2@REDACTED

(er2@REDACTED)1> node().
er2@REDACTED

and epmd is running on both hosts, listening on port 5779.

Question 1:  Why does connecting to er2@REDACTED using 
net_kernel:connect_node/1 fail?

(er1@REDACTED)4> net_kernel:connect_node(er2@REDACTED).
false

It runs for a couple of seconds and then returns false.

Is it because epmd is not listening on the default port?

Question 2: If epmd is not running, 
net_kernel:start([er1@REDACTED, shortnames]) returns an error.  Why?  If 
it's caused by epmd not running, how can I make sure that epmd is started 
when an OTP release starts?

1> net_kernel:start([er1@REDACTED, shortnames]).
{error,
    {{shutdown,
         {failed_to_start_child,net_kernel,{'EXIT',nodistribution}}},
     {child,undefined,net_sup_dynamic,
         {erl_distribution,start_link,[[er1@REDACTED,shortnames]]},
         permanent,1000,supervisor,
         [erl_distribution]}}}

Question 3: After starting erl with -sname, netstat shows the Erlang VM 
listening on a random port.  What is this port for?

tcp        0      0 0.0.0.0:23333           0.0.0.0:*               LISTEN      12650/beam.smp

Question 4: If an OTP release is meant to run on multiple nodes, how do I 
properly set the node name and start distribution on each node 
automatically?  I want to start the release on all nodes by running one 
command on one of them.

Currently I'm planning to do it this way:

1) hardcode all participating hostnames in sys.conf, something like 
[{"er1", "host1"}, {"er2", "host2"}]
2) on each host, when the Erlang application starts, it reads the config 
from sys.conf, finds its own name, and uses net_kernel:start/1 to set the 
node name and start distribution
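
A minimal sketch of steps 1) and 2), only to make the plan concrete.  The 
module name dist_boot, the application name my_app and the config key 
cluster_nodes are placeholders, and the sketch assumes the node list has 
already been loaded into the application environment in the 
[{"er1", "host1"}, {"er2", "host2"}] form shown above:

```erlang
-module(dist_boot).
-export([start/0, node_name_for/2]).

%% Hypothetical names: the node list is assumed to be available as
%% {my_app, [{cluster_nodes, [{"er1", "host1"}, {"er2", "host2"}]}]}.
-define(APP, my_app).
-define(KEY, cluster_nodes).

%% Find this host in the configured list and start distribution under
%% the matching short node name.
start() ->
    {ok, Host} = inet:gethostname(),
    {ok, Nodes} = application:get_env(?APP, ?KEY),
    case node_name_for(Host, Nodes) of
        {ok, Node} -> net_kernel:start([Node, shortnames]);
        error      -> {error, {host_not_configured, Host}}
    end.

%% Pure helper: map a host name to the node-name atom for that host,
%% or return 'error' if the host is not listed.
node_name_for(Host, Nodes) ->
    case lists:keyfind(Host, 2, Nodes) of
        {Name, Host} -> {ok, list_to_atom(Name ++ "@" ++ Host)};
        false        -> error
    end.
```

With the example list, dist_boot:node_name_for("host1", Nodes) gives 
{ok, er1@host1}.  The call to net_kernel:start/1 is the same one as in 
Question 2, so it would presumably fail in the same way when epmd is not 
running.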

What is the best practice for this?

Thanks
Khitai