Takeover troubles

steve steve.e.123@REDACTED
Wed Jan 13 00:03:33 CET 2010


Hi Group,

I have written a small distributed test app, following the OTP Design
Principles, that runs on three nodes (n1, n2, and n3). Failover and takeover
work fine except in the following test:

n1 and n2 go down. n3 has control. n1 comes back up while n2 is down.

When n1 comes back up with n2 down, n1 fails to start the app and take over
from n3. Instead I get this report on n1 (and my app on n3 is stopped,
although it is still loaded and configured):

=SUPERVISOR REPORT==== 12-Jan-2010::17:48:32 ===
     Supervisor: {local,dist_sup}
     Context:    start_error
     Reason:     {already_started,<2557.84.0>}
     Offender:   [{pid,undefined},
                  {name,dist},
                  {mfa,{dist,start_link,[]}},
                  {restart_type,permanent},
                  {shutdown,10000},
                  {child_type,worker}]

However, if I start n2 before I bring n1 up, n1 takes over as it should from
n3.

I'm on R13B02.

My config file looks like this:

[{kernel, [{sync_nodes_optional, [n1@REDACTED, n2@REDACTED, n3@REDACTED]},
           {sync_nodes_timeout, 10000},
           {distributed, [{dist, [n1@REDACTED, {n2@REDACTED, n3@REDACTED}]}]}]}].
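
For readers less familiar with these kernel parameters, here is the same
configuration again with comments on what each entry means, as I understand
the kernel documentation. The host part (@host) is a hypothetical stand-in
for the redacted one; nothing else is changed.

```erlang
%% Annotated copy of the config above; "@host" is a placeholder.
[{kernel,
  [%% Nodes to wait for at boot. Being "optional", boot continues
   %% even if some of them never show up.
   {sync_nodes_optional, [n1@host, n2@host, n3@host]},
   %% How long (in milliseconds) to wait for the nodes listed above.
   {sync_nodes_timeout, 10000},
   %% Where the application dist may run, in priority order:
   %% n1 first; n2 and n3 are grouped in a tuple, which gives
   %% them equal priority as fallbacks.
   {distributed, [{dist, [n1@host, {n2@host, n3@host}]}]}]}].
```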

My app file:

{application, dist,
 [{description, "dist test"},
  {vsn, "1.0"},
  {modules, [dist_app,
             dist_sup,
             dist]},
  {registered, [dist_app, dist_sup, dist]},
  {included_applications, []},
  {applications, [kernel, stdlib, sasl]},
  {mod, {dist_app,[]}},
  {start_phases, [{go, []}]}
 ]}.
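
Since the app file defines start_phases, the application callback module
sees distinct start types for a normal start, a failover, and a takeover.
The sketch below shows how those cases can be told apart and logged. It is
not the actual dist_app: the module name dist_app_sketch and the helper
describe/1 are hypothetical, and dist_sup:start_link/0 is assumed to exist.
As far as I understand the application documentation, during a takeover the
old instance on the other node is still running while start/2 is called,
which may be relevant to an already_started error like the one above.

```erlang
%% Sketch only. dist_app_sketch and describe/1 are hypothetical names;
%% dist_sup:start_link/0 is assumed, since dist_sup is not shown.
-module(dist_app_sketch).
-behaviour(application).

-export([start/2, start_phase/3, stop/1, describe/1]).

%% Turn the StartType given by the application controller into text.
describe(normal) ->
    "normal start";
describe({takeover, Node}) ->
    lists:flatten(io_lib:format("takeover from ~p", [Node]));
describe({failover, Node}) ->
    lists:flatten(io_lib:format("failover from ~p", [Node])).

%% Called with normal, {takeover, Node} or {failover, Node}.
%% Logging the start type shows which path a given node took.
start(StartType, _StartArgs) ->
    error_logger:info_msg("dist: ~s on ~p~n",
                          [describe(StartType), node()]),
    dist_sup:start_link().

%% Called once per phase listed under start_phases in the app file;
%% the app file above defines the single phase go.
start_phase(go, StartType, _PhaseArgs) ->
    error_logger:info_msg("dist: phase go (~s)~n",
                          [describe(StartType)]),
    ok.

stop(_State) ->
    ok.
```

With this in place, the SASL log on each node shows whether the app was
started normally, after a failover, or as a takeover, and from which node.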

TIA,
Steve

