[erlang-questions] Distributed apps when application terminates

Wed Sep 25 21:32:07 CEST 2013

  Yes, started but not loaded; i.e., running application:start(AppName) again returns an "already started" error, but application:which_applications() does not list it. This is the same setup I have when failover works after explicitly killing the initial node.

From: Yogish Baliga <yogishb@REDACTED>
Date: Wednesday, September 25, 2013 3:23 PM
To: Chris Phillips <christopher.phillips@REDACTED>
Cc: Yogish Baliga <yogishb@REDACTED>, "erlang-questions@REDACTED" <erlang-questions@REDACTED>
Subject: Re: [erlang-questions] Distributed apps when application terminates

Maybe a dumb question, but worth asking.

Is the application started on the failover node?

-- baliga

On Wed, Sep 25, 2013 at 12:11 PM, Phillips, Christopher <Christopher.Phillips@REDACTED> wrote:
  The node appears to have gone down; if I'm attached to it I get dropped back to the shell. If I check for running Erlang processes (ps aux | grep beam) I see nothing. The other node received a 'nodedown' message. That's what's confusing me; if it was still up I'd understand, and in the past when just stopping the application manually I accepted it not failing over. This is a bit different.

From: Yogish Baliga <yogishb@REDACTED>
Date: Wednesday, September 25, 2013 2:57 PM
To: Chris Phillips <christopher.phillips@REDACTED>
Cc: "erlang-questions@REDACTED" <erlang-questions@REDACTED>
Subject: Re: [erlang-questions] Distributed apps when application terminates

According to the distributed application documentation:

If the node where the application is running goes down, the application is restarted (after the specified timeout) at the first node, specified by the distributed configuration parameter, which is up and running. This is called a failover.
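
For reference, a minimal sketch of the kernel configuration that the "distributed" parameter refers to might look like the following. The application name myapp, the node names, and the timeout values are placeholders chosen for illustration; they are not taken from the setup discussed in this thread.

```erlang
%% sys.config for node1 (hypothetical names and values).
[{kernel,
  [%% myapp runs on the first node in the list that is up. If that node
   %% goes down, it is restarted on the next one after waiting 5000 ms.
   {distributed, [{myapp, 5000, ['node1@host', 'node2@host']}]},
   %% Other nodes that must be up before this node finishes starting.
   {sync_nodes_mandatory, ['node2@host']},
   %% How long (in ms) to wait for those nodes to come up.
   {sync_nodes_timeout, 30000}]}].
```

Under these assumptions, node2 would use the same file with sync_nodes_mandatory listing 'node1@host' instead.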

In your case, the node did not go down; only the supervisor stopped. In the past I tested application failover by disabling the Ethernet adapter on the master node.

-- baliga

On Wed, Sep 25, 2013 at 11:28 AM, Phillips, Christopher <Christopher.Phillips@REDACTED> wrote:
I have a release built around a distributed application.

If I spin up two nodes, things are configured properly: if I attach and q() out of the node the application is actively running on, failover occurs and the application starts up on the other node.

What I'm finding is that in the same situation, if I kill the top level supervisor (either by directly sending it an exit message, or having a child fail enough times to pass the max restart threshold), I _don't_ fail over. I do, however, receive a node down message on the other node. I'm wondering if this is intentional, a bug, or if I'm doing something wrong.
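
The two scenarios described here could be reproduced roughly as follows from an Erlang shell. This is only a sketch: myapp_sup is a hypothetical registered name for the top-level supervisor, not one taken from the release in question, and the two cases are alternatives rather than a sequence.

```erlang
%% Run on the node where the application is currently active.
%% myapp_sup is a hypothetical registered name for the top-level supervisor.

%% Case 1: shut the node down cleanly (failover is observed).
q().

%% Case 2: kill the top-level supervisor instead (no failover is observed).
exit(whereis(myapp_sup), kill).

%% Run on the other node: list the applications currently running there.
application:which_applications().
```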
