[erlang-questions] Distributed apps when application terminates

Phillips, Christopher
Wed Sep 25 22:06:09 CEST 2013


  I think I figured it out. I tried it again from the start; I believe I had started the app on the second node outside of the release (i.e., started the release binary with a clean target, and then manually started the app with application:start(AppName)), such that the kernel wasn't kicking in. I just repeated it from scratch, making sure I started the actual release target, and it appears to work fine.
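For anyone hitting the same thing, a few shell calls can help check whether the standby node actually picked up the distributed configuration. This is only a sketch: the calls are standard kernel/stdlib functions, but what counts as the "right" result depends on your release and is an assumption here.

```erlang
%% Run in the Erlang shell of the standby node.
%% {ok, [...]} if the kernel has a 'distributed' setting, otherwise undefined.
application:get_env(kernel, distributed).

%% Applications loaded on this node.
application:loaded_applications().

%% Applications currently running on this node.
application:which_applications().

%% Other nodes this node is connected to.
nodes().
```

If the first call returns undefined, the node was most likely booted without the release's configuration, which would match the situation described above.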

From: Chris Phillips
Date: Wednesday, September 25, 2013 3:32 PM
To: Yogish Baliga
Subject: Re: [erlang-questions] Distributed apps when application terminates

  Yes, started but not loaded; i.e., running application:start(AppName) again returns an "already started" error, but application:which_applications() does not list it. This is the same setup I have when failover works after explicitly killing the initial node.

From: Yogish Baliga
Date: Wednesday, September 25, 2013 3:23 PM
To: Chris Phillips
Cc: Yogish Baliga
Subject: Re: [erlang-questions] Distributed apps when application terminates

Maybe a dumb question, but worth asking.

Is the application started on the failover node?

-- baliga


On Wed, Sep 25, 2013 at 12:11 PM, Phillips, Christopher wrote:
  The node appears to have gone down; if I'm attached to it I get dropped back to the shell. If I check for running Erlang processes (ps aux | grep beam) I see nothing. The other node received a 'nodedown' message. That's what's confusing me; if it was still up I'd understand, and in the past when just stopping the application manually I accepted it not failing over. This is a bit different.

From: Yogish Baliga
Date: Wednesday, September 25, 2013 2:57 PM
To: Chris Phillips
Subject: Re: [erlang-questions] Distributed apps when application terminates

According to the distributed application documentation:

If the node where the application is running goes down, the application is restarted (after the specified timeout) at the first node, specified by the distributed configuration parameter, which is up and running. This is called a failover.
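The "distributed configuration parameter" in that excerpt is a kernel application setting. A minimal sys.config sketch for one of two nodes might look like the following; the application name (myapp), node names, and timeout values are hypothetical and not taken from this thread.

```erlang
%% sys.config for node a@host1 (hypothetical names and values).
[
 {kernel,
  [
   %% myapp runs on a@host1; if that node goes down, it is restarted
   %% on b@host2 after 5000 ms.
   {distributed, [{myapp, 5000, ['a@host1', 'b@host2']}]},

   %% Wait for the other node at startup before starting applications.
   {sync_nodes_mandatory, ['b@host2']},
   {sync_nodes_timeout, 30000}
  ]}
].
```

The configuration for b@host2 would be the same except that sync_nodes_mandatory lists 'a@host1'. Both nodes need the same distributed entry, or the kernel will not coordinate the application between them.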

In your case, your node did not go down, but the supervisor was stopped. In the past I tested application failover by disabling the Ethernet adapter on the master node.

-- baliga



On Wed, Sep 25, 2013 at 11:28 AM, Phillips, Christopher wrote:
I have a release built around a distributed application.

If I spin two nodes up, things are configured properly: if I attach and q() out of the node the application is actively running on, failover occurs and the application starts up on the other node.

What I'm finding is that in the same situation, if I kill the top-level supervisor (either by directly sending it an exit message, or by having a child fail enough times to pass the max restart threshold), I _don't_ fail over. I do, however, receive a node down message on the other node. I'm wondering if this is intentional, a bug, or if I'm doing something wrong.
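One likely reason the other node sees a node down message here is that applications started from a release boot script are permanent unless the .rel file says otherwise, and when a permanent application terminates, the rest of the node is taken down with it. A sketch of a .rel file showing where the start type is set; the release name, application name, and version numbers are illustrative assumptions.

```erlang
%% myapp_rel.rel (hypothetical names; versions are illustrative).
{release, {"myapp_rel", "1"}, {erts, "5.10.2"},
 [
  {kernel, "2.16.2"},
  {stdlib, "1.19.2"},
  %% No start type given, so this entry is treated as permanent,
  %% i.e. equivalent to {myapp, "1.0.0", permanent}.
  {myapp, "1.0.0"}
 ]}.
```

With a permanent start type, killing the top-level supervisor ends the application and therefore the node, so from the other node's point of view it should look like any other node failure.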
