[erlang-questions] Automatically reconnecting nodes when they come back online

Joseph Wayne Norton norton@REDACTED
Fri Apr 26 19:13:11 CEST 2013


I don't have a direct answer to your question.

However, are you aware of the slave module?

Some of the recipes in this module might be of use to you:

https://github.com/norton/qc/blob/master/src/qc_slave.erl
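
For what it's worth, the second idea in the quoted message below (monitor
the nodes, then ping only the ones that are down until they return) might
be sketched roughly as follows. This is a minimal sketch and is not taken
from qc_slave.erl: the module name node_watcher, the start/1 and start/2
functions, and the 5 second default interval are all assumptions made for
illustration. It relies only on net_kernel:monitor_nodes/1 and
net_adm:ping/1 from OTP.

```erlang
-module(node_watcher).
-export([start/1, start/2]).

%% Hypothetical default: retry unreachable players every 5 seconds.
-define(DEFAULT_INTERVAL_MS, 5000).

%% Start a watcher on the conductor for the given player node names.
start(Players) ->
    start(Players, ?DEFAULT_INTERVAL_MS).

start(Players, IntervalMs) ->
    spawn(fun() -> init(Players, IntervalMs) end).

init(Players, IntervalMs) ->
    %% Subscribe this process to {nodeup, Node} / {nodedown, Node} messages.
    _ = net_kernel:monitor_nodes(true),
    %% Treat every player as down at first, so the first tick connects them.
    self() ! tick,
    loop(Players, Players, IntervalMs).

%% Players: all nodes we care about. Down: the ones currently unreachable.
loop(Players, Down, IntervalMs) ->
    receive
        {nodedown, Node} ->
            case lists:member(Node, Players) of
                true  -> loop(Players, lists:usort([Node | Down]), IntervalMs);
                false -> loop(Players, Down, IntervalMs)
            end;
        {nodeup, Node} ->
            loop(Players, lists:delete(Node, Down), IntervalMs);
        tick ->
            %% net_adm:ping/1 returns pong once a connection is established,
            %% pang otherwise. Only nodes known to be down are pinged.
            StillDown = [N || N <- Down, net_adm:ping(N) =/= pong],
            erlang:send_after(IntervalMs, self(), tick),
            loop(Players, StillDown, IntervalMs);
        stop ->
            ok
    end.
```

On the conductor this would be started with something like
node_watcher:start(['player1@host1', 'player2@host2']) (node names
hypothetical). net_adm:ping/1 can block for several seconds when a host is
unreachable, which is one reason to keep the pinging in its own process
rather than in the code that drives the scenario.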

On 2013/04/27, at 2:00, Scott Thoman <scott@REDACTED> wrote:

> To all who know more about this than I do:
> 
> First, I'm just beginning to learn about Erlang/OTP so I figured I'd
> use it to implement something useful.
> 
> Part of what I'd like to build will involve a "conductor" controller
> node that directs some other "player" nodes to all do something at
> approximately the same time - ultimately to actually test the
> operation of another piece of distributed software.  As part of those
> operations, I expect the player nodes may sometimes crash (actually
> cause a Windows BSOD in some cases) and then eventually come back to
> life.
> 
> What I'm wondering about is what some folks have found to be good ways
> of getting nodes to rejoin the cluster when they come back to life.
> The way I'm thinking about it now is that the player nodes will be
> passive in the sense that they won't actively connect to any other
> nodes - they'll only get connected when the conductor node invites
> them in.  I'm also not looking for fault tolerance on the conductor
> node at this point; if that one fails badly I'll just get some coffee
> and rerun the scenario.
> 
> My first two thoughts were:
> 1.  When the conductor node connects up the player nodes it would also
> spawn a process whose sole job is to periodically ping the other nodes
> to ensure they're connected.  Then when one goes down, those pings
> will just fail during that time but when the node comes back a ping
> will reconnect it to the other nodes.  All this time, I'd be
> monitoring the node up/down messages.
> 2.  I'd start by monitoring all the nodes as the conductor connects
> them and when receiving a node down message, spawn a process whose job
> it is to periodically ping only that node until it comes back.
> 
> Are there some good practices out there for systems that want to
> behave like this?
> 
> Thanks in advance,
> 
> /stt