[erlang-questions] Automatically reconnecting nodes when they come back online

Scott Thoman <>
Fri Apr 26 19:35:47 CEST 2013


On Fri, Apr 26, 2013 at 1:13 PM, Joseph Wayne Norton
<> wrote:
> I don't have a direct answer to your question.
>
> However, are you aware of the slave module?
>
> Some of the recipe(s) in this module might be of use to you.
>
> https://github.com/norton/qc/blob/master/src/qc_slave.erl
>
> On 2013/04/27, at 2:00, Scott Thoman <> wrote:
>
> To all who know more about this than I do:
>
> First, I'm just beginning to learn about Erlang/OTP so I figured I'd
> use it to implement something useful.
>
> Part of what I'd like to build will involve a "conductor" controller
> node that directs some other "player" nodes to all do something at
> approximately the same time - ultimately to actually test the
> operation of another piece of distributed software.  As part of those
> operations, I expect the player nodes may sometimes crash (actually
> cause a Windows BSOD in some cases) and then eventually come back to
> life.
>
> What I'm wondering about is what some folks have found to be good ways
> of getting nodes to rejoin the cluster when they come back to life.
> The way I'm thinking about it now is that the player nodes will be
> passive in the sense that they won't actively connect to any other
> nodes - they'll only get connected when the conductor node invites
> them in.  I'm also not looking for fault tolerance on the conductor
> node at this point; if that one fails badly I'll just get some coffee
> and rerun the scenario.
>
> My first two thoughts were:
> 1.  When the conductor node connects up the player nodes it would also
> spawn a process whose sole job is to periodically ping the other nodes
> to ensure they're connected.  Then when one goes down, those pings
> will just fail during that time but when the node comes back a ping
> will reconnect it to the other nodes.  All this time, I'd be
> monitoring the node up/down messages.
> 2.  I'd start by monitoring all the nodes as the conductor connects
> them and, when receiving a node down message, spawn a process whose
> job is to periodically ping only that node until it comes back.
>
> Are there some good practices out there for systems that want to
> behave like this?
>
> Thanks in advance,
>
> /stt

I wasn't aware of it, but I'll take a look...

Thanks,
/stt
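
For reference, the second idea quoted above (monitor the nodes, and ping
a node again only after it goes down) might be sketched roughly as
follows. This is a minimal sketch under some assumptions, not a tested
recipe: the module name conductor_reconnect, the functions start/1 and
stop/1, and the 5 second retry interval are all made up for
illustration. It also uses a timer in one watcher process rather than
spawning a separate process per node. Only net_kernel:monitor_nodes/1,
net_adm:ping/1, erlang:send_after/3 and the {nodeup, Node} and
{nodedown, Node} messages come from OTP. It assumes the conductor is
started as a distributed node (with -sname or -name) and shares a
cookie with the players.

```erlang
-module(conductor_reconnect).
-export([start/1, stop/1]).

%% Hypothetical retry interval in milliseconds; tune to taste.
-define(RETRY_MS, 5000).

%% Start a watcher process on the conductor for a list of player nodes,
%% e.g. ['player1@host1', 'player2@host2'] (hypothetical names).
start(Players) ->
    spawn(fun() -> init(Players) end).

stop(Pid) ->
    Pid ! stop,
    ok.

init(Players) ->
    %% Subscribe this process to {nodeup, Node} / {nodedown, Node}.
    %% The match fails if the conductor is not a distributed node.
    ok = net_kernel:monitor_nodes(true),
    %% Initial invitation: ping every player and schedule a retry for
    %% each one that does not answer.
    lists:foreach(fun(Node) -> ping_or_retry(Node) end, Players),
    loop(Players).

loop(Players) ->
    receive
        {nodedown, Node} ->
            %% Only chase nodes we were asked to manage.
            case lists:member(Node, Players) of
                true  -> retry_later(Node);
                false -> ok
            end,
            loop(Players);
        {nodeup, _Node} ->
            %% Nothing to do here; a real conductor might resume work.
            loop(Players);
        {retry, Node} ->
            ping_or_retry(Node),
            loop(Players);
        stop ->
            net_kernel:monitor_nodes(false),
            ok
    end.

%% A successful ping sets up the connection again; otherwise try later.
ping_or_retry(Node) ->
    case net_adm:ping(Node) of
        pong -> ok;
        pang -> retry_later(Node)
    end.

retry_later(Node) ->
    erlang:send_after(?RETRY_MS, self(), {retry, Node}),
    ok.
```

One thing to keep in mind is that net_adm:ping/1 can block for a while
when the target host is unreachable, so the watcher handles one node at
a time. If that turns out to matter, spawning a short-lived process per
ping, as in the original idea, would avoid it.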


