<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div>I don't have a direct answer to your question.</div><div><br></div><div>However, are you aware of the slave module?</div><div><br></div><div>Some of the recipes in this module might be of use to you.</div><div><br></div><div><span style="font-family: '.HelveticaNeueUI'; font-size: 15px; line-height: 19px; white-space: nowrap; -webkit-tap-highlight-color: rgba(26, 26, 26, 0.296875); -webkit-composition-fill-color: rgba(175, 192, 227, 0.230469); -webkit-composition-frame-color: rgba(77, 128, 180, 0.230469); -webkit-text-size-adjust: none; "><a href="https://github.com/norton/qc/blob/master/src/qc_slave.erl">https://github.com/norton/qc/blob/master/src/qc_slave.erl</a></span></div><div><br>On 2013/04/27, at 2:00, Scott Thoman &lt;<a href="mailto:scott@thoman.org">scott@thoman.org</a>&gt; wrote:<br><br></div><blockquote type="cite"><div><span>To all who know more about this than I do:</span><br><span></span><br><span>First, I'm just beginning to learn about Erlang/OTP so I figured I'd</span><br><span>use it to implement something useful.</span><br><span></span><br><span>Part of what I'd like to build will involve a "conductor" controller</span><br><span>node that directs some other "player" nodes to all do something at</span><br><span>approximately the same time - ultimately to actually test the</span><br><span>operation of another piece of distributed software.  
As part of those</span><br><span>operations, I expect the player nodes may sometimes crash (actually</span><br><span>cause a Windows BSOD in some cases) and then eventually come back to</span><br><span>life.</span><br><span></span><br><span>What I'm wondering about is what some folks have found to be good ways</span><br><span>of getting nodes to rejoin the cluster when they come back to life.</span><br><span>The way I'm thinking about it now is that the player nodes will be</span><br><span>passive in the sense that they won't actively connect to any other</span><br><span>nodes - they'll only get connected when the conductor node invites</span><br><span>them in.  I'm also not looking for fault tolerance on the conductor</span><br><span>node at this point; if that one fails badly I'll just get some coffee</span><br><span>and rerun the scenario.</span><br><span></span><br><span>My first two thoughts were:</span><br><span>1.  When the conductor node connects up the player nodes it would also</span><br><span>spawn a process whose sole job is to periodically ping the other nodes</span><br><span>to ensure they're connected.  Then when one goes down, those pings</span><br><span>will just fail during that time but when the node comes back a ping</span><br><span>will reconnect it to the other nodes.  All this time, I'd be</span><br><span>monitoring the node up/down messages.</span><br><span>2.  
I'd start by monitoring all the nodes as the conductor connects</span><br><span>them and when receiving a node down message, spawn a process whose job</span><br><span>it is to periodically ping only that node until it comes back.</span><br><span></span><br><span>Are there some good practices out there for systems that want to</span><br><span>behave like this?</span><br><span></span><br><span>Thanks in advance,</span><br><span></span><br><span>/stt</span><br><span>_______________________________________________</span><br><span>erlang-questions mailing list</span><br><span><a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a></span><br><span><a href="http://erlang.org/mailman/listinfo/erlang-questions">http://erlang.org/mailman/listinfo/erlang-questions</a></span><br></div></blockquote></body></html>