<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Thu, Oct 13, 2016 at 4:34 PM Garry Hodgson <<a href="mailto:garry@research.att.com">garry@research.att.com</a>> wrote:<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">So how do other people handle this? Is there a canonical erlangy way to do<br class="gmail_msg">
it? I reflexively avoid doing long-running things in OTP callbacks, but<br class="gmail_msg">
I expect there are better and worse places/times to do this.<br class="gmail_msg"><br class="gmail_msg"></blockquote><div><br></div><div>What happens if your app does its setup correctly, however you decide to do that, runs for 3 days, and then, suddenly, your tunnels go down? Surely your app had better handle this situation in some way or other, or the system will not provide service the way it is supposed to.</div><div><br></div><div>This tends to lead to observation #1: system startup is but a special case of running with degraded service. Decide how you want the system to behave under degraded service, and that will go a long way toward explaining how it should behave in its startup phase.</div><div><br></div><div>What you can tolerate in a degraded service mode depends on the application. In some circumstances, you can get away with closing down listening sockets for a while; in others, you have ways to tell the other end that the connection it just got can't be served at the moment because of a problem on your end. In some situations you can even continue giving service, but signal to the other end that certain replies are guesses because the real system is down at the moment.</div><div><br></div><div>Observation #2: Supervisor trees maintain invariants of your system.</div><div><br></div><div>Which invariants you want to maintain is application-dependent. But in my experience, some invariants are often easier to maintain than others. A strong invariant such as "we have a working tunnel" is hard to maintain because it involves other distributed systems over which we have no control. A weaker invariant such as "either there is no connection at the moment and we have been trying to establish one for N milliseconds, or there is a connection" is easier to maintain in the supervision tree. 
One particular problem with a strong invariant is that your supervision tree may never finish starting, because it is waiting for a tunnel that never comes up.</div><div><br></div><div>Methods I've used with success:</div><div><br></div><div>* Have a gen_event manager track the current state of the system, and use it to enable/disable service based on tunnel availability.</div><div>* Use gproc for the same thing. This is what e.g. <a href="https://github.com/shopgun/turtle">https://github.com/shopgun/turtle</a> does (written by yours truly together with the other nice people working at Shopgun).</div><div>* Employ a circuit breaker and use its state to maintain the tunnel.</div><div>* Use ETS.</div><div>* Use plain old messaging when state changes happen.</div><div><br></div><div>The key point is that you will have to handle partially degraded service anyway, once things start going wrong, so you need to maintain it robustly. Once you understand how to do that, it often points to a natural way of handling startup: messages such as "tunnel X, which we depend on, just went away" or "tunnel X just came back" are the natural state changes in the application, and they should drive startup too.</div><div><br></div></div></div>
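<div>A minimal sketch of the weak-invariant idea, written in Python purely as a language-neutral illustration. In Erlang the tunnel owner would more naturally be a process (for example a gen_server) sending messages to interested parties. All names here (TunnelManager, connect, tick, lost, subscribe, Service) are hypothetical. The manager starts in the DOWN state instead of blocking, and it maintains "either there is no connection and we have been trying since down_since, or there is a connection". It uses the same went-away/came-back notifications at startup and at runtime.</div>

```python
import time
from enum import Enum
from typing import Callable, List, Optional


class TunnelState(Enum):
    DOWN = "down"  # no connection; reconnect attempts are ongoing
    UP = "up"      # connection established


Subscriber = Callable[[str, TunnelState], None]


class TunnelManager:
    """Hypothetical tunnel owner maintaining the weak invariant:
    either DOWN (and trying since `down_since`) or UP.
    Construction never blocks waiting for the tunnel."""

    def __init__(self, name: str, connect: Callable[[], bool],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.name = name
        self._connect = connect        # hypothetical: returns True on success
        self._clock = clock
        self.state = TunnelState.DOWN  # startup is just degraded service
        self.down_since = clock()
        self._subscribers: List[Subscriber] = []

    def subscribe(self, callback: Subscriber) -> None:
        self._subscribers.append(callback)
        callback(self.name, self.state)  # new subscribers learn the current state

    def _set_state(self, new_state: TunnelState) -> None:
        if new_state is self.state:
            return
        self.state = new_state
        if new_state is TunnelState.DOWN:
            self.down_since = self._clock()
        for callback in self._subscribers:  # "went away" / "came back"
            callback(self.name, new_state)

    def tick(self) -> None:
        """One reconnect attempt; meant to be driven by a timer."""
        if self.state is TunnelState.DOWN and self._connect():
            self._set_state(TunnelState.UP)

    def lost(self) -> None:
        """Report that the tunnel went away at runtime."""
        self._set_state(TunnelState.DOWN)

    def trying_for_ms(self) -> Optional[float]:
        """How long we have been without a tunnel, or None when UP."""
        if self.state is TunnelState.UP:
            return None
        return (self._clock() - self.down_since) * 1000.0


class Service:
    """Hypothetical consumer that enables/disables itself on notifications."""

    def __init__(self) -> None:
        self.accepting = False

    def on_tunnel_change(self, name: str, state: TunnelState) -> None:
        self.accepting = state is TunnelState.UP
```

<div>Because the service only reacts to state-change notifications, startup is simply the case where the first notification says DOWN. Nothing in the startup path waits for the tunnel.</div>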
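<div>The circuit-breaker suggestion can be read as follows. Stop hammering a tunnel that keeps failing, and reject requests quickly while it is considered down. Periodically let a single trial request through to see whether it has come back. Below is a minimal sketch, again in Python for illustration only. The class, its parameters (max_failures, reset_timeout) and its three states follow a common textbook formulation. They are assumptions, not the API of any particular Erlang library.</div>

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling the tunnel while the breaker is open."""


class CircuitBreaker:
    """Minimal circuit breaker sketch; names and defaults are illustrative."""

    def __init__(self, max_failures: int = 3, reset_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures    # consecutive failures before opening
        self.reset_timeout = reset_timeout  # seconds to stay open before a trial
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("tunnel considered down")
            self.state = "half_open"  # let one trial request through
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        # Success closes the breaker and resets the failure count.
        self._failures = 0
        self.state = "closed"
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        # A failed trial, or too many consecutive failures, (re)opens it.
        if self.state == "half_open" or self._failures >= self.max_failures:
            self.state = "open"
            self._opened_at = self._clock()
```

<div>The breaker's state ("closed", "open", "half_open") is the kind of information one could publish through gen_event, gproc, ETS or plain messages, so that the rest of the system can enable or disable service accordingly.</div>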