[erlang-questions] establishing preconditions at startup

Thu Oct 13 22:21:30 CEST 2016

On Thu, Oct 13, 2016 at 4:34 PM Garry Hodgson <garry@REDACTED>
wrote:

> So how do other people handle this? Is there a canonical Erlangy way to do
> it? I reflexively avoid doing long running things in OTP callbacks, but
> I expect there are better and worse places/times to do this.
>
>
What happens if your app does its correct setup, however you decide to do
that, runs for 3 days, and then, suddenly, your tunnels go down? Surely your
app had better handle this situation one way or another, or the system will
not give service in the way it's supposed to.

This tends to lead to observation #1: system startup is but a special case
of running with degraded service. Answer how you want degraded service to
run and this will go a long way to explain how your system should behave in
its startup phase.

What you can tolerate in a degraded service mode depends on the
application. In some circumstances, you can get away with closing down
listening sockets for a while. In others, you have ways to tell the other
end that the connection it just got can't be served at the moment because
there is a problem on your end. In some situations you can even continue
giving service, but tell the other end that certain replies are guesses
because the real system is down at the moment.

Observation #2: Supervisor trees maintain invariants of your system.

Which invariants you want to maintain is application dependent. But in my
experience, it is often the case that some invariants are easier to
maintain than others. A strong invariant such as "we have a working tunnel"
is hard to maintain because it involves other distributed systems over
which we have no control. A weaker invariant such as "either there is no
connection at the moment and we have been trying to establish one for N
milliseconds, or there is a connection" is easier to maintain in the
supervision tree. One particular problem with a strong invariant is that it
may leave you in a situation where your supervision tree is not fully
constructed because it is waiting on a tunnel (which never comes up).
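
To make the weaker invariant concrete, here is a minimal sketch of a
gen_server that starts out disconnected and keeps retrying, so init/1 never
blocks the supervisor. The module name tunnel_conn, the ConnectFun argument
and the 1000 ms retry interval are assumptions for illustration only; they
are not taken from any real tunnel library.

```erlang
-module(tunnel_conn).
-behaviour(gen_server).

-export([start_link/1, status/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(RETRY_MS, 1000).

%% ConnectFun :: fun(() -> {ok, Conn} | {error, Reason}) is a hypothetical
%% stand-in for whatever actually opens the tunnel.
start_link(ConnectFun) when is_function(ConnectFun, 0) ->
    gen_server:start_link(?MODULE, ConnectFun, []).

%% Returns connected | {disconnected, MillisecondsSpentTrying}.
status(Pid) ->
    gen_server:call(Pid, status).

init(ConnectFun) ->
    %% Do not connect here: init/1 returns at once so the supervisor can
    %% finish building the tree. The first attempt runs in handle_info/2.
    self() ! connect,
    {ok, #{connect => ConnectFun, conn => undefined, since => now_ms()}}.

handle_call(status, _From, #{conn := undefined, since := Since} = State) ->
    {reply, {disconnected, now_ms() - Since}, State};
handle_call(status, _From, State) ->
    {reply, connected, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(connect, #{connect := ConnectFun, conn := undefined} = State) ->
    case ConnectFun() of
        {ok, Conn} ->
            {noreply, State#{conn := Conn}};
        {error, _Reason} ->
            %% Stay in the "no connection" state and try again later.
            erlang:send_after(?RETRY_MS, self(), connect),
            {noreply, State}
    end;
handle_info(_Msg, State) ->
    {noreply, State}.

now_ms() ->
    erlang:monotonic_time(millisecond).
```

The process is always in one of the two states the invariant names, and the
rest of the tree can come up regardless of the tunnel. A real version would
also have to notice an established connection going away and would likely
want backoff on the retries; both are left out here.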

Methods I've used with success:

* Have a gen_event manager to track the current state of the system. Use
this to enable/disable service based on tunnel availability.
* Use gproc for the same thing. This is what, e.g.,
https://github.com/shopgun/turtle does (written by yours truly together
with the other nice people working at Shopgun).
* Employ a circuit breaker and use its state to maintain the tunnel.
* Use ETS
* Use plain old messaging when state changes happen
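
As a rough illustration of the ETS variant from the list above, the sketch
below keeps one row per tunnel and lets request handling check it before
doing any work. The module name tunnel_status, the up/down atoms and
handle_request/2 are hypothetical names made up for this example.

```erlang
-module(tunnel_status).
-export([init/0, set/2, is_up/1, handle_request/2]).

-define(TAB, tunnel_status).

%% Create the table. The calling process owns it, so in a real system this
%% would be called from a long-lived process.
init() ->
    ets:new(?TAB, [named_table, public, set, {read_concurrency, true}]),
    ok.

%% Record a state change such as "tunnel X just went away" (down) or
%% "tunnel X just came back" (up).
set(Tunnel, Status) when Status =:= up; Status =:= down ->
    true = ets:insert(?TAB, {Tunnel, Status}),
    ok.

%% A tunnel nobody has reported on counts as down, which makes startup
%% just another case of degraded service.
is_up(Tunnel) ->
    case ets:lookup(?TAB, Tunnel) of
        [{Tunnel, up}] -> true;
        _ -> false
    end.

%% Serve the request only while the tunnel is available.
handle_request(Tunnel, Request) ->
    case is_up(Tunnel) of
        true -> {ok, Request};            % placeholder for the real work
        false -> {error, unavailable}
    end.
```

Whatever process watches the tunnel would call set/2 on each state change.
Everything else only reads the table, so the startup case and the "tunnel
went down after 3 days" case go through the same code path.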

The key point is that you will have to handle partially degraded service
when things start going wrong anyway, so you need to handle it in a robust
fashion. Once you understand how to do that robustly, it will often suggest
a natural path for the startup situation. Messages of the kind "tunnel X we
depend on just went away" or "tunnel X we depend on just came back" are the
natural state changes in the application, and they are what should be used
at startup too.