<div dir="ltr">It seems to me that what you want is a circuit breaker (like Jesper's fuse [0] or breaky [1]) in front of the hardware module. Write your Erlang code so that it does not crash on hardware errors but instead melts the fuse. The supervisor is then only involved when something unexpected happens in your own code, where a restart will bring you back to a known state. We use something similar (managed by [2]; the name collision with [0] is purely coincidental) for graceful degradation if our HSMs go down, and we also use [2] for managing database connections. In fact, you can use this strategy for any kind of external service without putting your supervision tree at risk.<div><br></div><div>//Daniel<br><div><br></div>[0] <a href="https://github.com/jlouis/fuse">https://github.com/jlouis/fuse</a><div>[1] <a href="https://github.com/mmzeeman/breaky">https://github.com/mmzeeman/breaky</a></div><div>[2] <a href="https://github.com/ulfl/fuse-lb">https://github.com/ulfl/fuse-lb</a></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Sep 16, 2014 at 12:14 PM, David Welton <span dir="ltr"><<a href="mailto:davidnwelton@gmail.com" target="_blank">davidnwelton@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
On Mon, Sep 15, 2014 at 10:00 PM, Jay Nelson <<a href="mailto:jay@duomark.com">jay@duomark.com</a>> wrote:<br>
> David N. Welton wrote:<br>
><br>
>> So you would advocate putting everything in one Erlang 'application'<br>
>> in order to take advantage of the restart capabilities such as<br>
>> rest_for_one? I was actually moving to break things up into separate<br>
>> applications with different git trees and everything, so that things<br>
>> could be developed in a more independent way: for instance, the report<br>
>> generation software gets its own application, separate from the<br>
>> hardware control software.<br>
<br>
> Don’t confuse development with operations. If you like, you can make<br>
> separate applications in separate GitHub repos. That may make it easier<br>
> to test each component. I would do that and have separate PropEr<br>
> test suites run by common_test for each one (that’s my current style).<br>
><br>
> It means managing separate repos, but if the components are generally<br>
> useful, it makes it convenient for others.<br>
><br>
> In operations I would have one application that uses included_applications and<br>
> starts the root supervisor of each of your other components in the correct<br>
> dependency and startup sequence. Especially if you are writing all the<br>
> components, you will be intimately familiar with the dependencies and<br>
> startup behaviour of each one. Of course, this overlord application is the<br>
> real application you started talking about and would be a separate repo<br>
> of its own.<br>
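<br>
A minimal sketch of the arrangement described above, assuming two hypothetical component applications, hw_control and report_gen, whose root supervisors are hw_control_sup and report_gen_sup (all names here are illustrative). The top-level application lists them under included_applications in its .app file (shown as a comment), so the application controller loads them but does not start them; its own top supervisor starts their root supervisors in dependency order. The overlord_app callback module that calls overlord_sup:start_link() is not shown, and the map-based flags and child specs need OTP 18 or later.<br>
<br>
```erlang
%% Hypothetical overlord.app.src (all names are illustrative):
%%
%%   {application, overlord,
%%    [{description, "Release-level application"},
%%     {vsn, "0.1.0"},
%%     {registered, [overlord_sup]},
%%     {applications, [kernel, stdlib]},
%%     {included_applications, [hw_control, report_gen]},
%%     {mod, {overlord_app, []}}]}.
%%
%% Included applications are loaded but not started by the application
%% controller, so their root supervisors are started from this tree.
-module(overlord_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% rest_for_one: if a child dies, it and every child started after
    %% it are restarted, so list components in dependency order.
    SupFlags = #{strategy => rest_for_one,
                 intensity => 5,   % at most 5 restarts...
                 period => 60},    % ...within 60 seconds
    Children = [component(hw_control_sup),   % started first
                component(report_gen_sup)],  % assumed to depend on hw_control
    {ok, {SupFlags, Children}}.

%% Child spec for the root supervisor of an included application.
component(SupModule) ->
    #{id => SupModule,
      start => {SupModule, start_link, []},
      restart => permanent,
      shutdown => infinity,
      type => supervisor,
      modules => [SupModule]}.
```
<br>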
<br>
Aha! I had missed included_applications, and indeed, that looks like<br>
a potentially good way of having both the separate applications as<br>
well as the supervision tree.<br>
<br>
It seems that not everyone is in favor of these:<br>
<a href="http://learnyousomeerlang.com/the-count-of-applications#included-applications" target="_blank">http://learnyousomeerlang.com/the-count-of-applications#included-applications</a><br>
- and I can see that coupling things more tightly is potentially<br>
problematic. Realistically though, a lot of our code won't be used<br>
without all the other things present either.<br>
<br>
> If you have several applications, rather than using included_applications,<br>
> you risk a component failing without being detected and staying down until<br>
> you restart it manually or write your own code to monitor and manage the<br>
> applications.<br>
<br>
>> I was starting to think along the lines of a centralized system for<br>
>> monitoring some of these applications...<br>
><br>
> Hmm. I prefer to use the OTP tools that are present, and use them to my<br>
> benefit to avoid such circumstances. Splitting into independent applications<br>
> defeats all the restart facilities of OTP, unless you use heart and make<br>
> them all permanent applications and are willing to wait for VM restarts<br>
> when things start to go sideways…<br>
<br>
Yes, that's part of what I'm after: how to keep things within OTP as<br>
much as possible.<br>
<br>
After thinking things through some, though, and after Fred Hébert<br>
kindly took the time to discuss some of this with me on #erlang, I<br>
have come to the conclusion that:<br>
<br>
OTP alone is not up to the task - there has to be some kind of extra<br>
layer or extra logic in there to deal with systems that might not be<br>
functioning.<br>
<br>
Perhaps this provokes a reaction in the reader along the lines of "he<br>
has a firm grasp of the obvious", but after drinking the OTP Kool-Aid,<br>
going outside it feels like "I wonder what I'm doing wrong or what I'm<br>
missing - they must have something for this, right?".<br>
<br>
Take, for instance, the hardware in our system - it shouldn't fail,<br>
and the system will not work as advertised if it does. *However*,<br>
sooner or later, it probably will fail somehow, and the system needs<br>
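to stay up to aid the user in running diagnostics. Simply including<br>
the hardware in the supervision tree leads to things gradually falling<br>
over in an unacceptable way: repeated hardware errors exhaust each<br>
supervisor's restart intensity, and the failure escalates up the tree.<br>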
<br>
Fred talks about these concepts some here:<br>
<a href="http://ferd.ca/it-s-about-the-guarantees.html" target="_blank">http://ferd.ca/it-s-about-the-guarantees.html</a><br>
<br>
To my way of thinking, it really seems like there should be something<br>
more out there in Erlang land for these situations; something that<br>
intermediate people like myself can easily find and make use of and<br>
feel confident we're doing the right thing.<br>
<br>
* Better documentation, at least. I think "the database for a web<br>
site" provides a great example. The web site should not fall over<br>
when the DB becomes unavailable. Example code should be included. We<br>
hear plenty about letting it crash, but there are a significant number<br>
of use cases where it's actually more complex than that.<br>
<br>
* Some kind of gen_transient_service that gathers up the best<br>
practices and is a "good enough" solution in many cases. This would<br>
help for the "low level" case of a specific resource. It could come<br>
with a couple of strategies, and perhaps be pluggable so that more<br>
could be added, such as exponential backoff. A lot of this code<br>
has to look pretty similar: have the connection status in the state,<br>
return errors if it's not connected, have a fast init as well as a<br>
callback that attempts the connection, and then whatever strategy to<br>
handle errors with the connection. Wrapping it up in a library seems<br>
possible even if it doesn't cover every corner case out there.<br>
<br>
* Perhaps some kind of application manager. I'm actually thinking of<br>
writing code along these lines, as the above is too specific in our<br>
case (I think, at least). Our hardware management stuff has a variety<br>
of programs that it takes care of, and having client portions of our<br>
code know about all of them is probably not a good idea. I'd rather<br>
just have the hardware application go down and have our application<br>
manager alert the user, and keep track of what's running: "the<br>
hardware system is up, but the report generation system is down". I'm<br>
still trying to work out in my head if this is a good idea or not<br>
though; perhaps the gen_transient_service approach is better.<br>
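<br>
A minimal sketch of the gen_transient_service idea from the second point above, written as a plain gen_server rather than a new behaviour. The module name transient_service, the caller-supplied ConnectFun (returning {ok, Conn} or {error, Reason}) and the {error, connection_lost} convention for a dropped connection are all hypothetical. It keeps the connection in its state, has a fast init/1 that only schedules the first connection attempt, replies {error, not_connected} instead of crashing while the resource is down, and retries with capped exponential backoff.<br>
<br>
```erlang
-module(transient_service).
-behaviour(gen_server).

-export([start_link/1, run/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-define(MIN_BACKOFF, 100).    % ms before the first retry
-define(MAX_BACKOFF, 30000).  % upper bound on the retry delay

%% connect: hypothetical caller-supplied fun() -> {ok, Conn} | {error, Reason}
%% conn:    undefined while the resource is unavailable
-record(state, {connect, conn = undefined, backoff = ?MIN_BACKOFF}).

start_link(ConnectFun) ->
    gen_server:start_link(?MODULE, ConnectFun, []).

%% Run Fun(Conn) inside the server. While disconnected this returns
%% {error, not_connected} instead of crashing the caller or the server.
run(Server, Fun) ->
    gen_server:call(Server, {run, Fun}).

%% Fast init: do not touch the resource here, so a dead database or
%% device cannot stop the supervisor from starting this process.
init(ConnectFun) ->
    self() ! connect,
    {ok, #state{connect = ConnectFun}}.

handle_call({run, _Fun}, _From, State = #state{conn = undefined}) ->
    {reply, {error, not_connected}, State};
handle_call({run, Fun}, _From, State = #state{conn = Conn}) ->
    case Fun(Conn) of
        {error, connection_lost} = Error ->
            %% Hypothetical convention: drop the connection and reconnect.
            self() ! connect,
            {reply, Error, State#state{conn = undefined}};
        Result ->
            {reply, Result, State}
    end.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(connect, State = #state{conn = Conn}) when Conn =/= undefined ->
    %% Already connected (stray retry message); nothing to do.
    {noreply, State};
handle_info(connect, State = #state{connect = Connect, backoff = Backoff}) ->
    case Connect() of
        {ok, NewConn} ->
            {noreply, State#state{conn = NewConn, backoff = ?MIN_BACKOFF}};
        {error, _Reason} ->
            %% Retry later, doubling the delay up to ?MAX_BACKOFF.
            erlang:send_after(Backoff, self(), connect),
            {noreply, State#state{backoff = min(Backoff * 2, ?MAX_BACKOFF)}}
    end;
handle_info(_Other, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```
<br>
With this split, the server only crashes on bugs in its own code or in Fun, where a supervisor restart brings it back to a known state; an unreachable resource just produces error replies while the server keeps retrying in the background.<br>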
<br>
Thoughts?<br>
<br>
Thanks again for reading, and apologies if my normally muddied<br>
thoughts are more silted up than usual; I'm a bit short on sleep.<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
David N. Welton<br>
<br>
<a href="http://www.welton.it/davidw/" target="_blank">http://www.welton.it/davidw/</a><br>
<br>
<a href="http://www.dedasys.com/" target="_blank">http://www.dedasys.com/</a><br>
_______________________________________________<br>
erlang-questions mailing list<br>
<a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>
<a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a><br>
</font></span></blockquote></div><br></div>