[erlang-questions] Serial device supervision strategy

Tue May 6 21:45:27 CEST 2014

Hi Camille,

In general, my view on these things is that supervision isn't about
restarting, it's about rolling back to a known and stable state.

The state you restart to should be a *guarantee* you give to any process
that starts after yours. For example, if you start the client for the
serial device, any piece of code you boot afterwards that depends on you
faces two possible scenarios:

1. The device is connected
2. The device isn't connected

Now, from the point of view of a user of your library, if you go the
route of 'crash and restart every time', there could be something like a
50% chance that when they ask the client to talk to the device, they get
a crash for 'noproc' and everything blows up, and a 50% chance that they
actually get their connection.

The approach I recommend in these cases is to think about what you want
to guarantee to your users (I wrote about this in more detail at
http://ferd.ca/it-s-about-the-guarantees.html).

The gist of it is this: if you expect the serial device to be
disconnected often, it's likely nicer for your users if you bake that
into your design. The guarantee then becomes 'I boot a client I can
talk to'. That guarantee means you'll probably have to plan an interface
where the client is free to return something like '{error,
not_connected}'. You are then free to start a timer, or to leave it to
the user to send a 'reconnect' message before proceeding with more
requests.
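
As a rough sketch of that option: the module name serial_client, the
application environment key, and the open_device/0 and write/2 stubs
below are all made up for illustration and stand in for whatever serial
driver you actually use. The only point is that init/1 never fails, and
that 'not connected' is an ordinary return value:

```erlang
-module(serial_client).
-behaviour(gen_server).

-export([start_link/0, send/1, reconnect/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).
send(Data)   -> gen_server:call(?MODULE, {send, Data}).
reconnect()  -> gen_server:call(?MODULE, reconnect).

%% Never fail here: the guarantee is only "there is a client to talk to".
init([]) ->
    {ok, try_connect()}.

handle_call({send, _Data}, _From, disconnected) ->
    {reply, {error, not_connected}, disconnected};
handle_call({send, Data}, _From, {connected, Device} = State) ->
    {reply, write(Device, Data), State};
handle_call(reconnect, _From, disconnected) ->
    %% On-demand retry, triggered by the user of the library.
    case try_connect() of
        disconnected -> {reply, {error, not_connected}, disconnected};
        Connected    -> {reply, ok, Connected}
    end;
handle_call(reconnect, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) -> {noreply, State}.

try_connect() ->
    case open_device() of
        {ok, Device}     -> {connected, Device};
        {error, _Reason} -> disconnected
    end.

%% Hypothetical stand-ins for a real serial driver: the device counts as
%% "plugged in" only when the (assumed) env key below is set.
open_device() ->
    case application:get_env(serial_client, device, undefined) of
        undefined -> {error, not_connected};
        Device    -> {ok, Device}
    end.

write(_Device, _Data) -> ok.
```

Callers then pattern match on the result of serial_client:send/1 and
decide for themselves whether to call serial_client:reconnect/0 or give
up, instead of hitting 'noproc'.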

Otherwise, putting the connecting phase in the supervisor means that the
connection being established is to be regarded as an invariant of your
application: this application should not and cannot be booted if you
haven't plugged the device into the serial port. This is desirable, for
example, in cases such as local databases you know you want to be there
on the local host, sending stuff to UDP ports, and so on.

It is, essentially, what 'restarting to a known stable state' means. And
if you can't do it, you can't run the app: the state is too unstable.
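
A minimal sketch of that second option, assuming a hypothetical OpenFun
argument (a fun returning {ok, Port} or {error, Reason}) in place of a
real serial driver. Here the connection is made in init/1, so a missing
device makes the start fail, and a supervisor starting this child would
fail to boot in turn:

```erlang
-module(serial_endpoint).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% OpenFun is an assumption for illustration:
%% fun() -> {ok, Port} | {error, Reason}.
start_link(OpenFun) ->
    gen_server:start_link(?MODULE, OpenFun, []).

%% The connection is an invariant: no device, no process.
init(OpenFun) ->
    case OpenFun() of
        {ok, Port}      -> {ok, Port};
        {error, Reason} -> {stop, {device_unavailable, Reason}}
    end.

handle_call(_Request, _From, Port) -> {reply, ok, Port}.
handle_cast(_Msg, Port)            -> {noreply, Port}.
```

With this shape, start_link/1 returns {error, {device_unavailable,
Reason}} when the device is absent, and anything started after
serial_endpoint can assume the port is open.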

What I'm getting at here is that what you're trying to do doesn't map to
an existing supervision strategy. You should rather decide whether the
app should crash when the device isn't connected, or move the
reconnection strategy to some specific callback of an OTP behavior,
because disconnection is expected to happen and should be handled with
custom code (exponential backoff, on-demand retry, etc.).
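
For the custom-code route, one possible shape is a retry timer handled
in handle_info/2 with exponential backoff. This is only a sketch: the
module name, the OpenFun argument, and the 1 second / 30 second bounds
are assumptions for illustration, not values from your application:

```erlang
-module(serial_backoff).
-behaviour(gen_server).

-export([start_link/1, status/1, next_delay/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(INITIAL_MS, 1000).   % assumed starting delay
-define(MAX_MS, 30000).      % assumed upper bound on the delay

-record(state, {open, port = undefined, delay = ?INITIAL_MS}).

%% OpenFun is hypothetical: fun() -> {ok, Port} | {error, Reason}.
start_link(OpenFun) -> gen_server:start_link(?MODULE, OpenFun, []).

status(Pid) -> gen_server:call(Pid, status).

%% Double the delay after each failure, up to the cap.
next_delay(Delay) -> min(Delay * 2, ?MAX_MS).

init(OpenFun) ->
    %% Boot unconditionally; the first attempt happens right after init.
    self() ! connect,
    {ok, #state{open = OpenFun}}.

handle_call(status, _From, #state{port = undefined} = S) ->
    {reply, {error, not_connected}, S};
handle_call(status, _From, S) ->
    {reply, connected, S}.

handle_cast(_Msg, S) -> {noreply, S}.

handle_info(connect, #state{open = Open, delay = Delay} = S) ->
    case Open() of
        {ok, Port} ->
            %% Connected: reset the delay for the next disconnection.
            {noreply, S#state{port = Port, delay = ?INITIAL_MS}};
        {error, _Reason} ->
            %% Not there yet: schedule another attempt, then back off.
            erlang:send_after(Delay, self(), connect),
            {noreply, S#state{delay = next_delay(Delay)}}
    end;
handle_info(_Other, S) ->
    {noreply, S}.
```

A real client would also have to notice that the device was unplugged
(for instance from an error on a write) and send itself 'connect' again;
that part is left out here.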

This is a really important distinction and decision to make, and there
have been countless high-use production services that crashed because
developers who didn't (or couldn't) know better ended up declaring an
unreliable connection as a guarantee they offer, when they can't
actually do that.

Regards,
Fred.

On 05/06, Camille Troillard wrote:
> Hello,
> 
> I am going back to Erlang after a bit of distraction, please excuse this rusty question.
> 
> My application has to connect to a serial device which may or may not be plugged into the computer.
> 
> It seemed a simple approach would be to try to connect to the device every second, and retry until the connection is established. Ideally, I thought the design could be as follows: a gen_server (serial_endpoint) whose lifetime would be the same as the serial device’s connection is under supervision. If the serial_endpoint fails to start, then we try again a second later and so on. Likewise, if the connection is lost because the device is unplugged, the serial_endpoint dies, and we try again to start it a second later.
> 
> Unfortunately, I don’t see how to achieve this restart strategy.
> Can someone help me find the right solution?
> 
> 
> Thank you in advance.
> 
> Best Regards,
> Camille