[erlang-questions] Serial device supervision strategy

Camille Troillard lists@REDACTED
Tue May 6 21:54:12 CEST 2014


Fred, Jesper,

Thank you so much for your detailed answers.
It makes a lot of sense now.


Cam


On 06 May 2014, at 21:45, Fred Hebert <mononcqc@REDACTED> wrote:

> Hi Camille,
> 
> In general, my view on these things is that supervision isn't about
> restarting, it's about rolling back to a known and stable state.
> 
> What you restart to should be *guarantees* you are giving to any process
> that will start after yours. For example, if you start the client to the
> serial device, any piece of code you boot after that that depends on you
> has two possible scenarios:
> 
> 1. The device is connected
> 2. The device isn't connected
> 
> Now, from the point of view of a user of your library, if you go the
> route of 'crash and restart every time', there could be roughly a 50%
> chance that asking the client to talk to the device gets you a crash
> for 'noproc' and everything blows up, rather than your actual
> connection.
> 
> The approach I recommend in these cases is to think about what you want
> to guarantee to your users (I wrote in more detail on this at
> http://ferd.ca/it-s-about-the-guarantees.html)
> 
> The gist of it is this: if you expect the serial device to be
> disconnected a lot of the time, it's likely nicer for your users if you
> bake that into your design. The guarantee then becomes 'I boot a client
> I can talk to'. That guarantee means you'll possibly have to plan an
> interface where the client is free to return something like '{error,
> not_connected}' or something similar. You are then free to start a timer
> or leave it to the user to send a 'reconnect' message before proceeding
> with more requests.
> 
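
A minimal sketch of the 'I boot a client I can talk to' guarantee
described above. The module name serial_client and the ConnectFun
argument (a fun returning {ok, Port} or {error, Reason}) are
hypothetical stand-ins for whatever serial library is actually used.
init/1 never touches the device, so the process always starts, and
callers get {error, not_connected} until a timer-driven retry succeeds.
The on-demand 'reconnect' variant is left out for brevity.

```erlang
-module(serial_client).
-behaviour(gen_server).

-export([start_link/1, send/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Assumed retry interval; not a value prescribed by the thread.
-define(RETRY_MS, 1000).

%% ConnectFun is hypothetical: fun() -> {ok, Port} | {error, Reason}.
start_link(ConnectFun) ->
    gen_server:start_link(?MODULE, ConnectFun, []).

send(Pid, Data) ->
    gen_server:call(Pid, {send, Data}).

%% Boot without connecting; the first attempt happens right after init.
init(ConnectFun) ->
    self() ! try_connect,
    {ok, #{connect => ConnectFun, port => undefined}}.

handle_call({send, _Data}, _From, State = #{port := undefined}) ->
    {reply, {error, not_connected}, State};
handle_call({send, Data}, _From, State = #{port := Port}) ->
    %% Placeholder: a real client would write Data to Port here.
    {reply, {ok, {sent, Port, Data}}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(try_connect, State = #{port := undefined, connect := ConnectFun}) ->
    case ConnectFun() of
        {ok, Port} ->
            {noreply, State#{port := Port}};
        {error, _Reason} ->
            %% Stay up and try again later instead of crashing.
            erlang:send_after(?RETRY_MS, self(), try_connect),
            {noreply, State}
    end;
handle_info(_Other, State) ->
    {noreply, State}.
```

With a ConnectFun that always fails, serial_client:send(Pid, <<"ping">>)
returns {error, not_connected} rather than crashing the caller.
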
> Otherwise, putting the connecting phase in the supervisor means that the
> connection being established is to be regarded as an invariant of your
> application: this application should not and cannot be booted if you
> haven't plugged the device into the serial port. This is desirable, for
> example, in cases such as local databases you know you want to be there
> on the local host, sending stuff to UDP ports, and so on.
> 
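
By contrast, treating the connection as an invariant could look like
the sketch below. The module name serial_required and ConnectFun are
again hypothetical. Because the connection is attempted inside init/1,
start_link/1 returns {error, {no_device, Reason}} when the device is
missing, so a supervisor with this child fails to start and the
application does not boot.

```erlang
-module(serial_required).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% ConnectFun is hypothetical: fun() -> {ok, Port} | {error, Reason}.
start_link(ConnectFun) ->
    gen_server:start_link(?MODULE, ConnectFun, []).

%% The connection is part of the guarantee: no device, no process.
init(ConnectFun) ->
    case ConnectFun() of
        {ok, Port} -> {ok, Port};
        {error, Reason} -> {stop, {no_device, Reason}}
    end.

%% Every request can assume a connected Port in the state.
handle_call(_Request, _From, Port) ->
    {reply, {ok, Port}, Port}.

handle_cast(_Msg, Port) ->
    {noreply, Port}.
```
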
> It is, essentially, what 'restarting to a known stable state' means. And
> if you can't do it, you can't run the app: the state is too unstable.
> 
> What I'm getting at here is that what you're trying to do doesn't have
> an existing supervision strategy. You should rather decide whether the
> app should crash when the device isn't connected, or move the
> reconnection strategy to some specific callback of an OTP behavior,
> because it is to be expected to happen and should be handled with custom
> code (exponential backoff? on-demand retry? etc.).
> 
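
If reconnection is handled with custom code, the retry delay could be
computed along these lines. This is a sketch of capped exponential
backoff; the module name backoff, the 1 s base and the 30 s cap are
illustrative assumptions, not values taken from this thread.

```erlang
-module(backoff).
-export([next_delay/1, delays/1]).

-define(BASE_MS, 1000).   %% assumed first retry delay
-define(MAX_MS, 30000).   %% assumed upper bound on the delay

%% Delay in milliseconds before retry number Attempt (0-based).
%% The delay doubles on each attempt and is capped at ?MAX_MS.
%% The shift is clamped so very large attempt counts stay cheap.
next_delay(Attempt) when is_integer(Attempt), Attempt >= 0 ->
    min(?MAX_MS, ?BASE_MS bsl min(Attempt, 16)).

%% The first N delays, useful for eyeballing the schedule.
delays(N) when is_integer(N), N >= 0 ->
    [next_delay(A) || A <- lists:seq(0, N - 1)].
```

The result of backoff:next_delay/1 would typically be handed to
erlang:send_after/3 from the process doing the reconnection, with the
attempt counter reset to 0 once a connection succeeds.
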
> This is a really important distinction and decision to make, and there
> have been countless high-use production services that crashed because
> developers who didn't (or couldn't) know better ended up declaring an
> unreliable connection as a guarantee they offer, when they can't
> actually do that.
> 
> Regards,
> Fred.
> 
> On 05/06, Camille Troillard wrote:
>> Hello,
>> 
>> I am coming back to Erlang after a bit of distraction, so please excuse this rusty question.
>> 
>> My application has to connect to a serial device which may or may not be plugged into the computer.
>> 
>> It seemed a simple approach would be to try to connect to the device every second, and retry until the connection is established. Ideally, I thought the design could be as follows: a supervised gen_server (serial_endpoint) whose lifetime matches that of the serial device’s connection. If the serial_endpoint fails to start, then we try again a second later and so on. Likewise, if the connection is lost because the device is unplugged, the serial_endpoint dies, and we try to start it again a second later.
>> 
>> Unfortunately, I don’t see how to achieve this restart strategy.
>> Can someone help me find the right solution?
>> 
>> 
>> Thank you in advance.
>> 
>> Best Regards,
>> Camille
>> 
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions



