[erlang-questions] Serial device supervision strategy

Tue May 6 21:35:48 CEST 2014

On Tue, May 6, 2014 at 7:27 PM, Camille Troillard <lists@REDACTED> wrote:

> Ideally, I thought the design could be as follows: a gen_server
> (serial_endpoint) whose lifetime would be the same as the serial device’s
> connection is under supervision. If the serial_endpoint fails to start,
> then we try again a second later and so on. Likewise, if the connection is
> lost because the device is unplugged, the serial_endpoint dies, and we try
> again to start it a second later.

Note that supervision and restarts are normally for unforeseen, erroneous
behaviour in applications. In your model, I would assume it is normal to
have devices disconnecting and reconnecting. After all, there is a
connector for the serial device and people might be yanking out the device
now and then. This suggests that the disconnect is not an unforeseen error,
but rather a valid situation. This warrants handling it as part of the
normal application flow and not as part of supervision fault handling.
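To make the "normal flow" point concrete, here is a minimal sketch of a handler that treats an unplugged device as an ordinary way to finish rather than as a crash. It assumes a hypothetical serial driver that delivers `{serial_data, Binary}` messages and a `serial_closed` message when the link goes away. The module name `device_handler` follows the post; the messages and the state map are illustrative only.

```erlang
-module(device_handler).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link(DevicePath) ->
    gen_server:start_link(?MODULE, DevicePath, []).

init(DevicePath) ->
    %% A real handler would open the serial port here (not shown).
    {ok, #{device => DevicePath, bytes_seen => 0}}.

handle_call(_Request, _From, State) ->
    {reply, {error, unknown_call}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% {serial_data, Bin} and serial_closed are hypothetical driver messages.
handle_info({serial_data, Bin}, #{bytes_seen := N} = State) when is_binary(Bin) ->
    {noreply, State#{bytes_seen := N + byte_size(Bin)}};
handle_info(serial_closed, State) ->
    %% Unplugging is expected, so stop with reason normal instead of crashing.
    {stop, normal, State};
handle_info(_Other, State) ->
    {noreply, State}.
```

Returning `{stop, normal, State}` means a supervisor holding this process as a `transient` child does not restart it and does not treat the exit as a fault.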

I tend to keep supervision and crashes as mechanisms for handling things that
you don't expect the application to do. Sometimes you *can* model things
with processes marked as 'temporary' in the supervision tree, where you don't
care if such a process goes away because that is the normal behaviour of the
application.
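A sketch of such a child spec, using the map form of supervisor child specifications. The names `temporary_spec`, `child_spec/2` and the `device_handler:start_link/1` start function are hypothetical.

```erlang
-module(temporary_spec).
-export([child_spec/2]).

%% Child spec for a worker whose disappearance is part of normal operation.
%% restart => temporary: the supervisor never restarts it, whatever the exit
%% reason, even when a sibling's crash makes a one_for_all or rest_for_one
%% supervisor terminate it.
child_spec(Id, DevicePath) ->
    #{id => Id,
      start => {device_handler, start_link, [DevicePath]},  % hypothetical module
      restart => temporary,
      shutdown => 5000,
      type => worker,
      modules => [device_handler]}.
```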

First of all, I would consider whether device monitoring and device handling
have to live in the same process. They look like they can be decoupled: one
process listens for device events and starts a handling process accordingly.
If the device is disconnected, the monitoring process can kill the handling
process. It also means that the cleanup in the event of a disconnected device
is left to the monitor and not to the handler. Pursuing this idea has the
advantage that you decouple device presence from device protocol handling,
making the protocol handling pluggable, flexible and simpler to implement.
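One way to express "pluggable" in OTP terms is a behaviour: each device protocol is a callback module, and the process that owns the serial port only delegates to it, while the monitor never needs protocol details. The module name `device_protocol` and its three callbacks are assumptions for illustration, not an existing library.

```erlang
-module(device_protocol).

%% Hypothetical behaviour: every supported device protocol implements these
%% callbacks, so presence detection stays independent of protocol details.
-callback init(DevicePath :: string()) ->
    {ok, State :: term()} | {error, Reason :: term()}.
-callback handle_data(Data :: binary(), State :: term()) ->
    {ok, NewState :: term()} | {stop, Reason :: term()}.
-callback terminate(State :: term()) -> ok.
```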

As for a supervision tree:

main_sup - one_for_all (perhaps rest_for_one)
    monitor - gen_server, permanent
    device_handler - gen_server, transient, started dynamically in the tree when a device is present
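The tree above might be declared roughly like this. It is a sketch: `device_monitor` stands in for the monitor process, the device path and the restart intensity (5 restarts in 10 seconds) are assumed values, and the handler is deliberately absent from the static child list.

```erlang
-module(main_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_all: an abnormal exit of either child restarts both.
    SupFlags = #{strategy => one_for_all, intensity => 5, period => 10},
    Monitor = #{id => device_monitor,
                start => {device_monitor, start_link, ["/dev/ttyUSB0"]},  % assumed path
                restart => permanent,
                type => worker},
    %% device_handler is not listed here: the monitor adds it at runtime with
    %% supervisor:start_child/2 (restart => transient) when a device appears.
    {ok, {SupFlags, [Monitor]}}.
```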

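The monitor should always run. It periodically checks connectivity on the
serial port. When it detects an attached device, it starts the
device_handler as a dynamic child of main_sup (supervisor:start_child/2).
The transient restart type means the device handler is restarted only if it
exits abnormally; if it stops with reason normal, for example because the
device was gracefully disconnected, the supervisor leaves it stopped.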
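A minimal sketch of the polling monitor. It assumes the device shows up as a file-system node (such as `/dev/ttyUSB0` on Linux) whose presence means "attached", a one-second poll interval, a supervisor registered as `main_sup`, and a `device_handler:start_link/1` function; all of these are assumptions. It also shows two OTP details worth knowing: a process must not call its own supervisor synchronously from `init/1`, and a `transient` child that stopped normally leaves its child spec behind, so a later `supervisor:start_child/2` returns `{error, already_present}`.

```erlang
-module(device_monitor).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(SUP, main_sup).   %% supervisor name from the post
-define(POLL_MS, 1000).   %% assumed poll interval in milliseconds

start_link(DevicePath) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, DevicePath, []).

init(DevicePath) ->
    %% Do not call main_sup from here: it is blocked waiting for init/1 to
    %% return, so a synchronous call would deadlock. The poll message is
    %% only handled after init/1 has returned.
    self() ! poll,
    {ok, #{device => DevicePath, running => false}}.

handle_call(_Request, _From, State) ->
    {reply, {error, unknown_call}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(poll, #{device := Dev, running := Running} = State) ->
    Running1 =
        case {Running, present(Dev)} of
            {false, true} -> start_handler(Dev);   %% device attached
            {true, false} -> stop_handler();       %% device removed
            _Unchanged -> Running
        end,
    erlang:send_after(?POLL_MS, self(), poll),
    {noreply, State#{running := Running1}};
handle_info(_Other, State) ->
    {noreply, State}.

%% Assumption: the device node existing means a device is attached.
present(Dev) ->
    case file:read_file_info(Dev) of
        {ok, _Info} -> true;
        {error, _Reason} -> false
    end.

start_handler(Dev) ->
    Spec = #{id => device_handler,
             start => {device_handler, start_link, [Dev]},  %% hypothetical
             restart => transient},
    case supervisor:start_child(?SUP, Spec) of
        {ok, _Pid} -> true;
        {ok, _Pid, _Info} -> true;
        {error, {already_started, _Pid}} -> true;
        {error, already_present} ->
            %% A transient child that stopped normally keeps its spec;
            %% remove the stale spec and try again.
            _ = supervisor:delete_child(?SUP, device_handler),
            start_handler(Dev);
        {error, _Reason} ->
            false                                  %% retry on the next poll
    end.

%% Cleanup after a disconnect is the monitor's job, not the handler's.
stop_handler() ->
    _ = supervisor:terminate_child(?SUP, device_handler),
    _ = supervisor:delete_child(?SUP, device_handler),
    false.
```

Because the monitor keeps only a boolean, a restart of the whole tree simply re-polls: if the handler was restarted alongside it, `start_child` reports `{error, {already_started, Pid}}`, which the sketch treats as success.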

How does this system crash? Under one_for_all, if the monitor dies
*unexpectedly*, we take the device handler down as well, and if the
device_handler crashes, we restart the monitor as well. When the system
restarts, it will attempt to reconstruct a good state. You can play around
with rest_for_one as a strategy if you want the device handler to be
restartable without killing the monitor: since the handler is started after
the monitor, a handler crash restarts only the handler, while a monitor crash
still restarts both.
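To see which children each strategy takes down, here is a toy model (not the real supervisor implementation) that computes the affected children from the start order. It ignores restart types; a temporary child, for instance, is terminated but never restarted.

```erlang
-module(restart_scope).
-export([affected/3]).

%% Toy model: which children a supervisor terminates and restarts when
%% Failed crashes. Children are listed in start order.
-spec affected(one_for_one | one_for_all | rest_for_one, atom(), [atom()]) -> [atom()].
affected(one_for_one, Failed, _Children) ->
    [Failed];
affected(one_for_all, _Failed, Children) ->
    Children;
affected(rest_for_one, Failed, Children) ->
    %% The failed child plus every child started after it.
    lists:dropwhile(fun(Child) -> Child =/= Failed end, Children).
```

For the two children in the post, `affected(rest_for_one, device_handler, [monitor, device_handler])` gives `[device_handler]`, while `affected(rest_for_one, monitor, [monitor, device_handler])` gives both children.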

This might not be perfect, but it is a start of a model. Feel free to ask
more questions.

-- 
J.