[erlang-questions] simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Sat May 14 18:32:39 CEST 2016

I trust your judgement on the need for separate processes :)

But complexity doesn't justify a _process_ - just a relatively more
complex function. If the real world entities being served map to
independent _threads of execution_ in time and space, process. If not,
consider not a process. I only underscore this because it's not
uncommon to see folks use processes (a runtime construct) to model
abstract programming logic (a design-time construct at best,
often simply an emotional/mental fancy). Not suggesting you fall into
that category, just a highlight ;)

On Sat, May 14, 2016 at 9:15 AM, Oliver Korpilla <Oliver.Korpilla@REDACTED> wrote:
> Hello, Garrett.
>
> The TCP layer is a stand-in for a real, more complex protocol stack with
> very different characteristics. I assure you the abstractions are absolutely
> necessary and map to real-world entities being served in a system far more
> complex than a TCP server. While the project is currently at the scale of a
> technology demonstration it is supposed to grow into a full-fledged
> application. Sorry I have to be so vague.
>
> Relying on the client for retries might work. My client handlers are
> "stateless" in that they can come back at any time from the DB and serve a
> request, even if they have to tell the client to abort its current
> operation. I have to investigate the behavior of the given clients more.
> That's likely the real solution.
>
> Thanks!
> Oliver
>
> Sent: Saturday, May 14, 2016 at 5:11 PM
> From: "Garrett Smith" <g@REDACTED>
> To: "Oliver Korpilla" <Oliver.Korpilla@REDACTED>
> Cc: "Jesper Louis Andersen" <jesper.louis.andersen@REDACTED>,
> "Erlang-Questions Questions" <erlang-questions@REDACTED>
> Subject: Re: [erlang-questions] simple_one_for_one supervisor - what happens
> at restart? (also: gen_tcp)
> On Sat, May 14, 2016 at 5:21 AM, Oliver Korpilla <Oliver.Korpilla@REDACTED>
> wrote:
>> Hello,
>>
>> and thank you all for your responses.
>>
>> I originally adopted simple_one_for_one supervisor because I had a problem
>> with how other supervisors clean up processes.
>>
>> For the TCP connectors simple_one_for_one will be fine. As noted by
>> others, they cannot really come back unless they reconnect, so that is fine.
>> So, does a simple_one_for_one supervisor treat every child, regardless of
>> child spec, as if it were temporary?
>>
>> I have another big batch of processes independent of the connectors. These
>> serve individual requests emanating from the TCP layer, where an ID
>> establishes which handler belongs to which batch of messages (i.e. each TCP
>> payload contains an ID in its own proprietary header). Now, I originally saw
>> these as transient workers I would like to have restarted, but since they
>> are stateless and can be created on demand, I can either supervise them
>> simple_one_for_one (and create them on demand when the one for a given ID is
>> missing) or create them as transient children under a one_for_one and
>> let that restart them on a crash.
>
> If these processes only ever act on behalf of the TCP connection,
> consider not using them at all. Just let the TCP connections do the
> work.
>
> Processes should correspond to _real world_ independent threads of
> execution, not mental abstractions.
>
> If you do have separate threads of execution (e.g. TCP connection is
> providing updates to the client while it waits on these spawned
> workers) use a separate simple_one_for_one (sofo) supervisor for the
> workers and link your connection/worker processes.
>
>> I originally went for simple_one_for_one because of the better performance
>> and because it cleans up children after they terminate. I guess in case of
>> one_for_one I have to clean up all children which shut down normally by
>> calling terminate_child and delete_child on them. (I originally hoped
>> one_for_one would do this if a child exited normally, but either I bungled
>> my tests or it simply doesn't, even for transient children).
>
> If you're ever routinely "cleaning up" after a supervisor, it's a bad
> sign. Configure (one-time init payload) your supervisors and let them
> do their thing. If you're accumulating a lot of terminated child
> processes, you want a sofo supervisor.
>
>> Any recommendations?
>
> It sounds like you're motivated to get a "restart" scenario here. What
> is your goal from the end-user (client of your app) point of view?
> Without a specific goal that you understand and can defend, your
> default approach, I think, should always be to crash and let the
> client reestablish a connection.
>
> Some worthy goals:
>
> - Don't abruptly close the connection but return a well formed error
> (e.g. HTTP 500, etc.)
> - Handle specific well understood error conditions with limited
> retries (e.g. reconnect to a database with the hope the outage is
> short term)
> - Tell the client to retry a different end-point (e.g. HTTP 302)
>
> Each of these goals needs to be implemented - you're not going
> to get any of them with a supervisor process restart. Short of a
> worthy goal, just crash, maintaining your system integrity for
> processing new connections, and rely on the client (outside your
> system) to perform the "restart".
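
For reference, the "separate sofo supervisor for the workers" advice in the
quoted reply could look roughly like the sketch below. This is only an
illustration under assumptions: the module name worker_sup, the worker
module my_worker and its start_link/1 are hypothetical and do not come
from the thread, and the intensity/period values are arbitrary.

```erlang
%% Minimal sketch of a simple_one_for_one supervisor for on-demand workers.
%% worker_sup and my_worker are hypothetical names, not from the thread.
-module(worker_sup).
-behaviour(supervisor).

-export([start_link/0, start_worker/1]).
-export([init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Start one worker for a request ID. For simple_one_for_one, the list
%% passed here is appended to the argument list in the child spec, so
%% this ends up calling my_worker:start_link(Id).
start_worker(Id) ->
    supervisor:start_child(?MODULE, [Id]).

init([]) ->
    SupFlags = #{strategy => simple_one_for_one,
                 intensity => 5,     % arbitrary example values
                 period => 10},
    %% restart => temporary: a crashed worker is never restarted; the
    %% caller creates a fresh one on demand instead.
    ChildSpec = #{id => worker,
                  start => {my_worker, start_link, []},
                  restart => temporary,
                  shutdown => 5000,
                  type => worker},
    {ok, {SupFlags, [ChildSpec]}}.
```

A connection process would call worker_sup:start_worker(Id) and, on
{ok, Pid}, could link to or monitor Pid so that each side learns when the
other dies. With this setup there is nothing to clean up by hand:
terminated children are simply dropped by the supervisor.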