[erlang-questions] Shutting down a gen_server in a supervision tree

Mon Jun 8 13:26:02 CEST 2009

On Fri, Jun 5, 2009 at 9:47 AM, Jeroen Koops <koops.j@REDACTED> wrote:

> Hi Torben,
>
> > 1. Why is it important for you to keep the network connection happy when
> you are shutting down?
>
> For some protocols, it is important that the connection is shut down in an
> orderly manner, instead of just closing the TCP connection. As an example,
> closing an SMPP connection involves first waiting for responses to
> outstanding requests, sending a logout PDU and waiting for a logout response
> PDU.
> Failing to wait for responses to outstanding requests means you don't know
> if certain requests have been accepted by the server. Failing to send the
> logout PDU may lead to the server not realizing that you have logged out and
> rejecting new login attempts for some time.

So these scenarios are for the "dream case", i.e., a controlled shutdown.
The protocols should also cope with an abrupt loss of connection, but
that is another issue.

In my book one should be more aggressive with regard to logouts: just send
the logout PDU and shut down. Locally you will know which sessions you have
running and which you have already terminated. In most cases you also have
(or should have) a session identifier that can help the server with regard
to new login attempts - you need that for the more abrupt loss of connection
anyway.
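
A minimal sketch of the "just send the logout PDU and shut down" approach,
assuming a gen_server that traps exits so that terminate/2 runs when the
supervisor shuts it down. The module name logout_sketch, the Send fun and
the <<"LOGOUT">> payload are placeholders for illustration only; a real
client would encode a proper protocol PDU and write it to its socket.

```erlang
-module(logout_sketch).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

%% Send is a fun taking iodata. It stands in for writing to the socket,
%% e.g. fun(Data) -> gen_tcp:send(Socket, Data) end (assumption).
start_link(Send) ->
    gen_server:start_link(?MODULE, Send, []).

init(Send) ->
    %% Without trap_exit, the supervisor's shutdown signal kills the
    %% process directly and terminate/2 is not called.
    process_flag(trap_exit, true),
    {ok, Send}.

handle_call(_Request, _From, Send) -> {reply, ok, Send}.
handle_cast(_Msg, Send) -> {noreply, Send}.
handle_info(_Info, Send) -> {noreply, Send}.

%% Fire off the logout and return; no waiting for a logout response.
terminate(_Reason, Send) ->
    Send(<<"LOGOUT">>),   % placeholder payload, not a real PDU
    ok.
```

The trade-off is the one discussed above: the shutdown is fast, but you do
not learn whether the server accepted the requests that were still
outstanding.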

>
>
> > 2. When you are being told to die you should only clean up your
> resources. Pending messages should not be a concern. If the requests are so
> important you should consider spawning a separate process for them.
>
> That's a possibility, but not what I want - I want
> application:stop(my_application) to finish only after everything has been
> shut down cleanly. Leaving a new process hanging around handling closing the
> connection is not really an option.

Sorry, I did not explain my idea well enough. In my telecom context you
normally create a separate process for every call/session, but it does not
really address all aspects of your shut down problem anyway.

>
>
> > 3. Your problem sounds more like a gen_fsm problem where you should
> introduce a shutting_down state that collects responses or times out.
>
> Maybe gen_fsm is more fitting here than a gen_server, but I don't think it
> will make a huge difference. If I create an extra shutting_down state (which
> can be done just as easily with a gen_server by setting some flag in the
> gen_server's state, by the way), someone will still have to trigger this
> state.

Well, this is a matter of style. As soon as I can see the slightest hint of
different states in a component I tend to use gen_fsm, since that is a
neater way of handling the logic. So for me the machine-to-machine
architecture could very well be a client-server relationship, but in my code
it will more often be gen_fsms that implement the clients (the
servers often require a number of components of different types to provide
the functionality). I tend to use gen_server mainly to manage resources,
since the request-reply interaction fits well with that.
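
For comparison, the flag-in-state variant with a plain gen_server could be
sketched roughly as follows. This is only an illustration under assumptions:
the module name conn_sketch and the functions request/2, response/2 and
drain/2 are made up, requests are tracked as bare ids, and response/2 stands
in for data arriving from the network. The server keeps running after the
drain and only refuses new requests, so someone (a worker, or the
application callback module) still has to trigger drain/2 before the
supervisor sends its shutdown.

```erlang
-module(conn_sketch).
-behaviour(gen_server).

-export([start_link/0, request/2, response/2, drain/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

%% pending:  ids of requests still waiting for a response
%% draining: true once a shutdown has been requested
%% waiter:   the caller blocked in drain/2, if any
-record(state, {pending = [], draining = false, waiter = undefined}).

start_link() -> gen_server:start_link(?MODULE, [], []).

request(Pid, Id)     -> gen_server:call(Pid, {request, Id}).
response(Pid, Id)    -> gen_server:cast(Pid, {response, Id}).
%% Returns drained, or {timeout, StillPending} after Timeout milliseconds.
drain(Pid, Timeout)  -> gen_server:call(Pid, {drain, Timeout}, infinity).

init([]) -> {ok, #state{}}.

handle_call({request, _Id}, _From, #state{draining = true} = S) ->
    {reply, {error, shutting_down}, S};
handle_call({request, Id}, _From, #state{pending = P} = S) ->
    {reply, ok, S#state{pending = [Id | P]}};
handle_call({drain, _Timeout}, _From, #state{pending = []} = S) ->
    {reply, drained, S#state{draining = true}};
handle_call({drain, Timeout}, From, S) ->
    %% Reply later: when the last response arrives or the timer fires.
    erlang:send_after(Timeout, self(), drain_timeout),
    {noreply, S#state{draining = true, waiter = From}}.

handle_cast({response, Id}, #state{pending = P} = S) ->
    check_idle(S#state{pending = lists:delete(Id, P)}).

handle_info(drain_timeout, #state{waiter = From, pending = P} = S)
  when From =/= undefined ->
    gen_server:reply(From, {timeout, P}),
    {noreply, S#state{waiter = undefined}};
handle_info(_Other, S) ->
    {noreply, S}.

terminate(_Reason, _State) -> ok.

%% Wake up the drain/2 caller once nothing is outstanding any more.
check_idle(#state{pending = [], waiter = From} = S) when From =/= undefined ->
    gen_server:reply(From, drained),
    {noreply, S#state{waiter = undefined}};
check_idle(S) ->
    {noreply, S}.
```

Responses are handled by the same callback before and during the drain, so
there is only one code path for incoming data.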

>
>
> > 4. My code style is normally to let the supervisors help out with
> starting of processes and try to restart according to the policies. Shutting
> down is normally something for another worker process to instruct us to do.
>
> So you don't use application:stop/1 for this?

No - I probably should, but we have been creating always-on types of
applications (only prototypes so far), and there shutting down is not your
first concern. We have mainly focussed on getting the right supervisor
structure in place to provide a robust application.
Furthermore, your scenario with dangling requests is not something we
encounter in our environment.
But I will look into the prep_stop/1 application callback now that you have
brought my attention to it.
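
For reference, a rough sketch of where prep_stop/1 sits in an application
callback module. The application controller calls it when the application
is about to be stopped, before the processes in the supervision tree are
shut down, so it is a natural place to trigger an orderly drain. The names
my_application_app, my_application_sup and the registered process
my_connection, as well as the drain request and the 5000 ms timeout, are
assumptions for illustration.

```erlang
-module(my_application_app).
-behaviour(application).

-export([start/2, prep_stop/1, stop/1]).

start(_Type, _Args) ->
    %% Hypothetical top-level supervisor of the application.
    my_application_sup:start_link().

%% Runs before the supervision tree is taken down.
prep_stop(State) ->
    %% Ask the (hypothetical) connection process to finish its outstanding
    %% requests. The catch keeps shutdown going if the process is already
    %% gone or does not answer within 5000 ms.
    _ = (catch gen_server:call(my_connection, drain, 5000)),
    State.

%% Runs after the supervision tree has been stopped.
stop(_State) ->
    ok.
```

With this in place, application:stop(my_application) returns only after
prep_stop/1 has returned and the tree has been shut down.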

Cheers,
Torben

>
>
> Regards,
>
> Jeroen
>
>
>
>
> On Thu, Jun 4, 2009 at 10:00 PM, Torben Hoffmann <
> torben.lehoff@REDACTED> wrote:
>
>> Some questions/observations to get the right context:
>>
>>    1. Why is it important for you to keep the network connection happy
>>    when you are shutting down?
>>    2. When you are being told to die you should only clean up your
>>    resources. Pending messages should not be a concern. If the requests are so
>>    important you should consider spawning a separate process for them.
>>    3. Your problem sounds more like a gen_fsm problem where you should
>>    introduce a shutting_down state that collects responses or times out.
>>    4. My code style is normally to let the supervisors help out with
>>    starting of processes and try to restart according to the policies. Shutting
>>    down is normally something for another worker process to instruct us to do.
>>
>> Sorry if I did not get my interpretation of your problem right.
>>
>> Cheers,
>> Torben
>>
>>
>> On Thu, Jun 4, 2009 at 2:11 PM, Jeroen Koops <koops.j@REDACTED> wrote:
>>
>>> Hi all,
>>>
>>> I'm puzzled by the following: I have a gen_server that is part of an
>>> application. The gen_server maintains a network connection using some
>>> request/response protocol. When shutting down, it is important that the
>>> gen_server wait for responses to all outstanding requests (or timeouts)
>>> before terminating. To realize this, I would like the gen_server to keep
>>> operating normally - that is, being able to handle_call/3, handle_cast/2
>>> and handle_info/2 - between the time the supervisor sends the
>>> { 'EXIT', shutdown } message, and the moment the gen_server really
>>> terminates.
>>>
>>> I thought this was simply a matter of:
>>> - Specifying some integer value > 0 as the Shutdown value in the
>>> supervisor's child-specification
>>> - Setting trap_exit to true in the gen_server's init function
>>> - Handling the { 'EXIT', shutdown } message from the supervisor in the
>>> handle_info/2 function. The only thing this would do is set some flag in
>>> the gen_server's internal state, indicating that we are now in the
>>> process of shutting down.
>>> - When the time has come to really shut down (so after all outstanding
>>> requests have been responded to, or timed out), respond with { stop,
>>> shutdown, SomeReply, SomeState } from a handle_call/3 or with { stop,
>>> shutdown, SomeState } from a handle_cast/2 or handle_info/2, which would
>>> send an exit-message back to the supervisor indicating that we have
>>> terminated.
>>>
>>> Unfortunately, this doesn't seem to work. What seems to happen instead is
>>> for the { 'EXIT', shutdown } message to immediately trigger a call to the
>>> gen_server's terminate function; handle_info/2 is never called.
>>> In theory, I could of course set up a receive-loop in the terminate
>>> function and deal with incoming messages in that way, but that would
>>> mean I would process incoming messages (data from a TCP connection, for
>>> example) in two different ways: by implementing handle_info before the
>>> shutdown, and by explicitly receiving messages after shutdown.
>>>
>>> How is this normally handled?
>>>
>>> Thanks,
>>>
>>> Jeroen
>>>
>>
>>
>