[erlang-questions] msg mailbox and gen_svr shutdown

Bob Ippolito bob@REDACTED
Tue May 15 17:20:29 CEST 2012


This isn't acceptable for me, because graceful shutdown can take minutes.
Not listening for connections is ok for me because the clients connect to
multiple servers and choose whichever seems fastest. This method of handoff
reduces the downtime for a given machine to milliseconds. Having a machine down for
minutes would still work, but it's not desirable to have rolling deploys
take hours instead of minutes. Any solution that requires hot code loading
is also not going to work when you upgrade the VM.

The only way to do better is to be able to pass sockets from one OS process
to another (new NIFs), or open a hole to allow the app to modify pf
settings. For me, it's sufficient that if a socket is accepted then it'll
be handled appropriately and in a timely manner, so I prefer the current
(cross-platform and pure Erlang) solution.

On Tuesday, May 15, 2012, Loïc Hoguin wrote:

> I personally wouldn't want this to become common, because even if you
> restart immediately, you would lose a few incoming connections (or a lot
> depending on the popularity of your service). If I understand correctly you
> go as far as restarting the whole node?
>
> This isn't like nginx, since nginx doesn't stop listening at any point.
> You could actually get something closer to nginx by doing this:
>
> * set acceptors to 0 (incoming connections get queued)
> * gracefully shutdown the running connections (application dependent)
> * load the new code
> * optionally restart parts of your supervision tree
> * set acceptors back to N
>
> You can add an optional set_protocol_options somewhere in there if you
> need to upgrade the dispatch list (maybe after loading the new code you
> could always run set_protocol_options(my_project:get_dispatch_stuff()) or
> something, which would always get the latest options for your project).
>
> The number of incoming connections queued is also application dependent
> (and will be upgradable later on so you can increase it temporarily).
>
> This would allow you to do what you want without any extra feature from
> what is currently planned, and without losing any incoming connections.
>
> All this is easily scriptable of course; see the sketch below.
>
> If that still doesn't fit your needs I'd suggest opening a ticket to see
> what others think about it.
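
A minimal sketch of that scripted sequence, assuming the Cowboy API of the
time (cowboy:set_protocol_options/2 is referenced in the thread itself, and
the dispatch list is assumed to live in the protocol options);
my_cowboy_ext:set_nb_acceptors/2 stands in for the planned but then
nonexistent acceptor-count control, and the my_app/my_project helpers are
application-specific placeholders:

    %% Rolling upgrade without losing incoming connections.
    -module(rolling_upgrade).
    -export([upgrade/1]).

    upgrade(Ref) ->
        %% 1. Stop accepting; new connections wait in the listen backlog.
        ok = my_cowboy_ext:set_nb_acceptors(Ref, 0),    %% hypothetical
        %% 2. Gracefully shut down running connections (app dependent).
        ok = my_app:drain_connections(60000),           %% hypothetical
        %% 3. Load the new code.
        [begin code:purge(Mod), code:load_file(Mod) end
         || Mod <- my_app:modules()],                   %% hypothetical
        %% 4. Pick up the new dispatch rules for the new code.
        cowboy:set_protocol_options(Ref,
            [{dispatch, my_project:get_dispatch_stuff()}]),
        %% 5. Start accepting again.
        ok = my_cowboy_ext:set_nb_acceptors(Ref, 100).  %% hypothetical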
>
> On 05/15/2012 07:35 AM, Bob Ippolito wrote:
>
>> The idea is to start a new version of the service and do a quick
>> hand-off of listening from old to new (possibly even switching back).
>> This is particularly useful for code upgrades that aren't suitable for
>> hot code loading. We're doing continuous integration this way, as hot
>> code loading requires too much work and/or planning for us at this
>> stage. It's currently an uncommon need, but it would be a lot more common
>> if it were well documented and easier to do.
>>
>> As far as load balancers go, we're not using them. For our use case we
>> don't need them, so why add the complexity and/or cost? There is port
>> forwarding done by the OS kernel but I want to avoid requiring some
>> setuid binary to do the privilege escalation that would be necessary to
>> change ports on the fly.
>>
>> Nginx does upgrades in a similar way.
>> http://wiki.nginx.org/CommandLine#Upgrading_To_a_New_Binary_On_The_Fly
>>
>> On Monday, May 14, 2012, Loïc Hoguin wrote:
>>
>>    As with all proposed features, I first need to understand what it's
>>    used for and why, and whether it'll be useful to other people. What
>>    you are doing just sounds very weird to me, and this is the first
>>    time I've heard of a need for something like this, so at this point
>>    I'm neither trying to avoid anything nor considering it useful
>>    functionality; I just want to understand why.
>>
>>    On 05/15/2012 06:42 AM, Bob Ippolito wrote:
>>
>>        Yes, I want the other process to immediately take over. Requiring
>>        a load balancer change isn't an elegant solution; I don't
>>        understand why you're trying to avoid adding this useful
>>        functionality.
>>
>>        On Monday, May 14, 2012, Loïc Hoguin wrote:
>>
>>            You stop listening after you're done gracefully stopping your
>>            currently running processes.
>>
>>            For HTTP that could be something like:
>>
>>            Set acceptors to 0.
>>            Optionally set the 'onrequest' hook to reply with a 503 (for
>>            keepalives).
>>            Gracefully stop your processes.
>>            Stop the listener.
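
A sketch of those four steps, assuming the Cowboy API of that era
(cowboy:set_protocol_options/2, cowboy:stop_listener/1, and
cowboy_http_req:reply/4); my_cowboy_ext:set_nb_acceptors/2 again stands in
for the planned acceptor control, and my_app:graceful_stop/0 is an
application-specific placeholder:

    %% Graceful stop: refuse keepalive requests with 503 while in-flight
    %% requests finish, then stop the listener.
    -module(graceful_stop).
    -export([graceful_http_stop/1]).

    graceful_http_stop(Ref) ->
        ok = my_cowboy_ext:set_nb_acceptors(Ref, 0),    %% hypothetical
        cowboy:set_protocol_options(Ref, [{onrequest, fun reply_503/1}]),
        ok = my_app:graceful_stop(),                    %% app dependent
        cowboy:stop_listener(Ref).

    %% The 'onrequest' hook mentioned above: answer each further request
    %% arriving on an already-open keepalive connection with 503.
    reply_503(Req) ->
        {ok, Req2} = cowboy_http_req:reply(503, [], <<>>, Req),
        Req2.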
>>
>>            Or do you need another process to take over immediately?
>>            Because in that case you usually don't need to use the same
>>            listening port; you can just change your firewall/lb rules for
>>            the port redirection from 80->P1 to 80->P2.
>>
>>            On 05/15/2012 03:34 AM, Bob Ippolito wrote:
>>
>>                I don't think that is sufficient; you will need to stop
>>                listening as well so the next OS process can take over.
>>
>>                On Monday, May 14, 2012, Loïc Hoguin wrote:
>>
>>                    This will be possible later on by reducing the number
>>                    of acceptors to 0. This should be added sometime this
>>                    summer, after the acceptor split happens in Cowboy.
>>
>>                    On 05/14/2012 07:26 PM, Bob Ippolito wrote:
>>
>>                        I agree that graceful shutdown is very
>>                        application-specific. However, Cowboy doesn't
>>                        currently facilitate any sort of graceful shutdown
>>                        unless you read the source code and poke directly
>>                        at the appropriate supervisors like I did. The
>>                        application-specific stuff is easily done on your
>>                        own with a process registry or a timeout; we used
>>                        a combination of gproc and a timeout if things
>>                        didn't shut down in an acceptable time frame.
>>
>>                        I would suggest something like
>>                        cowboy:stop_listener/1, maybe called
>>                        cowboy:stop_listening/1 or cowboy:stop_accepting/1.
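
A sketch of the gproc-plus-timeout combination described above, assuming
each connection process registers a gproc property on start (the
{p, l, http_conn} key and the timeout handling are illustrative
assumptions, not Bob's actual code):

    %% Wait up to Timeout ms for all registered connection processes to
    %% finish; monitors avoid racing against processes that already died.
    -module(graceful).
    -export([wait_for_connections/1]).

    wait_for_connections(Timeout) ->
        Pids = gproc:lookup_pids({p, l, http_conn}),
        Refs = [erlang:monitor(process, Pid) || Pid <- Pids],
        erlang:send_after(Timeout, self(), graceful_deadline),
        wait(Refs).

    wait([]) ->
        ok;                              %% everything shut down in time
    wait(Refs) ->
        receive
            {'DOWN', Ref, process, _Pid, _Reason} ->
                wait(lists:delete(Ref, Refs));
            graceful_deadline ->
                {timeout, length(Refs)}  %% caller decides: kill or wait more
        end.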
>>
>>                        On Sun, May 13, 2012 at 11:02 PM, Loïc Hoguin
>>                        <essen@REDACTED> wrote:
>>
>>                            Right, this is pretty much what I said to Paul
>>                            in PM. It was in R14B03 that the behavior
>>                            changed. I apparently have a @todo wrong past
>>                            that release, and will take a look if I find
>>                            other things to fix:
>>
>>                            https://gist.github.com/2655724
>>
>>                                        On Thu, May 10, 2012 at 2:53 AM,
>>                                        Paweł Peregud
>>                                        <paulperegud@REDACTED> wrote:
>>
>>                                        I was having fun with supervisors
>>                                        yesterday (Cowboy seems to fail to
>>                                        fulfill the promise of not killing
>>                                        request processes after listener
>>                                        removal) and I have an example.
>>                                        I've only investigated the case
>>                                        when the supervisor is killed, so
>>                                        YMMV. Example code is attached.
>>                                        You may modify it to check the
>>                                        behavior in your case.
>>
>>                                        Start the supervisor tree with
>>                                        exp_sup_sup:start_link(). Execute
>>                                        the tests with exp_sup_sup:test()
>>                                        and exp_sup_sup:test_simple().
>>
>>                                        In the case of a dying supervisor
>>                                        the answer is "no, it does not".
>>
>>                                        When the supervisor dies, your
>>                                        process is killed via the link
>>                                        mechanism, so it may leave
>>                                        unprocessed messages in its inbox.
>>                                        To make sure that every delivered
>>                                        message is served, you need to add
>>                                        process_flag(trap_exit, true).
>>                                        Messages sent after the moment the
>>                                        supervisor dies are not processed.
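
A sketch of that advice applied to a gen_server: trap exits so the
supervisor's shutdown signal triggers terminate/2 rather than an immediate
kill, then drain the already-delivered casts there. The {'$gen_cast', ...}
pattern relies on gen_server's internal message format, and the module name
and handle_msg/2 are hypothetical placeholders:

    -module(draining_server).
    -behaviour(gen_server).
    -export([start_link/0]).
    -export([init/1, handle_call/3, handle_cast/2, handle_info/2,
             terminate/2, code_change/3]).

    start_link() ->
        gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

    init([]) ->
        %% Turn the supervisor's exit signal into an orderly terminate/2.
        process_flag(trap_exit, true),
        {ok, []}.

    handle_call(_Request, _From, State) -> {reply, ok, State}.
    handle_cast(Msg, State) -> {noreply, handle_msg(Msg, State)}.
    handle_info(_Info, State) -> {noreply, State}.

    %% Serve whatever was delivered before the shutdown signal; messages
    %% sent after this process exits are lost, as noted above.
    terminate(_Reason, State) ->
        drain(State).

    drain(State) ->
        receive
            {'$gen_cast', Msg} ->          %% gen_server internal cast format
                drain(handle_msg(Msg, State))
        after 0 ->
            ok
        end.

    handle_msg(_Msg, State) -> State.      %% application-specific work

    code_change(_OldVsn, State, _Extra) -> {ok, State}.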
>>
>>                                        Best regards,
>>
>>                                        Paul.
>>
>>
>>                                        On May 9, 2012 11:06 AM, "Andy
>>                                        Richards"
>>                                        <andy.richards.iit@REDACTED>
>>                                        wrote:
>>
>>                                        Hi,
>>
>>                                        I can't seem to see any
>>                                        confirmation in the documentation,
>>                                        so I was wondering if anyone could
>>                                        confirm whether messages are still
>>                                        sent to a supervised gen_server
>>                                        following a shutdown message?
>>
>>                                        If so, how do I cleanly shut down
>>                                        my gen_server without losing
>>                                        messages? I read in the supervisor
>>                                        child spec that shutdown can be
>>                                        set to infinity, which I hoped
>>                                        would allow me to process the
>>                                        messages in my mailbox, but if I
>>                                        do this will my module continue to
>>                                        receive messages from other
>>                                        processes? Is my approach flawed,
>>                                        and if so, what other ways are
>>                                        there to cleanly shut down my
>>                                        gen_server without losing
>>                                        messages?
>>
>>                                        Many thanks,
>>
>>                                        Andy.
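
For reference, the shutdown knob Andy mentions, in the tuple child spec
form of that era: it is how long the supervisor waits after sending the
shutdown exit signal before killing the child outright, so a
trapping-and-draining server gets that much time in terminate/2. Infinity
is intended mainly for supervisor children, so a generous finite value is
the safer assumption for a worker; draining_server is the hypothetical
module sketched above:

    %% Give terminate/2 up to 30 seconds to drain before a brutal kill.
    {my_server,
     {draining_server, start_link, []},
     permanent,
     30000,       %% shutdown time in ms, or the atom 'infinity'
     worker,
     [draining_server]}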
>>
>>                                        _______________________________
>>                                        erlang-questions mailing list
>>                                        erlang-questions@REDACTED
>>                                        http://erlang.org/mailman/listinfo/erlang-questions
>>
>
>


More information about the erlang-questions mailing list