[erlang-patches] release_handler_1 improvements

Joe Williams <>
Mon Sep 5 23:08:06 CEST 2011


 Siri, 

Please refetch and let me know if the test works for you now.

Thanks!

-Joe


-- 
Name: Joseph A. Williams
Email: 
Blog: http://www.joeandmotorboat.com/
Twitter: http://twitter.com/williamsjoe


On Monday, September 5, 2011 at 2:32 AM, Siri Hansen wrote:

> Hi Joe - had a look at your changes... and it turned out to be quite interesting...
> 
> First, I saw that you hadn't updated the test suite with the new exit reason - but when I executed the test it did not fail... :o 
> 
> It turned out that you catched the call to ?t:fail/1 - which actually only does an exit... I assume you meant "case catch" and not "catch case"... However, catching this is not at all necessary here, since it will be catched by the rpc:call anyway - it will return {badrpc, {'EXIT', {suspended_supervisor, _}}} . 
> 
> Also, two other comments on the test case: 
> 
> * when the call to get_supervised_procs return something unexpected - please print the return value in the test log (using e.g. ct:log/2) in order to make trouble shooting easier 
> * stopping the node could be moved outside of the test case, in order to avoid hanging nodes if the test fails. This you can do by adding something like 
> 
> supervisor_which_children_timeout(cleanup, Conf) ->
> stop_node(node_name(supervisor_which_children_timeout)).
> 
> 
> This will then be called during end_per_testcase. I see that there are other test cases that don't do this, and I will clean up that before the next release. 
> 
> Regards
> /siri
> 
> 
> 2011/9/1 Joe Williams < (mailto:)>
> >  Seems good to me, I went ahead and made the changes. 
> > 
> > Please refetch and let me know if there is anything else.
> > 
> > -Joe
> > 
> > 
> > -- 
> > Name: Joseph A. Williams
> > Email:  (mailto:)
> >  Blog: http://www.joeandmotorboat.com/
> > Twitter: http://twitter.com/williamsjoe
> > 
> > 
> > On Thursday, September 1, 2011 at 12:59 AM, Siri Hansen wrote:
> > 
> > > Thanks Joe! This looks very good. I have two minor comments only:
> > > 
> > > 1) Maybe you could change the first line of the commit message to say
> > > 
> > > General improvements to release_handler_1:get_supervised_procs 
> > > 
> > > or
> > > 
> > > General improvements to get_supervised_procs in release_handler_1
> > > 
> > > 2) And when I looked at the tests it occured to me that it might be a good idea to change the error reasons to something more descriptive than 'release_handler_error'. What do you think? I know that there will be error messages in the shell, but I find it better if even the returned error reason gives a hint about what failed. My suggestions for the three calls to error/1 respectively (please change if you have better ideas): 
> > > 
> > > suspended_supervisor
> > > {which_children_failed,Other}
> > > {get_modules_failed,Other}
> > > 
> > > Please let me know what you think about this.
> > > 
> > > Regards 
> > > /siri
> > > 
> > > 2011/9/1 Joe Williams < (mailto:)>
> > > >  Siri, 
> > > > 
> > > > I have added tests and squashed my commits, rewriting the commit message while I was at it. Please refetch and let me know if this is up to your standards for graduation.
> > > > 
> > > > -Joe
> > > > 
> > > > 
> > > > -- 
> > > > Name: Joseph A. Williams
> > > > Email:  (mailto:)
> > > >  Blog: http://www.joeandmotorboat.com/
> > > > Twitter: http://twitter.com/williamsjoe
> > > > 
> > > > 
> > > > On Tuesday, August 30, 2011 at 11:41 PM, Siri Hansen wrote:
> > > > 
> > > > > Hi Joe - Yes, you can put your application inside release_handler_SUITE_data/lib. And hopefully you can use the functions create_and_install_fake_first_release and create_fake_upgrade_release - you can see examples of that in release_handler_SUITE.erl. Please let me know if you need any more advice on the testing.
> > > > >  Regards
> > > > > /siri
> > > > > 
> > > > > 
> > > > > 2011/8/30 Joe Williams < (mailto:)>
> > > > > >  Siri, 
> > > > > > 
> > > > > > I could use some advice on how best to test this code. Currently I have a little dummy application with the proper supervision tree to trigger this upgrade case. Here are the steps I use to reproduce: 
> > > > > > 
> > > > > > https://gist.github.com/da109fb6939ef7aac031 
> > > > > > 
> > > > > > Note that this is using my topic branch so you can see the additional logging.
> > > > > > 
> > > > > > I noticed that there is a release_handler_SUITE_data directory in the sasl tests, should I just add another directory specifically for this and store the dummy application there for testing? 
> > > > > > 
> > > > > > Thanks.
> > > > > > 
> > > > > > -Joe
> > > > > > 
> > > > > > 
> > > > > > -- 
> > > > > > Name: Joseph A. Williams
> > > > > > Email:  (mailto:)
> > > > > > Blog: http://www.joeandmotorboat.com/
> > > > > > Twitter: http://twitter.com/williamsjoe
> > > > > > 
> > > > > > 
> > > > > > On Tuesday, August 30, 2011 at 1:44 AM, Siri Hansen wrote:
> > > > > > 
> > > > > > > Great Joe - this is much better :)
> > > > > > > Could you please add some tests for this also?
> > > > > > > Thanks
> > > > > > > /siri
> > > > > > > 
> > > > > > > 2011/8/29 Joe Williams < (mailto:)>
> > > > > > > > Siri,
> > > > > > > > 
> > > > > > > >  Please fetch this branch again. 
> > > > > > > > 
> > > > > > > > I have added errors where I had my functions returning empty lists. I believe this should bubble up to release_handler causing a restart similar to the timeout behavior we currently have. 
> > > > > > > > 
> > > > > > > > -Joe
> > > > > > > > 
> > > > > > > > 
> > > > > > > > -- 
> > > > > > > > Name: Joseph A. Williams
> > > > > > > > Email:  (mailto:)
> > > > > > > > Blog: http://www.joeandmotorboat.com/
> > > > > > > > Twitter: http://twitter.com/williamsjoe
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On Monday, August 29, 2011 at 9:35 AM, Joe Williams wrote:
> > > > > > > > 
> > > > > > > > > Siri,
> > > > > > > > > 
> > > > > > > > > In case #2 the node would be in an "unpacked" state but perhaps that isn't possible since the upgrade may be partially installed already. I'll work on implementing #1 and reply back soon. 
> > > > > > > > > 
> > > > > > > > > -Joe
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > -- 
> > > > > > > > > Name: Joseph A. Williams
> > > > > > > > > Email:  (mailto:)
> > > > > > > > >  Blog: http://www.joeandmotorboat.com/
> > > > > > > > > Twitter: http://twitter.com/williamsjoe
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On Monday, August 29, 2011 at 7:31 AM, Siri Hansen wrote:
> > > > > > > > > 
> > > > > > > > > > Hi Joe - 
> > > > > > > > > > I think I would prefer solution 1), although that's probably mostly because I don't really understand solution 2)... What do you mean by "stop the upgrade from completing"? in which state would the node be after this?
> > > > > > > > > > /siri
> > > > > > > > > > 
> > > > > > > > > > 2011/8/26 Joe Williams < (mailto:)>
> > > > > > > > > > >  Siri, 
> > > > > > > > > > > 
> > > > > > > > > > > That sounds correct, with the current patch there is that risk. In my case I would see the error message post-upgrade and restart things as needed but I certainly see your point. The VM restarting is a brutal but idiomatic way to deal with this issue, let it fail :). 
> > > > > > > > > > > 
> > > > > > > > > > > I think there are two possibilities here, 1) continue with the restart behavior but make sure we print error messages before we do or 2) print error messages but stop the upgrade from completing if we catch the bad case. Thoughts? 
> > > > > > > > > > > 
> > > > > > > > > > > -Joe
> > > > > > > > > > > 
> > > > > > > > > > > -- 
> > > > > > > > > > > Name: Joseph A. Williams
> > > > > > > > > > > Email:  (mailto:)
> > > > > > > > > > > Blog: http://www.joeandmotorboat.com/
> > > > > > > > > > > Twitter: http://twitter.com/williamsjoe
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > On Friday, August 26, 2011 at 1:08 AM, Siri Hansen wrote:
> > > > > > > > > > > 
> > > > > > > > > > > > Hi again, Joe!
> > > > > > > > > > > > 
> > > > > > > > > > > > Sorry for being so slow - but I still don't really understand :(
> > > > > > > > > > > > My concerns are really about whether or not we should allow the upgrade to be performed in this case. For sure I think we should 
> > > > > > > > > > > > 
> > > > > > > > > > > > 1) avoid the timeout, and
> > > > > > > > > > > > 2) let the user know what the problem is
> > > > > > > > > > > > 
> > > > > > > > > > > > but is it correct to let the upgrade pass after this? Is it not an error situation?
> > > > > > > > > > > > 
> > > > > > > > > > > > It seems to me that we risk getting into a situation where we believe that the system is upgraded, but in fact there could be branches of the supervisor tree where process have not had the chance to run their code_change functions. I mean - even if we print the error report, there is no guarantee that it is really detected unless the operation actually fails. 
> > > > > > > > > > > > 
> > > > > > > > > > > > Please correct me if I completely misunderstood the situation.
> > > > > > > > > > > > 
> > > > > > > > > > > > Regards
> > > > > > > > > > > > /siri
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 2011/8/25 Joe Williams < (mailto:)>
> > > > > > > > > > > > >  Siri, 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I ran into two issues that this patch addresses. Check out the commit message at https://github.com/joewilliams/otp/commit/9c3a53789326cdd929f1c3b4525716b1c0abfe87 for the details. In both cases I found that in production an error in the logs was preferable to the restart of the VM since both are easily fixable with a small application change or in the case of the suspended supervisor using a different app up. Also see this comment in release_handler_1 regarding the supervisor, https://github.com/erlang/otp/blob/dev/lib/sasl/src/release_handler_1.erl#L454 which suggests this corner case is known by at least a few people. Currently there is no way to know *why* your VM just restarted after the upgrade in either case. 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Let me know if you have any other questions.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > -Joe
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > -- 
> > > > > > > > > > > > > Name: Joseph A. Williams
> > > > > > > > > > > > >  Email:  (mailto:)
> > > > > > > > > > > > > Blog: http://www.joeandmotorboat.com/
> > > > > > > > > > > > >  Twitter: http://twitter.com/williamsjoe
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Thursday, August 25, 2011 at 6:35 AM, Siri Hansen wrote:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > Hi again, Joe!
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Could you please explain a bit about the situation where you discovered this problem? I agree that the timeout and VM restart is not very good, and it makes sense to check if the supervisor is suspended. But I'm not really sure if it is correct to allow the upgrade to continue when this error occurs. Even if an error message is printed, I guess it could be quite easy to miss this fact... and the question is if that would be a problem or not? Why is the supervisor suspended in the first place? 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Regards
> > > > > > > > > > > > > > /siri
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 2011/8/25 Siri Hansen < (mailto:)>
> > > > > > > > > > > > > > >  Hi Joe - I've just started looking at this. Do you think it would be possible to add a test case for it?
> > > > > > > > > > > > > > > Regards
> > > > > > > > > > > > > > > /siri
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 2011/8/24 Joe Williams < (mailto:)>
> > > > > > > > > > > > > > > >  Anything I can do regarding this patch? I have happily been running it in production since I submitted it to the list in June. 
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > -Joe
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > -- 
> > > > > > > > > > > > > > > > Name: Joseph A. Williams
> > > > > > > > > > > > > > > > Email:  (mailto:)
> > > > > > > > > > > > > > > >  Blog: http://www.joeandmotorboat.com/
> > > > > > > > > > > > > > > > Twitter: http://twitter.com/williamsjoe
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > On Wednesday, July 6, 2011 at 3:43 PM, Joe Williams wrote:
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Anything I can do to help this patch graduate?
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Thanks!
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > -Joe
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > -- 
> > > > > > > > > > > > > > > > > Name: Joseph A. Williams
> > > > > > > > > > > > > > > > > Email:  (mailto:)
> > > > > > > > > > > > > > > > >  Blog: http://www.joeandmotorboat.com/
> > > > > > > > > > > > > > > > > Twitter: http://twitter.com/williamsjoe
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > On Tuesday, June 14, 2011 at 12:26 PM, Joe Williams wrote:
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > Updated this branch, please refetch.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > git fetch git://github.com/joewilliams/otp.git (http://github.com/joewilliams/otp.git) release_handler_1 
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > -- 
> > > > > > > > > > > > > > > > > > Name: Joseph A. Williams
> > > > > > > > > > > > > > > > > > Email:  (mailto:)
> > > > > > > > > > > > > > > > > >  Blog: http://www.joeandmotorboat.com/
> > > > > > > > > > > > > > > > > > Twitter: http://twitter.com/williamsjoe
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > On Friday, June 10, 2011 at 8:52 AM, Joe Williams wrote:
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > Great, thanks!
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > -- 
> > > > > > > > > > > > > > > > > > > Name: Joseph A. Williams
> > > > > > > > > > > > > > > > > > > Email:  (mailto:)
> > > > > > > > > > > > > > > > > > >  Blog: http://www.joeandmotorboat.com/
> > > > > > > > > > > > > > > > > > > Twitter: http://twitter.com/williamsjoe
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > On Friday, June 10, 2011 at 8:51 AM, Raimo Niskanen wrote:
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > On Thu, Jun 09, 2011 at 08:20:51AM -0700, Joe Williams wrote:
> > > > > > > > > > > > > > > > > > > > > Please fetch:
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > git fetch git://github.com/joewilliams/otp.git (http://github.com/joewilliams/otp.git) release_handler_1
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > This is a different branch with a better commit message and no white space changes.
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > Excellent. I will include your patch in 'pu' after rewording the
> > > > > > > > > > > > > > > > > > > > summary line to imperative form.
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > -- 
> > > > > > > > > > > > > > > > > > > > > Name: Joseph A. Williams
> > > > > > > > > > > > > > > > > > > > > Email:  (mailto:)
> > > > > > > > > > > > > > > > > > > > > Blog: http://www.joeandmotorboat.com/
> > > > > > > > > > > > > > > > > > > > >  Twitter: http://twitter.com/williamsjoe
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > On Thursday, June 9, 2011 at 7:44 AM, Joe Williams wrote:
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > Nothing specific, just wondered if anyone had any thoughts on how I dealt with a couple of corner cases in installing releases.
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > I'll fix things up and get back shortly.
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > -- 
> > > > > > > > > > > > > > > > > > > > > > Name: Joseph A. Williams
> > > > > > > > > > > > > > > > > > > > > > Email:  (mailto:) (mailto:)
> > > > > > > > > > > > > > > > > > > > > >  Blog: http://www.joeandmotorboat.com/
> > > > > > > > > > > > > > > > > > > > > > Twitter: http://twitter.com/williamsjoe
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > On Thursday, June 9, 2011 at 12:11 AM, Raimo Niskanen wrote:
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > On Wed, Jun 08, 2011 at 03:41:37PM -0700, Joe Williams wrote:
> > > > > > > > > > > > > > > > > > > > > > > > Any thoughts/feedback on this patch? I realize it doesn't follow the guidelines (https://github.com/erlang/otp/wiki/Submitting-patches) exactly and will clean it up soon.
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > Anything in particular? I just got caught up in tideous merge work
> > > > > > > > > > > > > > > > > > > > > > > yesterday and missed to include your patch in 'pu', I was about
> > > > > > > > > > > > > > > > > > > > > > > to take it now.
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > But if you have a cleanup I can wait for it...
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > -- 
> > > > > > > > > > > > > > > > > > > > > > > > Name: Joseph A. Williams
> > > > > > > > > > > > > > > > > > > > > > > > Email:  (mailto:) (mailto:)
> > > > > > > > > > > > > > > > > > > > > > > >  Blog: http://www.joeandmotorboat.com/
> > > > > > > > > > > > > > > > > > > > > > > > Twitter: http://twitter.com/williamsjoe
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > On Tuesday, June 7, 2011 at 2:33 PM, Joe Williams wrote:
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > >  git fetch git://github.com/joewilliams/otp.git (http://github.com/joewilliams/otp.git) (http://github.com/joewilliams/otp.git) (http://github.com/joewilliams/otp.git) release_handler
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________
> > > > > > > > > > > > > > > > > > > > > > > > erlang-patches mailing list
> > > > > > > > > > > > > > > > > > > > > > > >  (mailto:) (mailto:)
> > > > > > > > > > > > > > > > > > > > > > > > http://erlang.org/mailman/listinfo/erlang-patches
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > -- 
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > _______________________________________________
> > > > > > > > > > > > > > > > > > > > > > erlang-patches mailing list
> > > > > > > > > > > > > > > > > > > > > >  (mailto:) (mailto:)
> > > > > > > > > > > > > > > > > > > > > > http://erlang.org/mailman/listinfo/erlang-patches
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > -- 
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > _______________________________________________
> > > > > > > > > > > > > > > >  erlang-patches mailing list
> > > > > > > > > > > > > > > >  (mailto:)
> > > > > > > > > > > > > > > > http://erlang.org/mailman/listinfo/erlang-patches
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > _______________________________________________
> > > > > > > > > erlang-patches mailing list
> > > > > > > > >  (mailto:)
> > > > > > > > > http://erlang.org/mailman/listinfo/erlang-patches
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20110905/a5a45a26/attachment-0001.html>


More information about the erlang-patches mailing list