[erlang-patches] release_handler_1 improvements

Siri Hansen erlangsiri@REDACTED
Wed Aug 31 08:41:06 CEST 2011


Hi Joe - Yes, you can put your application inside
release_handler_SUITE_data/lib. And hopefully you can use the
functions create_and_install_fake_first_release
and create_fake_upgrade_release - you can see examples of that in
release_handler_SUITE.erl. Please let me know if you need any more advice on
the testing.
Regards
/siri


2011/8/30 Joe Williams <joe@REDACTED>

>  Siri,
>
> I could use some advice on how best to test this code. Currently I have a
> little dummy application with the proper supervision tree to trigger this
> upgrade case. Here are the steps I use to reproduce:
>
> https://gist.github.com/da109fb6939ef7aac031
>
> Note that this is using my topic branch so you can see the additional
> logging.
>
> I noticed that there is a release_handler_SUITE_data directory in the sasl
> tests, should I just add another directory specifically for this and store
> the dummy application there for testing?
>
> Thanks.
>
> -Joe
>
>
> --
> Name: Joseph A. Williams
> Email: joe@REDACTED
> Blog: http://www.joeandmotorboat.com/
> Twitter: http://twitter.com/williamsjoe
>
> On Tuesday, August 30, 2011 at 1:44 AM, Siri Hansen wrote:
>
> Great Joe - this is much better :)
> Could you please add some tests for this also?
> Thanks
> /siri
>
> 2011/8/29 Joe Williams <joe@REDACTED>
>
> Siri,
>
> Please fetch this branch again.
>
> I have added errors where I had my functions returning empty lists. I
> believe this should bubble up to release_handler causing a restart similar
> to the timeout behavior we currently have.
>
> -Joe
>
>
> --
> Name: Joseph A. Williams
> Email: joe@REDACTED
> Blog: http://www.joeandmotorboat.com/
> Twitter: http://twitter.com/williamsjoe
>
> On Monday, August 29, 2011 at 9:35 AM, Joe Williams wrote:
>
>  Siri,
>
> In case #2 the node would be in an "unpacked" state but perhaps that isn't
> possible since the upgrade may be partially installed already. I'll work on
> implementing #1 and reply back soon.
>
> -Joe
>
>
> --
> Name: Joseph A. Williams
> Email: joe@REDACTED
> Blog: http://www.joeandmotorboat.com/
> Twitter: http://twitter.com/williamsjoe
>
> On Monday, August 29, 2011 at 7:31 AM, Siri Hansen wrote:
>
> Hi Joe -
> I think I would prefer solution 1), although that's probably mostly because
> I don't really understand solution 2)... What do you mean by "stop the
> upgrade from completing"? in which state would the node be after this?
> /siri
>
> 2011/8/26 Joe Williams <joe@REDACTED>
>
>  Siri,
>
> That sounds correct, with the current patch there is that risk. In my case
> I would see the error message post-upgrade and restart things as needed but
> I certainly see your point. The VM restarting is a brutal but idiomatic way
> to deal with this issue, let it fail :).
>
> I think there are two possibilities here, 1) continue with the restart
> behavior but make sure we print error messages before we do or 2) print
> error messages but stop the upgrade from completing if we catch the bad
> case. Thoughts?
>
> -Joe
>
> --
> Name: Joseph A. Williams
> Email: joe@REDACTED
> Blog: http://www.joeandmotorboat.com/
> Twitter: http://twitter.com/williamsjoe
>
> On Friday, August 26, 2011 at 1:08 AM, Siri Hansen wrote:
>
> Hi again, Joe!
>
> Sorry for being so slow - but I still don't really understand :(
> My concerns are really about whether or not we should allow the upgrade to
> be performed in this case. For sure I think we should
>
> 1) avoid the timeout, and
> 2) let the user know what the problem is
>
> but is it correct to let the upgrade pass after this? Is it not an error
> situation?
>
>  It seems to me that we risk getting into a situation where we believe
> that the system is upgraded, but in fact there could be branches of the
> supervisor tree where process have not had the chance to run their
> code_change functions. I mean - even if we print the error report, there is
> no guarantee that it is really detected unless the operation actually fails.
>
> Please correct me if I completely misunderstood the situation.
>
> Regards
> /siri
>
>
> 2011/8/25 Joe Williams <joe@REDACTED>
>
>  Siri,
>
> I ran into two issues that this patch addresses. Check out the commit
> message at
> https://github.com/joewilliams/otp/commit/9c3a53789326cdd929f1c3b4525716b1c0abfe87 for
> the details. In both cases I found that in production an error in the logs
> was preferable to the restart of the VM since both are easily fixable with a
> small application change or in the case of the suspended supervisor using a
> different app up. Also see this comment in release_handler_1 regarding the
> supervisor,
> https://github.com/erlang/otp/blob/dev/lib/sasl/src/release_handler_1.erl#L454 which
> suggests this corner case is known by at least a few people. Currently there
> is no way to know *why* your VM just restarted after the upgrade in either
> case.
>
> Let me know if you have any other questions.
>
> -Joe
>
>
> --
> Name: Joseph A. Williams
> Email: joe@REDACTED
> Blog: http://www.joeandmotorboat.com/
> Twitter: http://twitter.com/williamsjoe
>
> On Thursday, August 25, 2011 at 6:35 AM, Siri Hansen wrote:
>
> Hi again, Joe!
>
> Could you please explain a bit about the situation where you discovered
> this problem? I agree that the timeout and VM restart is not very good, and
> it makes sense to check if the supervisor is suspended. But I'm not really
> sure if it is correct to allow the upgrade to continue when this error
> occurs. Even if an error message is printed, I guess it could be quite easy
> to miss this fact... and the question is if that would be a problem or not?
> Why is the supervisor suspended in the first place?
>
> Regards
> /siri
>
>
> 2011/8/25 Siri Hansen <erlangsiri@REDACTED>
>
>   Hi Joe - I've just started looking at this. Do you think it would be
> possible to add a test case for it?
> Regards
> /siri
>
>
> 2011/8/24 Joe Williams <joe@REDACTED>
>
>  Anything I can do regarding this patch? I have happily been running it in
> production since I submitted it to the list in June.
>
> -Joe
>
>
> --
> Name: Joseph A. Williams
> Email: joe@REDACTED
> Blog: http://www.joeandmotorboat.com/
> Twitter: http://twitter.com/williamsjoe
>
> On Wednesday, July 6, 2011 at 3:43 PM, Joe Williams wrote:
>
>  Anything I can do to help this patch graduate?
>
> Thanks!
>
> -Joe
>
>
> --
> Name: Joseph A. Williams
> Email: joe@REDACTED
> Blog: http://www.joeandmotorboat.com/
> Twitter: http://twitter.com/williamsjoe
>
> On Tuesday, June 14, 2011 at 12:26 PM, Joe Williams wrote:
>
>  Updated this branch, please refetch.
>
> git fetch git://github.com/joewilliams/otp.git release_handler_1
>
>
> --
> Name: Joseph A. Williams
> Email: joe@REDACTED
> Blog: http://www.joeandmotorboat.com/
> Twitter: http://twitter.com/williamsjoe
>
> On Friday, June 10, 2011 at 8:52 AM, Joe Williams wrote:
>
>  Great, thanks!
>
>
> --
> Name: Joseph A. Williams
> Email: joe@REDACTED
> Blog: http://www.joeandmotorboat.com/
> Twitter: http://twitter.com/williamsjoe
>
> On Friday, June 10, 2011 at 8:51 AM, Raimo Niskanen wrote:
>
> On Thu, Jun 09, 2011 at 08:20:51AM -0700, Joe Williams wrote:
>
> Please fetch:
>
> git fetch git://github.com/joewilliams/otp.git release_handler_1
>
> This is a different branch with a better commit message and no white space
> changes.
>
>
> Excellent. I will include your patch in 'pu' after rewording the
> summary line to imperative form.
>
>
>
>
> --
> Name: Joseph A. Williams
> Email: joe@REDACTED
> Blog: http://www.joeandmotorboat.com/
> Twitter: http://twitter.com/williamsjoe
>
>
> On Thursday, June 9, 2011 at 7:44 AM, Joe Williams wrote:
>
> Nothing specific, just wondered if anyone had any thoughts on how I dealt
> with a couple of corner cases in installing releases.
>
> I'll fix things up and get back shortly.
>
> --
> Name: Joseph A. Williams
> Email: joe@REDACTED (mailto:joe@REDACTED <joe@REDACTED>)
> Blog: http://www.joeandmotorboat.com/
> Twitter: http://twitter.com/williamsjoe
>
>
> On Thursday, June 9, 2011 at 12:11 AM, Raimo Niskanen wrote:
>
> On Wed, Jun 08, 2011 at 03:41:37PM -0700, Joe Williams wrote:
>
> Any thoughts/feedback on this patch? I realize it doesn't follow the
> guidelines (https://github.com/erlang/otp/wiki/Submitting-patches) exactly
> and will clean it up soon.
>
>
> Anything in particular? I just got caught up in tideous merge work
> yesterday and missed to include your patch in 'pu', I was about
> to take it now.
>
> But if you have a cleanup I can wait for it...
>
>
>
> --
> Name: Joseph A. Williams
> Email: joe@REDACTED (mailto:joe@REDACTED <joe@REDACTED>)
> Blog: http://www.joeandmotorboat.com/
> Twitter: http://twitter.com/williamsjoe
>
>
> On Tuesday, June 7, 2011 at 2:33 PM, Joe Williams wrote:
>
> git fetch git://github.com/joewilliams/otp.git (
> http://github.com/joewilliams/otp.git) (
> http://github.com/joewilliams/otp.git) release_handler
>
>
> _______________________________________________
> erlang-patches mailing list
> erlang-patches@REDACTED (mailto:erlang-patches@REDACTED<erlang-patches@REDACTED>
> )
> http://erlang.org/mailman/listinfo/erlang-patches
>
>
>
> --
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB
>
>
> _______________________________________________
> erlang-patches mailing list
> erlang-patches@REDACTED (mailto:erlang-patches@REDACTED<erlang-patches@REDACTED>
> )
> http://erlang.org/mailman/listinfo/erlang-patches
>
>
> --
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB
>
>
>
>
>
>
> _______________________________________________
> erlang-patches mailing list
> erlang-patches@REDACTED
> http://erlang.org/mailman/listinfo/erlang-patches
>
>
>
>
>
>
>
>
>  _______________________________________________
> erlang-patches mailing list
> erlang-patches@REDACTED
> http://erlang.org/mailman/listinfo/erlang-patches
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20110831/f9f9fa4c/attachment.htm>


More information about the erlang-patches mailing list