[PATCH] fixes to release_handler with sync'd relup-ing on inet boot loaded slaves

Steven Gravell steve@REDACTED
Mon Sep 20 12:05:22 CEST 2010


On 20 September 2010 02:44, Steven Gravell <steve@REDACTED> wrote:

> There are a number of places in release_handler.erl and
> release_handler_1.erl that assume the system was booted locally (the
> default -loader efile). I've fixed those in the changeset linked below,
> which you can fetch from the branch listed underneath it.
>
> I have outlined the problem/fix in more detail at the bottom of this
> message along with some (hopefully) sensible steps to reproduce.
>
>
> http://github.com/mokele/otp/commit/2aed0fde939c828dd253f1d8582f98a124b38237
>
> git fetch git://github.com/mokele/otp.git diskless_booted_relup_fix
>
>
> ======= FULL COMMIT MESSAGE =========
>
> There were a couple of places in release_handler and release_handler_1 that
> assumed we had a disk to read from, which is not necessarily true when the
> erl_prim_loader Loader is something other than efile.
>
>  This commit makes three changes: it adds check_paths/2, it changes how
> get_vsn/1 is used, and it adds get_current_vsn/1 to replace calls to
> beam_lib:version(code:which(Mod)).
>
>  * check_paths/2 was added to do the equivalent of check_path/1 except for
> when we have some Masters and need to run it on one of them instead of the
> current node.
>
>  * get_vsn is no longer sent a file path but instead is sent the Binary
> since beam_lib:version being given a string will check the local file
> system, which we can't do.
>
>  * No longer accessing the local filesystem led to adding the loadedvsns
> field to eval_state, which keeps track of the version that is currently
> loaded. This is in contrast to vsns and bins, which may contain a different
> version from the one code:which(Mod) refers to. get_current_vsn(Mod) was
> introduced as the equivalent of beam_lib:version(code:which(Mod)): it
> checks loadedvsns and, if the module is not found there, goes through the
> potentially arduous process of calling erl_prim_loader:get_file again to
> read the version (which is what load_vsn(Mod) does). I'm not entirely sure
> this fallback would ever be hit, but I added it for completeness.
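
For illustration, the version lookup described in the last two bullets might
be sketched as below. This is a minimal sketch, not the code from the commit:
the module name vsn_sketch is hypothetical, loadedvsns is modelled as a plain
list of {Mod, Vsn} tuples passed in as an argument (hence arity 2 rather than
get_current_vsn/1), and the fallback assumes code:which/1 returns a path that
the active loader can serve.

```erlang
-module(vsn_sketch).
-export([get_vsn/1, load_vsn/1, get_current_vsn/2]).

%% Read the version from a beam *binary*. Passing a binary keeps beam_lib
%% away from the local file system, which a diskless node may not have.
get_vsn(Bin) when is_binary(Bin) ->
    {ok, {_Mod, Vsn}} = beam_lib:version(Bin),
    Vsn.

%% Fetch the beam through erl_prim_loader, so the request goes to the boot
%% server when the node was started with -loader inet.
load_vsn(Mod) ->
    File = code:which(Mod),
    true = is_list(File),  % preloaded / non_existing are not handled here
    {ok, Bin, _FullName} = erl_prim_loader:get_file(File),
    get_vsn(Bin).

%% LoadedVsns stands in for the loadedvsns field: [{Mod, Vsn}] (assumption).
%% Use the cached version if present, otherwise fall back to load_vsn/1.
get_current_vsn(Mod, LoadedVsns) ->
    case lists:keyfind(Mod, 1, LoadedVsns) of
        {Mod, Vsn} -> Vsn;
        false -> load_vsn(Mod)
    end.
```

On a locally booted node erl_prim_loader:get_file/1 simply reads the file, so
the same code path works for both efile and inet loaders.
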
>
>
> ========= DESCRIPTION TO REPRODUCE =========
>
> ** Master System (master@REDACTED) **
>  bin/
>  clients/slave1@REDACTED/bin/
>  clients/slave1@REDACTED/releases/1/  <- current permanent release
>  clients/slave1@REDACTED/releases/2/  <- unpacked new release
>  erts-5.7.5/
>  lib/myapp-1/
>  lib/myapp-2/
>  log/
>  releases/1/  <- current permanent release
>  releases/2/  <- unpacked new release
>
> start with -name master@REDACTED -id master@REDACTED
> 1> erl_boot_server:start([{6,6,6,6}]). % with slave's ip here
>
> ** RELUP **
> This line should appear in the relup file for release version "2" that is
> unpacked in both releases/2/ paths above. With the attached patch, slave1
> will block when it reaches this instruction; you can then set_unpacked (or
> just unpack, if it isn't already unpacked) on the master, and slave1 will
> finish the sync once the master reaches this point too.
> {sync_nodes,boot_server,[master@REDACTED,slave1@REDACTED]}
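
To show where that instruction would sit, here is a hedged sketch of a
low-level relup for version "2" upgrading from "1". Everything except the
sync_nodes line is an assumption: the module name myapp and the surrounding
load_object_code / point_of_no_return / load instructions are placeholders
for whatever systools generates for the real release, and the position of
sync_nodes among them is arbitrary here.

```erlang
%% releases/2/relup (illustrative only; real files are generated by systools)
{"2",
 %% Upgrade from "1" to "2"
 [{"1", [],
   [{load_object_code, {myapp, "2", [myapp]}},
    point_of_no_return,
    %% Wait here until every listed node has reached the same instruction.
    {sync_nodes, boot_server, [master@REDACTED, slave1@REDACTED]},
    {load, {myapp, brutal_purge, brutal_purge}}]}],
 %% Downgrade from "2" back to "1"
 [{"1", [],
   [{load_object_code, {myapp, "1", [myapp]}},
    point_of_no_return,
    {load, {myapp, brutal_purge, brutal_purge}}]}]}.
```
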
>
> ** Slave System (slave1@REDACTED) **
> Boot from the master by IP address with the -hosts flag and the following
> config:
> [{sasl, [
>     {masters, [master@REDACTED]},
>     {client_directory, "/path/to/your/target_system/clients/slave1@REDACTED"},
>     {releases_dir, "/path/to/your/target_system/clients/slave1@REDACTED/releases/"}
> ]}].
> The important thing to realise here is that the slave *has* to be on a
> different machine *without* the directory structure listed above, since
> those paths refer to the boot system and *not* the local file system
> slave1 is on; it might not even have one. So the sys.config above is
> located on master at
> /path/to/your/target_system/clients/slave1@REDACTED/releases/1/sys.config
>
> Then start the slave. Note the different directory path to bin/erl: again
> we are making sure we're on a different machine that definitely doesn't
> have the code for the new release on it, or else things will just go
> smoothly since it'll read the local files, which is not what we want.
> (9.9.9.9 is the IP of the master.)
> $ /path/to/slave1/target_system/bin/erl \
>     -name slave1@REDACTED -id slave1@REDACTED \
>     -loader inet \
>     -hosts 9.9.9.9 \
>     -boot /path/to/your/target_system/clients/slave1@REDACTED/releases/1/ \
>     -config /path/to/your/target_system/clients/slave1@REDACTED/releases/1/
>
> 1> RelFile = "/path/to/your/target_system/clients/slave1@REDACTED/releases/2/myrelease.rel".
> 2> AppDirs = [{myapp,"2", "/path/to/your/target_system/lib/"}].
> 3> release_handler:set_unpacked(RelFile, AppDirs).
> *{error,{no_such_directory,"/path/to/your/target_system/lib/myapp-2"}}*
>
> This occurs because release_handler.erl has no version of check_path/1 that
> checks the path on one of the Masters instead of on the local node.
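
A rough sketch of what a master-aware directory check could look like follows.
It is not the check_paths/2 from the commit: the module name, the function
check_path/2, and the error terms not_a_directory and no_master_reachable are
hypothetical, and it assumes the path can be checked with
file:read_file_info/1 run on a master via rpc:call/4. Only no_such_directory
is taken from the error shown above.

```erlang
-module(check_path_sketch).
-export([check_path/2]).
-include_lib("kernel/include/file.hrl").

%% With no masters the node has its own disk, so check locally.
check_path(Path, []) ->
    classify(file:read_file_info(Path), Path);
%% Otherwise the directory lives on a master; ask the first one that answers.
check_path(Path, Masters) ->
    check_on_masters(Path, Masters).

check_on_masters(Path, []) ->
    {error, {no_master_reachable, Path}};   % hypothetical error term
check_on_masters(Path, [Master | Rest]) ->
    case rpc:call(Master, file, read_file_info, [Path]) of
        {badrpc, _Reason} -> check_on_masters(Path, Rest);
        Result -> classify(Result, Path)
    end.

%% Map the read_file_info result onto ok / error.
classify({ok, #file_info{type = directory}}, _Path) -> ok;
classify({ok, _Other}, Path) -> {error, {not_a_directory, Path}};
classify({error, _Reason}, Path) -> {error, {no_such_directory, Path}}.
```

On slave1, calling it with the masters list from the sasl config would send
the check to master@REDACTED instead of looking at the (possibly nonexistent)
local disk.
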
>
> Secondly after fixing that we reach a different problem:
> 3> release_handler:set_unpacked(RelFile, AppDirs).
> {ok, "2"}
> 4> release_handler:install_release("2").
> *{error,{'EXIT',{{badmatch,{error,beam_lib,{file_error,"/path/to/your/target_system/lib/myapp-2/ebin/myapp.beam",
> enoent}}}, ...*
>
> This occurs because get_vsn/1 in release_handler_1.erl calls
> beam_lib:version(File) with a file path that does not exist locally on
> slave1.
>
>
> Well that's it.  After all this writing and debugging I hope I'm not simply
> being naive and haven't misunderstood something along the way O_O
>
>
> /Steven Gravell
> http://mokele.co.uk/
>

plus... after sleeping on it I realised I'd missed off the setcookie in the
example above... but I guess that should have gone without saying


/steve
http://mokele.co.uk/


More information about the erlang-patches mailing list