[PATCH] fixes to release_handler with sync'd relup-ing on inet boot loaded slaves

Steven Gravell steve@REDACTED
Mon Sep 20 03:44:09 CEST 2010


A number of places in release_handler.erl and release_handler_1.erl assume
that the node is running on a locally booted system (the default
-loader efile). I've fixed those in the changeset linked below, which you
can fetch from the branch listed underneath it.

I have outlined the problem/fix in more detail at the bottom of this message
along with some (hopefully) sensible steps to reproduce.

http://github.com/mokele/otp/commit/2aed0fde939c828dd253f1d8582f98a124b38237

git fetch git://github.com/mokele/otp.git diskless_booted_relup_fix


======= FULL COMMIT MESSAGE =========

There were a couple of places in release_handler and release_handler_1 that
assumed we had a disk to read from, which in the case of an erl_prim_loader
Loader other than efile is not necessarily true.

 This commit makes three changes: it adds check_paths/2, it changes what
get_vsn/1 is given, and it adds get_current_vsn/1 to replace calls to
beam_lib:version(code:which(Mod)).

 * check_paths/2 was added to do the equivalent of check_path/1 except for
when we have some Masters and need to run it on one of them instead of the
current node.

 * get_vsn is no longer sent a file path but instead is sent the Binary
since beam_lib:version being given a string will check the local file
system, which we can't do.

 * No longer accessing the local filesystem led to adding the loadedvsns
field to eval_state. It keeps track of the version that is currently
loaded, in contrast to vsns and bins, which may contain a different version
from the one code:which(Mod) refers to. get_current_vsn(Mod) was introduced
as the equivalent of beam_lib:version(code:which(Mod)): it checks loadedvsns
and, if the module is not found there, falls back to the potentially
expensive erl_prim_loader:get_file call to read the version again (which is
what load_vsn(Mod) does). I'm not entirely sure that fallback would ever be
needed, but I added it for completeness.
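
As a rough illustration of the second and third points (not the code from
the commit), the sketch below reads a module's version through
erl_prim_loader and hands the resulting binary to beam_lib:version/1, so
nothing is read from the local disk by file name. The module name vsn_sketch
and the function vsn_from_loader/1 are made up for this example.

```erlang
-module(vsn_sketch).
-export([vsn_from_loader/1]).

%% Hypothetical helper, not part of release_handler_1.
%% Fetches the object code via erl_prim_loader (which goes to the boot
%% server when started with -loader inet) and reads the version from the
%% binary instead of from a local file path.
vsn_from_loader(Mod) ->
    case code:which(Mod) of
        File when is_list(File) ->
            case erl_prim_loader:get_file(File) of
                {ok, Bin, _FullName} ->
                    %% beam_lib:version/1 accepts a binary as well as a
                    %% file name; Vsn is a list of version terms.
                    {ok, {Mod, Vsn}} = beam_lib:version(Bin),
                    {ok, Vsn};
                error ->
                    {error, {no_such_file, File}}
            end;
        Other ->
            %% non_existing, preloaded or cover_compiled
            {error, {not_loaded_from_file, Other}}
    end.
```

Calling vsn_sketch:vsn_from_loader(myapp) on the slave would then go through
whichever loader the node was booted with.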


========= DESCRIPTION TO REPRODUCE =========

** Master System (master@REDACTED) **
 bin/
 clients/slave1@REDACTED/bin/
 clients/slave1@REDACTED/releases/1/  <- current permanent release
 clients/slave1@REDACTED/releases/2/  <- unpacked new release
 erts-5.7.5/
 lib/myapp-1/
 lib/myapp-2/
 log/
 releases/1/  <- current permanent release
 releases/2/  <- unpacked new release

start with -name master@REDACTED -id master@REDACTED
1> erl_boot_server:start([{6,6,6,6}]). % with slave's ip here

** RELUP **
The line below should appear in the relup file for release version "2" that
is unpacked in both releases/2/ paths above. With the attached patch this
will cause slave1 to wait at the sync point; you can then set_unpacked (or
just unpack, if it isn't already) on the master, and slave1 will finish the
sync successfully once the master reaches this point too.
{sync_nodes,boot_server,[master@REDACTED,slave1@REDACTED]}
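
For orientation, a relup entry containing that instruction might look
roughly like the fragment below. Only the sync_nodes line comes from this
message; the surrounding instructions and the module name myapp are
assumptions for illustration, following the documented
{Vsn, UpFrom, DownTo} layout of relup files.

```erlang
%% Illustrative relup fragment; everything except the sync_nodes
%% instruction is assumed for the sake of the example.
{"2",
 [{"1", [],
   [{load_object_code, {myapp, "2", [myapp]}},
    point_of_no_return,
    {sync_nodes, boot_server, [master@REDACTED, slave1@REDACTED]},
    {load, {myapp, brutal_purge, brutal_purge}}]}],
 [{"1", [],
   [{load_object_code, {myapp, "1", [myapp]}},
    point_of_no_return,
    {load, {myapp, brutal_purge, brutal_purge}}]}]}.
```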

** Slave System (slave1@REDACTED) **
Boot from the master by IP address with the -hosts flag and the following config
[{sasl, [
    {masters, [master@REDACTED]},
    {client_directory, "/path/to/your/target_system/clients/slave1@REDACTED"},
    {releases_dir, "/path/to/your/target_system/clients/slave1@REDACTED/releases/"}
]}].
The important thing to realise here is that slave *has* to be on a different
machine *without* the directory structure listed above since these paths
above refer to the boot system and *not* the local file system slave1 is on;
it might not even have one. So the sys.config above is located on master at
/path/to/your/target_system/clients/slave1@REDACTED/releases/1/sys.config

and start the slave... Note the varying directory structure to bin/erl since
we're again making sure we're on a different machine that definitely doesn't
have the code for your new release on it, or else things will just go
smoothly since it'll read the local files, which is not what we want. (where
9.9.9.9 is the ip of the master)
$ /path/to/slave1/target_system/bin/erl \
    -name slave1@REDACTED -id slave1@REDACTED \
    -loader inet \
    -hosts 9.9.9.9 \
    -boot /path/to/your/target_system/clients/slave1@REDACTED/releases/1/ \
    -config /path/to/your/target_system/clients/slave1@REDACTED/releases/1/

1> RelFile = "/path/to/your/target_system/clients/slave1@REDACTED/releases/2/myrelease.rel".
2> AppDirs = [{myapp,"2", "/path/to/your/target_system/lib/"}].
3> release_handler:set_unpacked(RelFile, AppDirs).
*{error,{no_such_directory,"/path/to/your/target_system/lib/myapp-2"}}*

This occurs because check_path/1 in release_handler.erl has no variant that
checks the path on one of the nodes in the Masters list.
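
To make the idea concrete, here is a minimal sketch of a directory check
that runs on a master when one is configured. It is not the check_paths/2
from the commit; the module name check_path_sketch, the use of rpc:call with
filelib:is_dir, and the no_master error term are assumptions for this
example.

```erlang
-module(check_path_sketch).
-export([check_path/2]).

%% Hypothetical helper: with no masters, check the local file system;
%% otherwise ask the first reachable master whether the directory exists.
check_path(Path, []) ->
    case filelib:is_dir(Path) of
        true  -> ok;
        false -> {error, {no_such_directory, Path}}
    end;
check_path(Path, [Master | Rest]) ->
    case rpc:call(Master, filelib, is_dir, [Path]) of
        true  -> ok;
        false -> {error, {no_such_directory, Path}};
        %% Master unreachable: try the next one, if there is one.
        {badrpc, _} when Rest =/= [] -> check_path(Path, Rest);
        {badrpc, Reason} -> {error, {no_master, Reason}}
    end.
```

On slave1, check_path_sketch:check_path(Dir, [master@REDACTED]) would then
look at the master's disk rather than the slave's.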

Secondly, after fixing that, we reach a different problem:
3> release_handler:set_unpacked(RelFile, AppDirs).
{ok, "2"}
4> release_handler:install_release("2").
*{error,{'EXIT',{{badmatch,{error,beam_lib,{file_error,"/path/to/your/target_system/lib/myapp-2/ebin/myapp.beam",
enoent}}}, ...*

This occurs because the beam_lib:version(File) call in get_vsn/1 in
release_handler_1.erl is given File, a file path that does not exist
locally on slave1.


Well, that's it. After all this writing and debugging I hope I'm not simply
being naive and haven't misunderstood something along the way O_O


/Steven Gravell
http://mokele.co.uk/

