OTP Release Handling Tutorial

This tutorial attempts to show by example how to build a proper OTP-based system.

A quick way to get started might be to copy this example system, and modify the configuration files. I will try to explain the purpose of each file and suggest how they may be modified.

The Example System

The example system (called "example") runs on two processors, which can just as well be two Erlang nodes on the same physical machine. It contains two applications, base and dist.

Basic File Structure

Note: The code is heavily commented. It may be easier to read in an editor with syntax highlighting, e.g. Emacs with fontification.

With this file structure in place, you are ready to support in-service upgrade, but that will be described in a future tutorial.

Compiling the code

...Nothing to it, really. I didn't bother with make scripts and stuff like that. Go into each src/ directory and type:

erlc -W -o ../ebin *.erl

Building a boot script

The easiest way to build the boot script is to change to the $DIR/releases/1.0 directory, start an Erlang shell, and type the following:

Eshell V5.2  (abort with ^G)
1>
<... output snipped>
=PROGRESS REPORT==== 4-Dec-2002::16:55:34 ===
         application: sasl
          started_at: nonode@nohost

1> Dir = "/home/etxuwig/work/erlang/release_tutorial".
"/home/etxuwig/work/erlang/release_tutorial"
2> Path = [Dir ++ "/lib/*/ebin"].
["/home/etxuwig/work/erlang/release_tutorial/lib/*/ebin"]
3> Var = {"MYAPPS", Dir}.
{"MYAPPS","/home/etxuwig/work/erlang/release_tutorial"}
4> systools:make_script("example",[{path,Path},{variables,[Var]}]).
ok
    

Now, you should be able to see an example.script file in releases/1.0/. It contains instructions for the Erlang/OTP boot loader. The .script file is converted into an Erlang binary which is stored in example.boot in the same directory.
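systools:make_script("example", ...) reads a release resource file named example.rel from the current directory. The following is only a sketch of what such a file could look like. The release name and version are taken from the directory layout, the erts version from the emulator banner in the transcripts, and the application names from the progress reports; every UPPERCASE version string is a placeholder to be replaced with the versions in your own installation, and "1.0" for base and dist is an assumption.

```erlang
%% example.rel (sketch; the version strings are placeholders)
{release,
 {"example", "1.0"},            % release name and version
 {erts, "5.2"},                 % emulator version, as in the banner
 [{kernel, "KERNEL_VSN"},       % placeholder: use your installed version
  {stdlib, "STDLIB_VSN"},       % placeholder
  {sasl,   "SASL_VSN"},         % placeholder
  {base,   "1.0"},              % assumed version of the base application
  {dist,   "1.0"}]}.            % assumed version of the dist application
```

The versions listed here must match the vsn entries in the corresponding .app files, otherwise make_script reports an error.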

Making a tar file

Using systools:make_tar("example", Options), where Options is the same list of options as for make_script/2, you can pack your release into a tar file and unpack it on a target system. The -boot_var option to erl, together with the variables option given when the script was built, makes the code relocatable. See erl -man systools for more detailed instructions.
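Since make_script/2 and make_tar/2 should receive the same options, one way to keep them in step is to build the list in one place. The module below is a minimal sketch under that assumption; the module name release_sketch and its functions are made up for illustration and are not part of the example system.

```erlang
-module(release_sketch).
-export([options/1, build/1]).

%% Options shared by systools:make_script/2 and systools:make_tar/2.
%% Dir is the root of the release tree, i.e. the directory that
%% contains lib/ and releases/.
options(Dir) ->
    [{path, [filename:join([Dir, "lib", "*", "ebin"])]},
     {variables, [{"MYAPPS", Dir}]}].

%% Build the boot script and then the tar file. This must be called
%% with releases/1.0 as the current directory, so that "example"
%% resolves to example.rel.
build(Dir) ->
    Opts = options(Dir),
    ok = systools:make_script("example", Opts),
    ok = systools:make_tar("example", Opts).
```

From an Erlang shell started in releases/1.0, release_sketch:build("/home/etxuwig/work/erlang/release_tutorial") would then produce example.script, example.boot and example.tar.gz in that directory.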

Running the example

There are tricks for starting an embedded system and being able to attach a shell to a node, but that's another tutorial.

I will show how one could easily get something up and running on a Unix workstation. Windows users will have to translate.

It doesn't really matter if you start both nodes at once, or one at a time. In the sys.config file, a node synchronization timeout of 10 seconds was specified. After that, the first node will continue alone if the other node has not yet appeared.
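A sketch of what such a sys.config could look like, assuming the node names used in the transcripts in this tutorial. distributed, sync_nodes_optional and sync_nodes_timeout are standard kernel configuration parameters, but the actual file in the example system may be laid out differently.

```erlang
%% sys.config (sketch; node names assumed from the transcripts)
[{kernel,
  [%% dist runs on the first node in the list that is up,
   %% so n2 is preferred and n1 is the fallback.
   {distributed, [{dist, [n2@cbe1066, n1@cbe1066]}]},
   %% Nodes to wait for at startup; they are optional, so a node
   %% carries on alone once the timeout expires.
   {sync_nodes_optional, [n1@cbe1066, n2@cbe1066]},
   %% Synchronization timeout in milliseconds (10 seconds).
   {sync_nodes_timeout, 10000}]}].
```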

This is of course an interesting thing to try. If you start n1 first, you may see the following output:

[etxuwig@cbe1066]: erl -boot ./example -config ./sys -boot_var MYAPPS $DIR -sname n1
Erlang (BEAM) emulator version 5.2 [hipe] [threads:0]

Eshell V5.2  (abort with ^G)
(n1@cbe1066)1> 
=PROGRESS REPORT==== 5-Dec-2002::16:40:27 ===
          supervisor: {local,sasl_safe_sup}
             started: [{pid,<0.45.0>},
                       {name,alarm_handler},
                       {mfa,{alarm_handler,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 5-Dec-2002::16:40:27 ===
          supervisor: {local,sasl_safe_sup}
             started: [{pid,<0.46.0>},
                       {name,overload},
                       {mfa,{overload,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 5-Dec-2002::16:40:27 ===
          supervisor: {local,sasl_sup}
             started: [{pid,<0.44.0>},
                       {name,sasl_safe_sup},
                       {mfa,{supervisor,
                                start_link,
                                [{local,sasl_safe_sup},sasl,safe]}},
                       {restart_type,permanent},
                       {shutdown,infinity},
                       {child_type,supervisor}]

=PROGRESS REPORT==== 5-Dec-2002::16:40:27 ===
          supervisor: {local,sasl_sup}
             started: [{pid,<0.47.0>},
                       {name,release_handler},
                       {mfa,{release_handler,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 5-Dec-2002::16:40:27 ===
         application: sasl
          started_at: n1@cbe1066
base_server starting.

=PROGRESS REPORT==== 5-Dec-2002::16:40:27 ===
          supervisor: {local,base_super}
             started: [{pid,<0.53.0>},
                       {name,server},
                       {mfa,{base_server,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,10000},
                       {child_type,worker}]

=PROGRESS REPORT==== 5-Dec-2002::16:40:27 ===
         application: base
          started_at: n1@cbe1066
dist_app:start(normal, _)
dist_server starting.

=PROGRESS REPORT==== 5-Dec-2002::16:40:27 ===
          supervisor: {local,dist_super}
             started: [{pid,<0.58.0>},
                       {name,server},
                       {mfa,{dist_server,
                                start_link,
                                [#Fun,
                                 #Fun]}},
                       {restart_type,permanent},
                       {shutdown,10000},
                       {child_type,worker}]
dist_app:start_phase(takeover, _)
dist_app:start_phase(go, _)
handle_call({go, normal},...)

=PROGRESS REPORT==== 5-Dec-2002::16:40:27 ===
         application: dist
          started_at: n1@cbe1066

(n1@cbe1066)1> 
(n1@cbe1066)1> global:whereis_name(dist_server).
<0.58.0>
(n1@cbe1066)2> dist_server:get_value().
undefined
(n1@cbe1066)3> dist_server:set_value(17).
{ok,undefined}

    

We can see that the globally registered dist_server is running locally, and we can call the API functions get_value/0 and set_value/1.
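To illustrate how an API like this can sit on top of a globally registered process, here is a minimal gen_server sketch. It is not the dist_server from the example system (which also takes two funs as start arguments and handles the go call seen in the transcript); the module name value_server_sketch is made up, and only the return values of get_value/0 and set_value/1 are modelled on the transcript.

```erlang
-module(value_server_sketch).
-behaviour(gen_server).

-export([start_link/0, get_value/0, set_value/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Registering with {global, Name} makes the process reachable
%% from every connected node under the same name.
-define(NAME, {global, value_server_sketch}).

start_link() ->
    gen_server:start_link(?NAME, ?MODULE, undefined, []).

%% Return the current value.
get_value() ->
    gen_server:call(?NAME, get_value).

%% Store a new value and return {ok, OldValue}.
set_value(New) ->
    gen_server:call(?NAME, {set_value, New}).

%% The server state is simply the stored value.
init(Initial) ->
    {ok, Initial}.

handle_call(get_value, _From, Value) ->
    {reply, Value, Value};
handle_call({set_value, New}, _From, Old) ->
    {reply, {ok, Old}, New}.

handle_cast(_Msg, Value) ->
    {noreply, Value}.
```

Callers never need to know on which node the process runs; gen_server:call/2 resolves the global name on each call.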

If we now start n2, dist_server should migrate over to that node, since the sys.config file specifies this.

[etxuwig@cbe1066]: erl -boot ./example -config ./sys -boot_var MYAPPS $DIR -sname n2
Erlang (BEAM) emulator version 5.2 [hipe] [threads:0]

Eshell V5.2  (abort with ^G)
(n2@cbe1066)1> 
=PROGRESS REPORT==== 5-Dec-2002::17:27:15 ===
          supervisor: {local,sasl_safe_sup}
             started: [{pid,<0.46.0>},
                       {name,alarm_handler},
                       {mfa,{alarm_handler,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 5-Dec-2002::17:27:15 ===
          supervisor: {local,sasl_safe_sup}
             started: [{pid,<0.47.0>},
                       {name,overload},
                       {mfa,{overload,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 5-Dec-2002::17:27:15 ===
          supervisor: {local,sasl_sup}
             started: [{pid,<0.45.0>},
                       {name,sasl_safe_sup},
                       {mfa,{supervisor,
                                start_link,
                                [{local,sasl_safe_sup},sasl,safe]}},
                       {restart_type,permanent},
                       {shutdown,infinity},
                       {child_type,supervisor}]

=PROGRESS REPORT==== 5-Dec-2002::17:27:15 ===
          supervisor: {local,sasl_sup}
             started: [{pid,<0.48.0>},
                       {name,release_handler},
                       {mfa,{release_handler,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 5-Dec-2002::17:27:15 ===
         application: sasl
          started_at: n2@cbe1066
base_server starting.

=PROGRESS REPORT==== 5-Dec-2002::17:27:15 ===
          supervisor: {local,base_super}
             started: [{pid,<0.54.0>},
                       {name,server},
                       {mfa,{base_server,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,10000},
                       {child_type,worker}]

=PROGRESS REPORT==== 5-Dec-2002::17:27:15 ===
         application: base
          started_at: n2@cbe1066
dist_app:start({takeover,n1@cbe1066}, _)
dist_server starting.

=PROGRESS REPORT==== 5-Dec-2002::17:27:15 ===
          supervisor: {local,dist_super}
             started: [{pid,<0.59.0>},
                       {name,server},
                       {mfa,{dist_server,
                                start_link,
                                [#Fun,
                                 #Fun]}},
                       {restart_type,permanent},
                       {shutdown,10000},
                       {child_type,worker}]
dist_app:start_phase(takeover, {takeover,n1@cbe1066}, _)
dist_app:start_phase(go, _)

=PROGRESS REPORT==== 5-Dec-2002::17:27:15 ===
         application: dist
          started_at: n2@cbe1066

(n2@cbe1066)1> 
(n2@cbe1066)1> global:whereis_name(dist_server).
<0.59.0>
(n2@cbe1066)2> dist_server:get_value().
17
    

In the first node, n1, we can see the following output:

=INFO REPORT==== 5-Dec-2002::17:27:15 ===
    application: dist
    exited: stopped
    type: permanent
    

We can see that dist_server brought the state variable along when migrating to the other node. It did not bring the special function objects along, in order to avoid nasty surprises.
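The start_phase calls in the transcripts (takeover, then go) are driven by a start_phases entry in the application resource file. The following is a sketch of how dist.app might declare them. The application, module and phase names are taken from the transcripts; the description, version, dependency list and phase arguments are assumptions, and the modules list is probably incomplete.

```erlang
%% dist.app (sketch; vsn, description, dependencies and phase
%% arguments are placeholders)
{application, dist,
 [{description, "Distributed example application"},
  {vsn, "1.0"},
  {modules, [dist_app, dist_server]},      % likely incomplete
  {registered, [dist_super]},
  {applications, [kernel, stdlib, sasl, base]},
  %% dist_app:start/2 receives normal or {takeover, Node}.
  {mod, {dist_app, []}},
  %% After start/2, dist_app:start_phase/3 is called once
  %% per phase, in this order.
  {start_phases, [{takeover, []}, {go, []}]}]}.
```

During a takeover the old instance keeps running until the new one has been started, which gives the takeover phase a window in which to fetch state from the old server before the application is stopped on the original node.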

We can now try different combinations of starting and killing the two nodes.


Ulf Wiger
Last modified: Thu Dec 5 18:08:45 MET 2002