Fault-tolerance and distributed system

Scott Lystig Fritchie <>
Sat Nov 30 19:57:23 CET 2002

>>>>> "hs" == Hal Snyder <> writes:

hs> Scott's example shows application but not supervision.

Wow, I forgot that I wrote that.  Nowadays, I simply use old .rel and
.app files from earlier projects as templates for new projects.
Mailing list archives are a wonderful thing.

I've put a dumb, brute-force, but functional supervisor example at
http://www.snookles.com/erlang/misc/supervisor_example.tar.gz.  After
extracting the source:

1. Change directory to the "src" subdirectory.

2. Run GNU make (or compatible): "make" or "gmake" or however it's
installed on your system.

3. Run "erl -pz ../ebin -boot foo" to start the "foo" application.

4. At the Erlang prompt, execute "appmon:start()."

   The "foo" application has a single supervisor that monitors two
   worker processes, both of which are generic servers that implement
   simple integer counters.  Each worker also spawn_links three
   processes in order to make the appmon process tree look more

   Note that there are a bunch of io:format() debugging messages that
   (hopefully) demonstrate how arguments are passed from start
   functions to init functions.

5. Use commands like "increment:get1(counter1)." and
   "increment:get_many(counter2, 100000)." to communicate with the two
   counter servers.

6. Run "appmon:start()." to start the application monitor
   application.  A GUI window will pop up.

   Click on the "foo" button.  To demonstrate that the supervisor
   is working correctly, click the "Kill" button and then the
   "counter1" or "counter2" box.  If either dies, the supervisor
   will kill all children and then restart them.  Note that the PIDs
   of the counters' "children" change and that the SASL application
   will spit out some messages to the console.

   Killing one of the unnamed "child" processes will kill the "parent"
   counter process because the counter process is not trapping process
   exits ... which will then cause the foo_example_sup supervisor to

   One of the three "child" processes will exit after 15 seconds, but
   because it exits with a 'normal' status, its parent is not killed.

   Killing foo_example_sup will result in shutting down the
   application because there is nobody to restart it.
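
For readers without the tarball, here is a rough sketch of what one of
the counter workers in step 4 and step 5 might look like.  It is not
the code from supervisor_example.tar.gz: the module name
counter_sketch, the start_link/2 arguments, and the meaning I give to
get1 and get_many (hand out the next integer, or reserve a range of N
integers) are all assumptions made for illustration.  The io:format()
call in init/1 shows how the arguments given to the start function
arrive in the init function.

```erlang
-module(counter_sketch).
-behaviour(gen_server).

%% Hypothetical client API; names and semantics are assumptions.
-export([start_link/2, get1/1, get_many/2]).
%% gen_server callbacks.
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Start a counter registered locally as Name, beginning at Start.
start_link(Name, Start) ->
    gen_server:start_link({local, Name}, ?MODULE, [Name, Start], []).

%% Return the next integer and advance the counter by one.
get1(Name) ->
    gen_server:call(Name, get1).

%% Reserve N consecutive integers; return {First, Last} of the range.
get_many(Name, N) when is_integer(N), N > 0 ->
    gen_server:call(Name, {get_many, N}).

%% The list passed to gen_server:start_link/4 arrives here unchanged.
init([Name, Start]) ->
    io:format("~p: init/1 called with start value ~p~n", [Name, Start]),
    {ok, Start}.

handle_call(get1, _From, Count) ->
    {reply, Count, Count + 1};
handle_call({get_many, N}, _From, Count) ->
    {reply, {Count, Count + N - 1}, Count + N}.

handle_cast(_Msg, Count) ->
    {noreply, Count}.

handle_info(_Info, Count) ->
    {noreply, Count}.

terminate(_Reason, _Count) ->
    ok.

code_change(_OldVsn, Count, _Extra) ->
    {ok, Count}.
```

A worker like this keeps its count only in the server state, so a
restart by the supervisor begins again from the start value.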

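The "kill all children and then restart them" behaviour described in
step 6 corresponds to the one_for_all restart strategy.  The sketch
below shows that strategy in isolation.  Again, this is not the code
from the tarball: the module name one_for_all_sketch, the restart
limits (5 restarts in 10 seconds), and the 2000 ms shutdown timeout
are assumptions, and to keep it to one file the workers are plain
spawn_link'ed processes rather than the generic servers used in the
real example.

```erlang
-module(one_for_all_sketch).
-behaviour(supervisor).

-export([start_link/0, init/1]).
%% Exported so the supervisor and spawn_link/3 can call them.
-export([start_worker/1, worker_loop/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_all: when one child dies, the supervisor terminates the
    %% remaining children and then restarts all of them.
    %% The limits (5 restarts within 10 seconds) are arbitrary.
    SupFlags = {one_for_all, 5, 10},
    Children = [child(counter1), child(counter2)],
    {ok, {SupFlags, Children}}.

%% Old-style tuple child specification:
%% {Id, StartMFA, Restart, Shutdown, Type, Modules}
child(Name) ->
    {Name, {?MODULE, start_worker, [Name]},
     permanent, 2000, worker, [?MODULE]}.

%% Stand-in worker: a plain linked process registered under Name.
%% The real example uses gen_server workers instead.
start_worker(Name) ->
    Pid = spawn_link(?MODULE, worker_loop, []),
    true = register(Name, Pid),
    {ok, Pid}.

%% Ignore every message and keep waiting.
worker_loop() ->
    receive
        _Any -> worker_loop()
    end.
```

To try it, call one_for_all_sketch:start_link(), note the PIDs
returned by whereis(counter1) and whereis(counter2), run
exit(whereis(counter1), kill), and look the names up again: both
should now refer to new processes.
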
Now, I can wait for Lennart or another OTP guru to criticise my example!

