(noob-help) Supervision strategies to automatically restart dynamically added children

Wed Mar 2 19:02:06 CET 2011

While supervisors are meant to automatically restart failed processes,
there is one scenario for which I have not yet been able to figure out
the idiomatic way to implement crash recovery with the default OTP
facilities. I have considered a solution, but being a relative newbie,
I am not sure whether it is idiomatic Erlang or whether there are
better solutions.

Question in short : If I have a supervisor which has a number of
dynamic children, how do I set up a mechanism whereby, after a
complete system crash, all the dynamic children restart at the point
they had reached when the system (including the supervisor) crashed?

Question in long :
=============

Sample Context : A bowling game
-------------------------------------------------

Let's say I am writing the software needed to track the various games
at a bowling alley. I've set up the following processes :

a. Lanes : If there are 10 lanes, there are 10 processes, one for each
lane. These stay fixed for the entire duration of the program.
b. Games : A group of players might get together to start a game on a
free lane. A new game process is created to track the game through to
its completion. When the game is over, this process terminates.
c. Players : Each game has a number of players. One process
"player_game" is started per player. Sample state of a player_game
would include the current score for the player and whether the last
two rolls were a strike or a spare. For brevity, the remainder of this
mail refers only to this process and ignores the others.
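
To make the player_game state concrete, here is a minimal sketch of
such a process as a gen_server. The function names (start_link/1,
roll/2, score/1) and the record layout are assumptions made for
illustration, not taken from the repository. The score is just a
running sum of pins; real strike / spare bonus scoring is left out.

```erlang
-module(player_game).
-behaviour(gen_server).

%% Hypothetical API, for illustration only.
-export([start_link/1, roll/2, score/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Per-player state: a running pin total and the last two rolls
%% (most recent first), from which a strike or spare could be derived.
-record(state, {player, score = 0, last_two = []}).

start_link(Player) ->
    gen_server:start_link(?MODULE, Player, []).

%% Record a roll of Pins (0..10) for this player.
roll(Pid, Pins) when is_integer(Pins), Pins >= 0, Pins =< 10 ->
    gen_server:call(Pid, {roll, Pins}).

%% Return the running pin total.
score(Pid) ->
    gen_server:call(Pid, score).

init(Player) ->
    {ok, #state{player = Player}}.

handle_call({roll, Pins}, _From,
            S = #state{score = Score, last_two = Last}) ->
    %% Simplified: add the pins and keep only the two latest rolls.
    NewS = S#state{score = Score + Pins,
                   last_two = lists:sublist([Pins | Last], 2)},
    {reply, ok, NewS};
handle_call(score, _From, S) ->
    {reply, S#state.score, S}.

handle_cast(_Msg, S) ->
    {noreply, S}.
```

Nothing here is persisted yet; the state lives only in the process and
is lost when the process or the node goes down.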

Objective :
---------------

Assuming this is a single node implementation, if the machine were to
crash, then upon machine / node restart all the player_games should be
restarted in the state they were in when the machine crashed.

Possible supervision strategy :
--------------------------------------

1. Create a simple_one_for_one supervisor player_game_sup which, upon
starting up for the first time, has no children associated with it.
Use supervisor:start_child to start each process.
2. The supervisor creates an entry in a database (say mnesia) every
time it launches a new process.
3. Each player_game updates its entry every time the score gets
modified. Upon termination that entry gets deleted.
4. Post crash, the supervisor is started again (say after an
application restart or via another supervisor).
5. (Here's the difference). By default the supervisor will not restart
the dynamically added children (all the player_games). However, we
modify the init code to inspect the database and launch a player_game
for each record it finds. Each player_game initialises itself to the
state stored in the database, and the game(s) can continue where
it/they left off.
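
The steps above can be sketched roughly as follows. This is only an
outline under several assumptions: a mnesia table named player_game
(keyed on the player, created elsewhere with disc_copies so that it
survives a crash) already exists, a player_game module exports
start_link/1 and loads its own state from that table, and the names
start_player_game/1 and recover/0 are made up for illustration. One
deviation from step 5: as far as I can tell, calling
supervisor:start_child on the supervisor from inside its own init/1
would block, since the supervisor cannot answer the call until init/1
has returned. The sketch therefore does the recovery in a separate
function, to be called once start_link/0 has returned.

```erlang
-module(player_game_sup).
-behaviour(supervisor).

%% Hypothetical API, for illustration only.
-export([start_link/0, start_player_game/1, recover/0]).
-export([init/1]).

%% Assumed layout of the mnesia table (table name = record name).
-record(player_game, {player, score = 0}).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Step 1: simple_one_for_one, no children at startup.
%% 'transient' means a player_game that exits normally at the end of
%% a game is not restarted; the 5 restarts / 10 s limit is arbitrary.
init([]) ->
    SupFlags = {simple_one_for_one, 5, 10},
    Child = {player_game, {player_game, start_link, []},
             transient, 5000, worker, [player_game]},
    {ok, {SupFlags, [Child]}}.

%% Step 2: record the new player_game, then start it. Note that this
%% runs in the caller's process, not inside the supervisor itself.
start_player_game(Player) ->
    ok = mnesia:dirty_write(#player_game{player = Player}),
    supervisor:start_child(?MODULE, [Player]).

%% Step 5: after a restart, launch one player_game per stored record.
%% Returns the list of supervisor:start_child/2 results.
recover() ->
    [supervisor:start_child(?MODULE, [Player])
     || Player <- mnesia:dirty_all_keys(player_game)].
```

recover/0 could be called, for example, from the application's start
callback right after player_game_sup:start_link() succeeds. Steps 3
(updating and deleting the entry) would live in player_game itself and
are not shown.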

My questions :
--------------------
a. Does it make sense to move the responsibility to the supervisor to
update the database each time a new player game is started or
completed ?
b. Is it an idiomatic way to implement crash recovery?
c. Are there any other, perhaps superior, ways of implementing this?

FWIW : the code I am using to learn Erlang is at
https://github.com/dnene/bowling . It's not particularly interesting at
this stage since it is still under development.

Thanks
Dhananjay

PS: Apologies for posting this to erlang-questions after earlier
posting it to the erlang programming Google group. Those monitoring
the latter will receive this question twice.