[erlang-questions] (noob-help) Supervision strategies to automatically restart dynamically added children

Tue Mar 8 07:52:29 CET 2011

On Mon, Mar 7, 2011 at 5:08 AM, Edmond Begumisa
<ebegumisa@REDACTED> wrote:
> Hi Dhananjay,
>
> I too struggled with this exact question for quite some time so I'll chime
> in here on the two techniques I used to solve it...
> On Thu, 03 Mar 2011 05:02:06 +1100, Dhananjay Nene
> <dhananjay.nene@REDACTED> wrote:
>
>>
>> Question in short : If I have a supervisor which has a number of
>> dynamic children, how do I set up a mechanism whereby, in case of a
>> complete system crash, all the dynamic children restart at the point
>> they were at when the system (including the supervisor) crashed?
>>
>> Question in long :
>> =============
>>
>> Sample Context : A bowling game
>> -------------------------------------------------
>>
>> Let's say I am writing the software necessary to track various games
>> at a bowling alley. I've set up the following processes :
>>
>>
>> a. Lanes : If there are 10 lanes, there are 10 processes, one for each
>> lane. These stay fixed for the entire duration of the program
>> b. Games : A group of players might get together to start a game on a
>> free lane. A new game will get created to track the game through its
>> completion. When the game is over, this process shall terminate
>> c. Players : Each game has a number of players. One process
>> "player_game" is started per player. Sample state of a player game
>> would include current score for the player and if the last two rolls
>> were strike or a spare. For the purpose of brevity, the remainder of
>> this mail only refers to this process and ignores the others
>>
>
> You could reduce complexity by having each lane process maintain its
> current game (players and scores) as part of its state. The game and
> player_game processes appear unnecessarily confusing to me.
>

Interesting point. The lanes are the only static aspects of the game.
I tried to consider whether it would make any difference from a client
API perspective, but I imagine for a client, there is no particular
reason to believe a lane is a better or worse abstraction than a game
(or a player_game).
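
A minimal sketch of Edmond's suggestion, with the lane owning the current game as plain data. It assumes the game is kept as a list of {Player, Rolls} pairs; the module lane_state and its functions are hypothetical names used only for illustration:

```erlang
%% Hypothetical module: the lane's state carries the current game,
%% so no separate game or player_game processes are needed.
-module(lane_state).
-export([new/1, start_game/2, add_roll/3, rolls/2]).

-record(lane, {number, game = none}).

%% A lane with no game in progress.
new(LaneNo) ->
    #lane{number = LaneNo}.

%% Start a game: every player begins with an empty list of rolls.
start_game(Players, Lane = #lane{}) ->
    Lane#lane{game = [{P, []} || P <- Players]}.

%% Record a roll (pins knocked down) for one player.
add_roll(Player, Pins, Lane = #lane{game = Game}) when is_list(Game) ->
    {Player, Rolls} = lists:keyfind(Player, 1, Game),
    NewGame = lists:keystore(Player, 1, Game, {Player, Rolls ++ [Pins]}),
    Lane#lane{game = NewGame}.

%% All rolls so far for one player, in order.
rolls(Player, #lane{game = Game}) ->
    {Player, Rolls} = lists:keyfind(Player, 1, Game),
    Rolls.
```

Storing the raw rolls instead of a running total is one possible design choice: strikes and spares can then be recomputed from the rolls after a restart, rather than tracked as separate flags.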

>> Objective :
>> ---------------
>>
>> Assuming this is a single node implementation, if the machine were to
>> crash, upon machine / node restart, all the player_games should be
>> restarted and should be at the point where the player_games were when
>> the machine crashed.
>>
>> Possible supervision strategy :
>> --------------------------------------
>>
>> 1. Create a simple_one_for_one supervisor player_game_sup which, upon
>> starting up for the first time, would have no children associated with
>> it. Use supervisor:start_child to start each process
>> 2. The supervisor creates an entry in a database (say mnesia) every
>> time it launches a new process
>> 3. Each player_game updates the entry every time the score gets
>> modified. Upon termination that entry gets deleted
>> 4. Post crash, the supervisor is started again (say after an
>> application restart or via another supervisor)
>> 5. (Here's the difference). By default the supervisor will not restart
>> the dynamically added children (all the player_games). However we
>> modify the init code to inspect the database and launch a player_game
>> for each record it finds.
>
> How? I don't think you can instruct a simple_one_for_one supervisor to
> create children from its init/1 callback. From the documentation...
>
> http://www.erlang.org/doc/man/supervisor.html#Module:init-1
>
> "...No child process is then started during the initialization phase, but
> all children are assumed to be started dynamically using
> supervisor:start_child/2..."

Fair point. It wasn't something that struck me as an issue then, but yes,
a supervisor starting dynamic children inside its own init doesn't quite rock.
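
A sketch of such a simple_one_for_one supervisor (lanes_sup from strategy 2a further down), assuming a hypothetical lane module that exports start_link/1. Its init/1 only returns the child template; children are added later via supervisor:start_child/2, whose extra argument list is appended to the template's arguments:

```erlang
%% lanes_sup: a simple_one_for_one supervisor. init/1 only describes
%% the child template; it cannot start any children itself.
-module(lanes_sup).
-behaviour(supervisor).
-export([start_link/0, start_lane/1]).
-export([init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Must be called by some *other* process once the supervisor is up.
%% [LaneNo] is appended to the template's [] args, so this ends up
%% calling lane:start_link(LaneNo) (lane is a hypothetical module).
start_lane(LaneNo) ->
    supervisor:start_child(?MODULE, [LaneNo]).

init([]) ->
    Lane = {lane, {lane, start_link, []},
            permanent, 5000, worker, [lane]},
    {ok, {{simple_one_for_one, 10, 60}, [Lane]}}.
```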

> AFAIK, creating dynamic children (calling supervisor:start_child/2) has to
> be done by a process other than the supervisor process, after the
> supervisor has initialised.

Certainly. And your separate modeling of a lane_ldr (further down in this
mail) helps with that.

> This is normally not a problem if you are calling start_child/2 during the
> "normal" operation of the application because the supervisor in question is
> likely to already be up. But here, you want to call start_child/2 at
> *startup*. From my experience with this precise matter, this requires some
> process coordination.
>
>> The player_game initialises itself to the
>> current state as in the database and the game(s) can continue where
>> it/they left off.
>>
>> My questions :
>> --------------------
>> a. Does it make sense to move the responsibility to the supervisor to
>> update the database each time a new player game is started or
>> completed ?
>
> I personally don't see the advantage of doing this. Besides (as per my
> understanding of OTP design principles), a supervisor's job should be just
> that -- supervising workers and not doing work itself.
>
> Doing this from your worker gen_servers makes more sense to me and seems
> more natural, i.e. reading the scores from the DB during player_game:init
> and writing them every time a score gets bumped, or something similar.
>

I agree
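
A sketch of that split, where the worker rather than the supervisor persists its state. It uses dets as the "DB" purely as an assumption (mnesia would work along the same lines); the table name, file name and finish/1 function are hypothetical, and gen_server:stop/1 needs a recent OTP release:

```erlang
%% Hypothetical player_game worker: it persists its own score in a dets
%% table, so the supervisor only supervises. Table/file names are made up.
-module(player_game).
-behaviour(gen_server).
-export([start_link/1, add_score/2, score/1, finish/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

-define(TABLE, player_games).

start_link(Id) -> gen_server:start_link(?MODULE, Id, []).

add_score(Pid, Points) -> gen_server:call(Pid, {add_score, Points}).
score(Pid) -> gen_server:call(Pid, score).
%% Called when the game is over; stops the process with reason normal.
finish(Pid) -> gen_server:stop(Pid).

init(Id) ->
    %% Trap exits so terminate/2 also runs on an orderly supervisor shutdown.
    process_flag(trap_exit, true),
    {ok, ?TABLE} = dets:open_file(?TABLE, [{file, "player_games.dets"}]),
    %% Resume from the last stored score, or start from zero.
    Score = case dets:lookup(?TABLE, Id) of
                [{Id, S}] -> S;
                []        -> 0
            end,
    {ok, {Id, Score}}.

handle_call({add_score, Points}, _From, {Id, Score}) ->
    NewScore = Score + Points,
    ok = dets:insert(?TABLE, {Id, NewScore}),  % write on every bump
    {reply, NewScore, {Id, NewScore}};
handle_call(score, _From, State = {_Id, Score}) ->
    {reply, Score, State}.

handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Info, State) -> {noreply, State}.

%% A finished game deletes its record. After a crash or a shutdown the
%% record stays, and init/1 picks it up when the child is started again.
terminate(normal, {Id, _Score}) -> dets:delete(?TABLE, Id);
terminate(_Reason, _State) -> ok.
```

dets may need to repair its file after an unclean shutdown, and dets:sync/1 is available if you want to force updates to disk after each write, at some cost in speed.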

> Possible supervision strategy 2a: (Loader version)
> --------------------------------------------------
>
> Rather than separate dynamic children for players and games as in Strategy
> 1, each lane instead stores, as part of its state, info on the current
> game (the players playing on the lane and their state/scores). The
> supervision tree might look like this...
>
>           alley_sup
>          /         \
>  lane_ldr  ___lanes_sup_____
>           /       |     :   \
>        lane(1)  lane(2) .. lane(N)
>
> * The application has a startup configuration parameter no_of_lanes which
> comes from a conf file or the .app file and is loaded by alley_sup...
>

This is a suggestion that's really had me thinking. I suspect there's a
bit of the traditional OO modeling experience which is grumbling about
not being able to model a game or a player game. I guess that's a
matter of learning / unlearning / getting used to.
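
A sketch of alley_sup for this tree. The application name bowling and the default of 10 lanes are assumptions, and lanes_sup and lane_ldr are the hypothetical modules from the tree above. Although lane_ldr is drawn on the left, it has to come after lanes_sup in the child list, because children are started in list order and lane_ldr needs lanes_sup to be running:

```erlang
%% Hypothetical alley_sup for strategy 2a. Children start in list
%% order, so lanes_sup must come *before* lane_ldr.
-module(alley_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% no_of_lanes would come from the .app file or a config file;
    %% fall back to 10 if it is not set (assumed default).
    NoOfLanes = case application:get_env(bowling, no_of_lanes) of
                    {ok, N}   -> N;
                    undefined -> 10
                end,
    LanesSup = {lanes_sup, {lanes_sup, start_link, []},
                permanent, infinity, supervisor, [lanes_sup]},
    LaneLdr  = {lane_ldr, {lane_ldr, start_link, [NoOfLanes]},
                transient, 5000, worker, [lane_ldr]},
    %% rest_for_one: if lanes_sup is restarted, lane_ldr (after it in
    %% the list) is restarted too and can repopulate the lanes.
    {ok, {{rest_for_one, 5, 60}, [LanesSup, LaneLdr]}}.
```

rest_for_one is one reasonable choice here rather than the only one; that lane_ldr gets re-run after a lanes_sup restart is my reading of the supervisor docs, so it is worth verifying.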

> * lanes_sup is a simple_one_for_one supervisor of any number of lanes but
> initially has none.
> * Now here is the trick: lane_ldr is a gen_server that is initialised with
> No_Of_Lanes. Its job is to call supervisor:start_child No_Of_Lanes times at
> startup and then vanish...

Cool.
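
A sketch of lane_ldr under the same assumptions (lanes_sup registered locally under that name, each lane started with its lane number). One way to "vanish" is to return ignore from init/1: the parent supervisor then records the child as started but keeps no process for it:

```erlang
%% Hypothetical lane_ldr: starts NoOfLanes lanes under lanes_sup and
%% then "vanishes" by returning ignore from init/1.
-module(lane_ldr).
-behaviour(gen_server).
-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link(NoOfLanes) ->
    gen_server:start_link(?MODULE, NoOfLanes, []).

init(NoOfLanes) ->
    %% lanes_sup is already running (it was started before us) and we
    %% are a different process, so calling start_child/2 here is safe.
    lists:foreach(
      fun(N) -> {ok, _} = supervisor:start_child(lanes_sup, [N]) end,
      lists:seq(1, NoOfLanes)),
    %% ignore: start_link/1 returns ignore and no process is kept.
    ignore.

%% Never reached, since init/1 returns ignore; present for the behaviour.
handle_call(_Req, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.
```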

> * Whenever a lane is started by the sup, it loads the most recent game from
> the DB, or just a simple text file (lane_1.game_state, lane_2.game_state,
> etc -- not a big deal if a text file gets corrupted and a game is lost so a
> DB might be overkill).
> * Now whenever the score gets bumped, a new game starts, or a game is
> concluded, the lane process writes the game state to your DB or text file.
> For the simple text file, you could just keep calling...
>
> write_game_state(Path, Game_State) ->
>    ok = file:write_file(Path, io_lib:format("~p.", [Game_State])).

Yes, that was one of the options I had in mind.
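
A sketch of the lane side of this, using the per-lane text file. The file naming lane_<N>.game_state follows the example above, while the game representation (a list of {Player, Rolls} pairs) and the function names are assumptions. It reads the file back with file:consult/1 and treats a missing or corrupt file as "no game in progress":

```erlang
%% Hypothetical lane process for strategy 2a: loads its last game from
%% lane_<N>.game_state on start-up and rewrites the file on every roll.
-module(lane).
-behaviour(gen_server).
-export([start_link/1, roll/3, game/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link(LaneNo) ->
    gen_server:start_link(?MODULE, LaneNo, []).

roll(Pid, Player, Pins) -> gen_server:call(Pid, {roll, Player, Pins}).
game(Pid) -> gen_server:call(Pid, game).

init(LaneNo) ->
    Path = "lane_" ++ integer_to_list(LaneNo) ++ ".game_state",
    {ok, {Path, read_game_state(Path)}}.

handle_call({roll, Player, Pins}, _From, {Path, Game}) ->
    Rolls = proplists:get_value(Player, Game, []),
    NewGame = lists:keystore(Player, 1, Game, {Player, Rolls ++ [Pins]}),
    write_game_state(Path, NewGame),
    {reply, ok, {Path, NewGame}};
handle_call(game, _From, State = {_Path, Game}) ->
    {reply, Game, State}.

handle_cast(_Msg, State) -> {noreply, State}.

%% A missing or unreadable file just means "no game in progress".
read_game_state(Path) ->
    case file:consult(Path) of
        {ok, [Game]} -> Game;
        _            -> []
    end.

%% Same idea as write_game_state/2 above; the trailing newline keeps
%% the file tidy for reading back with file:consult/1.
write_game_state(Path, Game) ->
    ok = file:write_file(Path, io_lib:format("~p.~n", [Game])).
```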

> Possible supervision strategy 2b: (Start Phase version)
> -------------------------------------------------------
>
> I was tipped off by Ulf Wiger on this thread...
>
> http://thread.gmane.org/gmane.comp.lang.erlang.general/48307/focus=48324
>
> ... that the initialisation/coordination done by lane_ldr in 2a above is
> precisely what the start phases feature of included applications is for!
> This requires splitting the application into two, but could make things
> more manageable for larger applications. So one could get rid of lane_ldr
> and modify 2a to get something like...
>
>           alley_sup
>               |
>  bowling_app  |
> - - - - - - - -|- - - - - - - -
>  lanes_app    |
>               |
>       ___lanes_sup_____
>      /       |     :   \
>  lane(1)  lane(2) .. lane(N)
>
> * Split everything into two apps: the primary bowling_app and the included
> lanes_app.
> * The primary application would be pretty bare, and would start lanes_sup as
> if it were one of its own modules...

Again, a very interesting suggestion. Thanks. I'll certainly look into
it (too hard to comment on it yet, since I'm still grokking it).
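
For strategy 2b, the resource files might look roughly like the following. This follows the included-applications example in the OTP design principles (application_starter as the primary application's callback); the application names, the phase name start_lanes and the env key are assumptions. As I understand the docs, the primary module bowling_app must also export start_phase/3 (it can simply return ok for start_lanes), and lanes_app:start_phase(start_lanes, StartType, []) is then the natural place for the supervisor:start_child/2 loop that lane_ldr did in 2a:

```erlang
%% bowling_app.app (hypothetical): the primary application. Using
%% application_starter as the callback makes OTP run the start phases
%% of the included application too.
{application, bowling_app,
 [{description, "Bowling alley"},
  {vsn, "1"},
  {modules, [bowling_app, alley_sup]},
  {registered, [alley_sup]},
  {included_applications, [lanes_app]},
  {start_phases, [{start_lanes, []}]},
  {applications, [kernel, stdlib]},
  {mod, {application_starter, [bowling_app, []]}},
  {env, [{no_of_lanes, 10}]}]}.

%% lanes_app.app (hypothetical): the included application. It lists
%% the same phase; its start/2 is not called, only start_phase/3.
{application, lanes_app,
 [{description, "Lanes"},
  {vsn, "1"},
  {modules, [lanes_app, lanes_sup, lane]},
  {registered, [lanes_sup]},
  {start_phases, [{start_lanes, []}]},
  {applications, [kernel, stdlib]},
  {mod, {lanes_app, []}}]}.
```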

Once again, thanks a ton for this and the subsequent mails. They've
certainly helped me think more, and think much harder :)

Dhananjay
-- 
-----------------------------------------------------------------------------------
http://blog.dhananjaynene.com twitter: @dnene