[erlang-questions] (noob-help) Supervision strategies to automatically restart dynamically added children

Tue Mar 8 02:58:05 CET 2011

A third option...

Strategy 2c
------------

I've found 2a and 2b useful when you want to use a simple_one_for_one sup
but sometimes need to autostart some of its children at startup based on
some persisted criteria, as per your specific question.

But if you eliminate the player_game and game processes and keep only
lanes (as in my 2a and 2b examples), the number of lanes is fixed from
startup. So you could use a one_for_one lanes_sup with a child-spec list
and put that at the top level, eliminating dynamic children altogether.

       ____lanes_sup____
      /       |     :   \
  lane(1)  lane(2) ... lane(n)

=== lanes_sup.erl ===
-behaviour(supervisor).
..
init([]) ->
      {ok, No_Of_Lanes} = application:get_env(no_of_lanes),
      ChildSpecs = [{Id, {lane,
                          start, [Id]}, % lane:start/1 takes the lane Id
                          permanent,
                          10000,
                          worker,
                          [lane]}
                     || Id <- lists:seq(1,No_Of_Lanes)],
      {ok, {{one_for_one, 1, 30}, ChildSpecs}}.

=== lane.erl ===
Same as 2a

Now the supervisor will start the children instead of you having to do it  
via supervisor:start_child/2. No more need for a loader or start phases.
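
A side effect of 2c is that each lane now has an integer child Id, so a lane can be looked up by number through supervisor:which_children/1. Below is a minimal sketch of that lookup. The module name lane_lookup and the functions lane_pid/2 and find_lane/2 are hypothetical names for illustration, and it assumes the supervisor is reachable by pid or by a registered name such as lanes_sup.

```erlang
-module(lane_lookup).
-export([lane_pid/2, find_lane/2]).

%% Ask the supervisor for its children and pick out one lane by Id.
%% Sup is the supervisor's pid or registered name (lanes_sup is assumed).
lane_pid(Sup, Id) ->
    find_lane(Id, supervisor:which_children(Sup)).

%% Pure helper over the [{Id, Child, Type, Modules}] list that
%% supervisor:which_children/1 returns.
find_lane(Id, Children) ->
    case lists:keyfind(Id, 1, Children) of
        {Id, Pid, _Type, _Modules} when is_pid(Pid) ->
            {ok, Pid};
        {Id, _NotRunning, _Type, _Modules} ->
            %% Child is 'undefined' or 'restarting'
            {error, not_running};
        false ->
            {error, no_such_lane}
    end.
```

With the supervisor registered as lanes_sup, lane_lookup:lane_pid(lanes_sup, 3) would return {ok, Pid} for lane 3 while it is running. This does not work the same way in 2a and 2b, since simple_one_for_one children do not carry their own Ids.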

- Edmond -

On Mon, 07 Mar 2011 10:38:09 +1100, Edmond Begumisa  
<ebegumisa@REDACTED> wrote:

> Hi Dhananjay,
>
> I too struggled with this exact question for quite some time so I'll  
> chime in here on the two techniques I used to solve it...
>
> On Thu, 03 Mar 2011 05:02:06 +1100, Dhananjay Nene  
> <dhananjay.nene@REDACTED> wrote:
>
>> While supervisors are meant to automatically restart failed processes,
>> there is one scenario for which I am as yet unable to figure out the
>> idiomatic approach to implement crash recovery under the default OTP
>> scenarios. I have considered a solution, but being a relative newbie,
>> I am not sure if it is idiomatic erlang and if there are better
>> solutions.
>>
>> Question in short : If I have a supervisor which has a number of
>> dynamic children, how do I set up a mechanism where in case of a
>> complete system crash, all the dynamic children restart at the point
>> they were when the system (including the supervisor) crashed.
>>
>> Question in long :
>> =============
>>
>> Sample Context : A bowling game
>> -------------------------------------------------
>>
>> Let's say I am writing the software necessary to track various games
>> at a bowling alley. I've set up the following processes :
>>
>> a. Lanes : If there are 10 lanes, there are 10 processes, one for each
>> lane. These stay fixed for the entire duration of the program
>> b. Games : A group of players might get together to start a game on a
>> free lane. A new game will get created to track the game through its
>> completion. When the game is over, this process shall terminate
>> c. Players : Each game has a number of players. One process
>> "player_game" is started per player. Sample state of a player game
>> would include current score for the player and if the last two rolls
>> were strike or a spare. For the purpose of brevity, the remainder of
>> this mail only refers to this process and ignores the others
>>
>
> You could reduce complexity by having each lane process maintain its
> current game (players and scores) as part of its state. The game and
> player_game processes appear unnecessarily confusing to me.
>
>> Objective :
>> ---------------
>>
>> Assuming this is a single node implementation, if the machine were to
>> crash, upon machine / node restart, all the player_games should be
>> restarted and should be at the point where the player_games were when
>> the machine crashed.
>>
>> Possible supervision strategy :
>> --------------------------------------
>>
>> 1. Create a simple_one_for_one supervisor player_game_sup which upon
>> starting up for the first time would have no children associated with
>> them. Use supervisor:start_child to start each process
>> 2. The supervisor creates an entry in a database (say mnesia) every
>> time it launches a new process
>> 3. Each player_game updates the entry every time the score gets
>> modified. Upon termination that entry gets deleted
>> 4. Post crash, the supervisor is started again (say after an
>> application restart or via another supervisor)
>> 5. (Here's the difference). By default the supervisor will not restart
>> the dynamically added children (all the player_games). However we
>> modify the init code to inspect the database and launch a player_game
>> for each record it finds.
>
> How? I don't think you can instruct a simple_one_for_one supervisor to
> create children from its init/1 callback. From the documentation...
>
> http://www.erlang.org/doc/man/supervisor.html#Module:init-1
>
> "...No child process is then started during the initialization phase,  
> but all children are assumed to be started dynamically using  
> supervisor:start_child/2..."
>
> Even if you switched to one_for_one with no child specs, I don't think  
> you'd be able to call supervisor:start_child/2 from init/1 of the same  
> supervisor since this function is called before the supervisor has  
> finished initialising itself and it's the actual supervisor process  
> doing the calling. You're likely to wait forever.
>
> AFAIK, creating dynamic children (calling supervisor:start_child/2) has
> to be done after the supervisor has initialised, by a process other than
> the supervisor process.
>
> This is normally not a problem if you are calling start_child/2 during  
> the "normal" operation of the application because the supervisor in  
> question is likely to already be up. But here, you want to call  
> start_child/2 at *startup*. From my experience with this precise matter,  
> this requires some process coordination.
>
>> The player_game initialises itself to the
>> current state as in the database and the game(s) can continue where
>> it/they left off.
>>
>> My questions :
>> --------------------
>> a. Does it make sense to move the responsibility to the supervisor to
>> update the database each time a new player game is started or
>> completed ?
>
> I personally don't see the advantage of doing this. Besides (as per my  
> understanding of OTP design principles), a supervisor's job should be  
> just that -- supervising workers and not doing work itself.
>
> Doing this from your worker gen_servers makes more sense to me and
> seems more natural, i.e. reading the scores from the DB during
> player_game:init and writing them every time a score gets bumped, or
> something similar.
>
>> b. Is it an idiomatic way to implement crash recovery?
>
> There is none. It's very application specific as Jesper has indicated.
>
> I've come across a couple of wide patterns, but the details of where to  
> put checkpoints can't be generalised. For instance; although you are  
> specifically asking about a single node, multi-node hot take-over with  
> no DB/persistence is another way. I was recently privy to a very  
> interesting discussion on that technique. You might want to check it out  
> for a future project...
>
> http://thread.gmane.org/gmane.comp.lang.erlang.general/50258/focus=50269
>
>> c. Are there any other perhaps superior ways of implementing this?
>>
>
> I don't know about superior, I just don't think your first suggestion
> will actually work. I can offer two possibilities, each of which I've
> used...
>
> Possible supervision strategy 2a: (Loader version)
> --------------------------------------------------
>
> Rather than separate dynamic children for players and games as in
> Strategy 1, each lane stores, as part of its state, info on
> the current game (the players playing on the lane and their
> state/scores). The supervision tree might look like this...
>
>             alley_sup
>            /         \
>    lane_ldr  ___lanes_sup_____
>             /       |     :   \
>          lane(1)  lane(2) .. lane(N)
>
> * The application has a startup configuration parameter no_of_lanes which
> comes from a conf file or the .app file and is loaded by alley_sup...
>
> === bowling_app.app ===
> {application, bowling_app,
>   [{..
>     {env,[{no_of_lanes,10}]},
>     ..}]}.
>
> === alley_sup.erl ===
> -behaviour(supervisor).
> ..
> init([]) ->
>      {ok, No_Of_Lanes} = application:get_env(no_of_lanes),
>      {ok, {{one_for_one, 1, 30},
>         [{lanes_sup,
>              {lanes_sup, start, []},
>               permanent,
>               infinity,
>               supervisor,
>               [lanes_sup]},
>          {lane_ldr,
>              {lane_ldr, start, [No_Of_Lanes]},
>               temporary, % Starts lanes_sup children then disappears
>               6000,
>               worker,
>               [lane_ldr]}]}}.
>
> * lanes_sup is a simple_one_for_one supervisor of any number of lanes but
> initially has none.
> * Now here is the trick: lane_ldr is a gen_server initialised with
> No_Of_Lanes. Its job is to call supervisor:start_child No_Of_Lanes
> times at startup then vanish...
>
> === lane_ldr ===
> -behaviour(gen_server).
> ..
> init(No_Of_Lanes) when No_Of_Lanes >= 1 ->
>      case start_lanes(No_Of_Lanes, 0) of
>          No_Of_Lanes ->
>              io:format("All lanes failed to init -- quitting application.~n"),
>              {stop, all_lanes_failed}; % Cause alley_sup to quit abnormally
>          _ ->
>              io:format("Lane loader exiting.~n"),
>              ignore % One or more lanes init'ed; loader's work is done.
>      end.
>
> start_lanes(0, E) ->
>      E; % Return no. of lanes that have failed to init
> start_lanes(N, E) ->
>      case supervisor:start_child(lanes_sup, [N]) of
>          {ok, _} ->
>              io:format("Started lane ~w.~n", [N]),
>              start_lanes(N - 1, E);
>          Err ->
>              io:format("Error starting lane ~w: ~p.~n", [N, Err]),
>              start_lanes(N - 1, E + 1)
>      end.
>
> %%% These are just placeholders for compiler warnings/dialyzer
>
> handle_call(void, _, void) ->
>      {noreply, void}.
>
> handle_cast(void, void) ->
>      {noreply, void}.
>
> handle_info(void, void) ->
>      {noreply, void}.
>
> terminate(_, _) ->
>      ignore.
>
> code_change(_, void, _) ->
>      {ok, void}.
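
The failure-counting logic in start_lanes/2 can be tried out without a running supervisor if the start call is passed in as a fun. The following is only a sketch under that assumption: the module name lane_start_count and the three-argument start_lanes/3 are hypothetical, not part of the loader above. It also accepts the {ok, Child, Info} form that supervisor:start_child/2 may return, which the two-clause case above would count as a failure.

```erlang
-module(lane_start_count).
-export([start_lanes/3]).

%% StartFun is any fun(LaneNo) -> {ok, Pid} | {ok, Pid, Info} | Error.
%% In the loader it would be something like
%%   fun(N) -> supervisor:start_child(lanes_sup, [N]) end
%% Returns the number of lanes that failed to start.
start_lanes(0, Failed, _StartFun) ->
    Failed;
start_lanes(N, Failed, StartFun) when N > 0 ->
    case StartFun(N) of
        {ok, _Pid} ->
            start_lanes(N - 1, Failed, StartFun);
        {ok, _Pid, _Info} ->
            start_lanes(N - 1, Failed, StartFun);
        _Err ->
            start_lanes(N - 1, Failed + 1, StartFun)
    end.
```

The caller can then compare the result against the number of lanes requested, exactly as init/1 does above, to decide whether every lane failed.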
>
> * Whenever a lane is started by the sup, it loads the most recent game  
>  from the DB, or just a simple text file (lane_1.game_state,  
> lane_2.game_state, etc -- not a big deal if a text file gets corrupted  
> and a game is lost so a DB might be overkill). Possibly something along  
> the lines of...
>
> === lane.erl ===
> -behaviour(gen_server).
> ..
> -record(player_state, {frame = 0, % NB: Removed player_name
>                         shot = 1,
>                         bonus_shot = false,
>                         last_shot = normal,
>                         prior_to_last_shot = normal,
>                         max_pins = 10,
>                         score = 0}).
>
> start(Id) ->
>      gen_server:start_link(?MODULE, Id, []).
>
> init(Id) ->
>      process_flag(trap_exit, true),
>      Path = filename:join(code:priv_dir(bowling_app),
>                           "lane_" ++ integer_to_list(Id) ++ ".game_state"),
>      % Game state is a proplist of player_state records keyed by player name:
>      %    [{Player_Name1, #player_state{}}, {Player_Name2, #player_state{}}, ..]
>      {ok, Game_State} = try read_game_state(Path)
>                         catch
>                              % File not found: start with an empty game
>                              _:{badmatch, {error, enoent}} ->
>                                  {file:write_file(Path, "[]."), []};
>                              % Discard bad state
>                              _:Err ->
>                                  io:format("Zeroing corrupt game file ~s: ~p.~n",
>                                            [Path, Err]),
>                                  {file:write_file(Path, "[]."), []}
>                         end,
>      {ok, {Game_State, Path, ..maybe some non-persisted state..}}.
>
> %% Assert the happy-case for good game state when reloading it
> read_game_state(Path) ->
>      {ok, [Game_State]} = file:consult(Path),
>      true = is_list(Game_State),
>      lists:foreach(fun({Player_Name, Player_State}) ->
>                      true = is_list(Player_Name),
>                      true = is_record(Player_State, player_state),
>                      % Maybe do some other checks
>                      ok
>                    end, Game_State),
>      {ok, Game_State}.
> ..
>
> NB: You'd probably use error_logger instead of all the io:formats.
>
> * Now whenever the score gets bumped, or a new game starts, or a game
> is concluded, the lane process writes the game state to your DB, or text
> file. For the simple text file, you could just keep calling...
>
> write_game_state(Path, Game_State) ->
>      ok = file:write_file(Path, io_lib:format("~p.", [Game_State])).
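
To see the text-file approach round-trip, here is a small self-contained sketch that writes a game state and reads it back with file:consult/1. The module name game_state_file and the helper new_player_state/0 are made up for the example. The write-to-temp-then-rename step is my own addition rather than part of the code above, and it assumes Path is a plain string and that renaming over an existing file is atomic on your filesystem (typically true on POSIX systems, less certain elsewhere).

```erlang
-module(game_state_file).
-export([write_game_state/2, read_game_state/1, new_player_state/0]).

-record(player_state, {frame = 0,
                       shot = 1,
                       bonus_shot = false,
                       last_shot = normal,
                       prior_to_last_shot = normal,
                       max_pins = 10,
                       score = 0}).

%% Hypothetical helper so callers outside this module can build a record.
new_player_state() ->
    #player_state{}.

%% Write the whole state to a temporary file, then rename it over the
%% real one, so a crash mid-write should leave the previous file intact.
%% The ".tmp" suffix is an assumption of this sketch.
write_game_state(Path, Game_State) ->
    Tmp = Path ++ ".tmp",
    ok = file:write_file(Tmp, io_lib:format("~p.~n", [Game_State])),
    ok = file:rename(Tmp, Path).

%% Read the single term back and assert it looks like a game state:
%% a list of {Player_Name, #player_state{}} pairs.
read_game_state(Path) ->
    {ok, [Game_State]} = file:consult(Path),
    true = lists:all(fun({Name, PS}) ->
                             is_list(Name) andalso
                                 is_record(PS, player_state);
                        (_) ->
                             false
                     end, Game_State),
    {ok, Game_State}.
```

Records are written out as plain tagged tuples by ~p, so file:consult/1 hands back terms that still pass the is_record check.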
>
> Possible supervision strategy 2b: (Start Phase version)
> -------------------------------------------------------
>
> I was tipped-off by Ulf Wiger on this thread...
>
> http://thread.gmane.org/gmane.comp.lang.erlang.general/48307/focus=48324
>
> ... that the initialisation/coordination done by lane_ldr in 2a above is
> precisely what the start phases feature of included applications is for!
> This requires splitting the application into two, but could make
> things more manageable for larger applications. So one could get rid of
> lane_ldr and modify 2a to get something like...
>
>             alley_sup
>                 |
>    bowling_app  |
> - - - - - - - -|- - - - - - - -
>    lanes_app    |
>                 |
>         ___lanes_sup_____
>        /       |     :   \
>    lane(1)  lane(2) .. lane(N)
>
> * Split everything into two apps: the primary bowling_app and the  
> included lanes_app.
> * The primary application would be pretty bare, and would start
> lanes_sup as if it were one of its own modules...
>
> === bowling_app.app ===
> {application, bowling_app,
>   [..
>    {mod, {application_starter,[bowling_app,[]]}},
>    {included_applications, [lanes_app]},
>    {start_phases, [{init,[]}, {go,[]}]}
>    ..
>   ]}.
>
> === bowling_app.erl ===
> -behaviour(application).
> ..
> %% Called on application:start
> start(normal, StartArgs) ->
>      alley_sup:start(StartArgs).
>
> %% Called *after* entire sup tree is initialised
> start_phase(init, normal, []) ->
>      % If there's a DB, initialise it here
>      ok;
> start_phase(go, normal, []) ->
>      ok.
> ..
>
> === alley_sup.erl ===
> -behaviour(supervisor).
> ..
> init([]) ->
>      {ok, {{one_for_one, 1, 30},
>         [{lanes_sup,
>              {lanes_sup, start, []},
>               permanent,
>               infinity,
>               supervisor,
>               [lanes_sup]}]}}. % Mod of included app.
>
> * Nothing else is needed in the primary app.
> * The second application will be responsible for spawning the dynamic  
> children on startup...
>
> === lanes_app.app ===
> {application, lanes_app,
>   [..
>    {env,[{no_of_lanes,10}]},
>    {mod,{lanes_app,[]}},
>    {start_phases, [{init,[]}, {go,[]}]}
>    ..
>   ]}.
>
> === lanes_app.erl ===
> -behaviour(application).
> ..
> %% NOT called
> start(normal, StartArgs) ->
>      lanes_sup:start(StartArgs).
>
> %% Called *after* entire sup tree is initialised
> %% and corresponding bowling_app:start_phase
> start_phase(init, normal, []) ->
>      ok;
> start_phase(go, normal, []) ->
>      {ok, No_Of_Lanes} = application:get_env(?MODULE, no_of_lanes),
>      true = No_Of_Lanes >= 1,
>      case start_lanes(No_Of_Lanes, 0) of
>          No_Of_Lanes ->
>              io:format("All lanes failed to init -- quitting application.~n"),
>              {error, all_lanes_failed}; % Cause app to quit abnormally
>          _ ->
>              ok % One or more lanes init'ed, continue.
>      end.
>
> start_lanes(0, E) ->
>      E; % Return no. of lanes that have failed to init
> start_lanes(N, E) ->
>      case supervisor:start_child(lanes_sup, [N]) of
>          {ok, _} ->
>              io:format("Started lane ~w.~n", [N]),
>              start_lanes(N - 1, E);
>          Err ->
>              io:format("Error starting lane ~w: ~p.~n", [N, Err]),
>              start_lanes(N - 1, E + 1)
>      end.
>
> === lanes_sup.erl ===
> Same as in Strategy 2a
>
> === lane.erl ===
> Same as in Strategy 2a
>
> Strategy 2b is cleaner to me than Strategy 2a, even though it requires  
> splitting an application into two which many people seem to have a  
> problem with.
>
> - Edmond -
>
>
>> FWIW : the code I am using to learn erlang is at
>> https://github.com/dnene/bowling . It's not particularly interesting at
>> this stage since it is still under development.
>>
>> Thanks
>> Dhananjay
>>
>> PS: Apologies for posting it to erlang-questions after earlier posting
>> it to erlang programming google group. Those monitoring the latter
>> will receive this question twice.
>>
>
>
