[erlang-questions] (noob-help) Supervision strategies to automatically restart dynamically added children
Edmond Begumisa
ebegumisa@REDACTED
Tue Mar 8 02:58:05 CET 2011
A third option...
Strategy 2c
------------
I've found 2a and 2b useful when you want to use a simple_one_for_one sup,
but sometimes need to autostart some of its children at startup based on
some persisted criteria, as per your specific question.
But if you eliminate the player_game and game processes and keep only
lanes (as in my example for 2a and 2b), the number of lanes is fixed from
startup. So you could use a one_for_one lanes_sup with a child-spec list
and put it at the top level, eliminating dynamic children altogether.
        ____lanes_sup____
       /     |     :     \
  lane(1) lane(2) ...  lane(n)
=== lanes_sup.erl ===
-behaviour(supervisor).
..
init([]) ->
    {ok, No_Of_Lanes} = application:get_env(no_of_lanes),
    %% One static child spec per lane; Id is passed on to lane:start/1
    ChildSpecs = [{Id, {lane, start, [Id]},
                   permanent,
                   10000,
                   worker,
                   [lane]}
                  || Id <- lists:seq(1, No_Of_Lanes)],
    {ok, {{one_for_one, 1, 30}, ChildSpecs}}.
=== lane.erl ===
Same as in Strategy 2a
Now the supervisor will start the children instead of you having to do it
via supervisor:start_child/2. No more need for a loader or start phases.
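Since each child id in the spec list is the lane number, a lane's pid can be looked up through the supervisor after startup. A rough sketch, assuming lanes_sup is registered locally under that name (as the supervisor:start_child(lanes_sup, ..) calls in 2a imply); the lanes_lookup module and its function names are made up for illustration:

```erlang
-module(lanes_lookup).
-export([lane_pid/1, find_lane/2]).

%% Ask the (assumed locally registered) lanes_sup for its children
%% and pick out the lane with the given Id.
lane_pid(Id) ->
    find_lane(Id, supervisor:which_children(lanes_sup)).

%% Children is a list of {Id, Child, Type, Modules} tuples as returned
%% by supervisor:which_children/1.
find_lane(Id, Children) ->
    case lists:keyfind(Id, 1, Children) of
        {Id, Pid, _Type, _Mods} when is_pid(Pid) ->
            {ok, Pid};
        {Id, Status, _Type, _Mods} ->
            {error, Status};       % undefined | restarting
        false ->
            {error, no_such_lane}
    end.
```

lanes_lookup:lane_pid(3) would then return {ok, Pid} once lane 3 is running.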
- Edmond -
On Mon, 07 Mar 2011 10:38:09 +1100, Edmond Begumisa
<ebegumisa@REDACTED> wrote:
> Hi Dhananjay,
>
> I too struggled with this exact question for quite some time so I'll
> chime in here on the two techniques I used to solve it...
>
> On Thu, 03 Mar 2011 05:02:06 +1100, Dhananjay Nene
> <dhananjay.nene@REDACTED> wrote:
>
>> While supervisors are meant to automatically restart failed processes,
>> there is one scenario for which I am as yet unable to figure out the
>> idiomatic approach to implementing crash recovery under the default OTP
>> scenarios. I have considered a solution, but being a relative newbie,
>> I am not sure if it is idiomatic Erlang or if there are better
>> solutions.
>>
>> Question in short : If I have a supervisor which has a number of
>> dynamic children, how do I set up a mechanism whereby, in case of a
>> complete system crash, all the dynamic children restart at the point
>> they were at when the system (including the supervisor) crashed?
>>
>> Question in long :
>> =============
>>
>> Sample Context : A bowling game
>> -------------------------------------------------
>>
>> Let's say I am writing the software necessary to track various games
>> at a bowling alley. I've set up the following
>> processes :
>>
>> a. Lanes : If there are 10 lanes, there are 10 processes, one for each
>> lane. These stay fixed for the entire duration of the program
>> b. Games : A group of players might get together to start a game on a
>> free lane. A new game will get created to track the game through its
>> completion. When the game is over, this process shall terminate
>> c. Players : Each game has a number of players. One process
>> "player_game" is started per player. Sample state of a player game
>> would include current score for the player and if the last two rolls
>> were strike or a spare. For the purpose of brevity, the remainder of
>> this mail only refers to this process and ignores the others
>>
>
> You could reduce complexity by having each lane process maintain its
> current game (players and scores) as part of its state. The game and
> player_game processes appear unnecessarily confusing to me.
>
>> Objective :
>> ---------------
>>
>> Assuming this is a single node implementation, if the machine were to
>> crash, upon machine / node restart, all the player_games should be
>> restarted and should be at the point where the player_games were when
>> the machine crashed.
>>
>> Possible supervision strategy :
>> --------------------------------------
>>
>> 1. Create a simple_one_for_one supervisor player_game_sup which upon
>> starting up for the first time would have no children associated with
>> them. Use supervisor:start_child to start each process
>> 2. The supervisor creates an entry in a database (say mnesia) every
>> time it launches a new process
>> 3. Each player_game updates the entry every time the score gets
>> modified. Upon termination that entry gets deleted
>> 4. Post crash, the supervisor is started again (say after an
>> application restart or via another supervisor)
>> 5. (Here's the difference). By default the supervisor will not restart
>> the dynamically added children (all the player_games). However we
>> modify the init code to inspect the database and launch a player_game
>> for each record it finds.
>
> How? I don't think you can instruct a simple_one_for_one supervisor to
> create children from its init/1 callback. From the documentation...
>
> http://www.erlang.org/doc/man/supervisor.html#Module:init-1
>
> "...No child process is then started during the initialization phase,
> but all children are assumed to be started dynamically using
> supervisor:start_child/2..."
>
> Even if you switched to one_for_one with no child specs, I don't think
> you'd be able to call supervisor:start_child/2 from init/1 of the same
> supervisor since this function is called before the supervisor has
> finished initialising itself and it's the actual supervisor process
> doing the calling. You're likely to wait forever.
>
> AFAIK, creating dynamic children (calling supervisor:start_child/2) has
> to be done after the supervisor has initialised, by a process other than
> the supervisor process.
>
> This is normally not a problem if you are calling start_child/2 during
> the "normal" operation of the application because the supervisor in
> question is likely to already be up. But here, you want to call
> start_child/2 at *startup*. From my experience with this precise matter,
> this requires some process coordination.
>
>> The player_game initialises itself to the
>> current state as in the database and the game(s) can continue where
>> it/they left off.
>>
>> My questions :
>> --------------------
>> a. Does it make sense to move the responsibility to the supervisor to
>> update the database each time a new player game is started or
>> completed ?
>
> I personally don't see the advantage of doing this. Besides (as per my
> understanding of OTP design principles), a supervisor's job should be
> just that -- supervising workers and not doing work itself.
>
> Doing this from your worker gen_servers makes more sense to me and
> seems more natural, i.e. reading the scores from the DB during
> player_game:init and writing them every time a score gets bumped or
> something similar.
>
>> b. Is it an idiomatic way to implement crash recovery
>
> There is none. It's very application specific as Jesper has indicated.
>
> I've come across a couple of wide patterns, but the details of where to
> put checkpoints can't be generalised. For instance; although you are
> specifically asking about a single node, multi-node hot take-over with
> no DB/persistence is another way. I was recently privy to a very
> interesting discussion on that technique. You might want to check it out
> for a future project...
>
> http://thread.gmane.org/gmane.comp.lang.erlang.general/50258/focus=50269
>
>> c. Are there any other perhaps superior ways of implementing this?
>>
>
> I don't know about superior, I just don't think your first suggestion
> will actually work. I can offer 2 possibilities, each of which I've
> used...
>
> Possible supervision strategy 2a: (Loader version)
> --------------------------------------------------
>
> Rather than separate dynamic children for players and games as in
> Strategy 1, each lane instead stores, as part of its state, info on
> the current game (the players playing on the lane and their
> state/scores). The supervision tree might look like this...
>
>              alley_sup
>             /         \
>      lane_ldr      ___lanes_sup_____
>                   /    |    :       \
>             lane(1) lane(2) ..   lane(N)
>
> * Application has a startup configuration parameter no_of_lanes which
> comes from a conf file or the .app file and loaded by the alley_sup...
>
> === bowling_app.app ===
> {application, bowling_app,
> [{..
> {env,[{no_of_lanes,10}]},
> ..}]}.
>
> === alley_sup.erl ===
> -behaviour(supervisor).
> ..
> init([]) ->
>     {ok, No_Of_Lanes} = application:get_env(no_of_lanes),
>     {ok, {{one_for_one, 1, 30},
>           [{lanes_sup,
>             {lanes_sup, start, []},
>             permanent,
>             infinity,
>             supervisor,
>             [lanes_sup]},
>            {lane_ldr,
>             {lane_ldr, start, [No_Of_Lanes]},
>             temporary, % Starts lanes_sup children then disappears
>             6000,
>             worker,
>             [lane_ldr]}]}}.
>
> * lanes_sup is a simple_one_for_one supervisor of any number of lanes but
> initially has none.
> * Now here is the trick: lane_ldr is a gen_server initialised with
> No_Of_Lanes. Its job is to call supervisor:start_child No_Of_Lanes
> times at startup then vanish...
>
> === lane_ldr.erl ===
> -behaviour(gen_server).
> ..
> init(No_Of_Lanes) when No_Of_Lanes >= 1 ->
>     case start_lanes(No_Of_Lanes, 0) of
>         No_Of_Lanes ->
>             io:format("All lanes failed to init -- quitting application.~n"),
>             {stop, all_lanes_failed}; % Cause alley_sup to quit abnormally
>         _ ->
>             io:format("Lane loader exiting.~n"),
>             ignore % One or more lanes init'ed; loader's work is done.
>     end.
>
> start_lanes(0, E) ->
>     E; % Return no. of lanes that have failed to init
> start_lanes(N, E) ->
>     case supervisor:start_child(lanes_sup, [N]) of
>         {ok, _} ->
>             io:format("Started lane ~w.~n", [N]),
>             start_lanes(N - 1, E);
>         Err ->
>             io:format("Error starting lane ~w: ~p.~n", [N, Err]),
>             start_lanes(N - 1, E + 1)
>     end.
>
> %%% These are just placeholders for compiler warnings/dialyzer
>
> handle_call(void, _, void) ->
>     {noreply, void}.
>
> handle_cast(void, void) ->
>     {noreply, void}.
>
> handle_info(void, void) ->
>     {noreply, void}.
>
> terminate(_, _) ->
>     ignore.
>
> code_change(_, void, _) ->
>     {ok, void}.
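
One detail in start_lanes/2 above: supervisor:start_child/2 can also return {ok, Child, Info}, which the catch-all Err clause would count as a failure. That only happens if the child's start function returns {ok, Pid, Info}, which lane:start/1 as shown does not, so this is a precaution rather than a fix. A sketch of the same counting logic as a fold, with the start call passed in as a fun so it can be tried without a running supervisor. The module name lane_start_sketch, the StartFun parameter and verdict/2 are assumptions for illustration, not part of the original code:

```erlang
-module(lane_start_sketch).
-export([start_lanes/2, verdict/2]).

%% StartFun is a hypothetical stand-in for
%%   fun(N) -> supervisor:start_child(lanes_sup, [N]) end.
%% Returns the number of lanes that failed to start.
start_lanes(No_Of_Lanes, StartFun) ->
    lists:foldl(fun(N, Failed) ->
                        case StartFun(N) of
                            {ok, _}    -> Failed;
                            {ok, _, _} -> Failed;      % started, with extra info
                            _Err       -> Failed + 1
                        end
                end, 0, lists:seq(1, No_Of_Lanes)).

%% Map the failure count to what the loader's init/1 would return.
verdict(No_Of_Lanes, No_Of_Lanes) -> {stop, all_lanes_failed};
verdict(_No_Of_Lanes, _Failed)    -> ignore.
```

In lane_ldr:init/1 this would be used roughly as verdict(No_Of_Lanes, start_lanes(No_Of_Lanes, StartFun)).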
>
> * Whenever a lane is started by the sup, it loads the most recent game
> from the DB, or just a simple text file (lane_1.game_state,
> lane_2.game_state, etc -- not a big deal if a text file gets corrupted
> and a game is lost so a DB might be overkill). Possibly something along
> the lines of...
>
> === lane.erl ===
> -behaviour(gen_server).
> ..
> -record(player_state, {frame = 0, % NB: Removed player_name
>                        shot = 1,
>                        bonus_shot = false,
>                        last_shot = normal,
>                        prior_to_last_shot = normal,
>                        max_pins = 10,
>                        score = 0}).
>
> start(Id) ->
>     gen_server:start_link(?MODULE, Id, []).
>
> init(Id) ->
>     process_flag(trap_exit, true),
>     Path = filename:join(code:priv_dir(bowling_app),
>                          "lane_" ++ integer_to_list(Id) ++ ".game_state"),
>     % Game State is a proplist of player_state records with players' name as key
>     % [{Player_Name1, #player_state{}}, {Player_Name2, #player_state{}}, .. ]
>     {ok, Game_State} =
>         try read_game_state(Path)
>         catch
>             _:{badmatch, {error, enoent}} -> % File not found
>                 {file:write_file(Path, "[]."), []};
>             _:Err -> % Discard bad state
>                 io:format("Zeroing corrupt game file ~s: ~p.~n", [Path, Err]),
>                 {file:write_file(Path, "[]."), []}
>         end,
>     {ok, {Game_State, Path, ..maybe some non-persisted state..}}.
>
> %% Assert the happy-case for good game state when reloading it
> read_game_state(Path) ->
>     {ok, [Game_State]} = file:consult(Path),
>     true = is_list(Game_State),
>     lists:foreach(fun({Player_Name, Player_State}) ->
>                       true = is_list(Player_Name),
>                       true = is_record(Player_State, player_state),
>                       % Maybe do some other checks
>                       ok
>                   end, Game_State),
>     {ok, Game_State}.
> ..
>
> NB: You'd probably use error_logger instead of all the io:formats.
>
> * Now whenever the score gets bumped, or a new game starts, or a game
> is concluded, the lane process writes the game state to your DB or text
> file. For the simple text file, you could just keep calling...
>
> write_game_state(Path, Game_State) ->
>     ok = file:write_file(Path, io_lib:format("~p.", [Game_State])).
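
If losing a game to a half-written file ever does matter, one common approach (not what the code above does) is to write to a temporary file and rename it over the old one. A minimal sketch, assuming the hypothetical module name game_state_io and a ".tmp" suffix chosen for illustration; whether the rename is atomic depends on the OS and filesystem:

```erlang
-module(game_state_io).
-export([write_game_state/2, read_game_state/1]).

%% Write the whole term to Path ++ ".tmp" first, then rename it into
%% place. Path is assumed to be a string (list), as in lane:init/1.
write_game_state(Path, Game_State) ->
    Tmp = Path ++ ".tmp",
    ok = file:write_file(Tmp, io_lib:format("~p.~n", [Game_State])),
    ok = file:rename(Tmp, Path).

%% Read back the single term written by write_game_state/2.
read_game_state(Path) ->
    {ok, [Game_State]} = file:consult(Path),
    {ok, Game_State}.
```

Records are written as plain tuples, so the is_record/2 checks in read_game_state/1 of lane.erl would still apply to what is read back.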
>
> Possible supervision strategy 2b: (Start Phase version)
> -------------------------------------------------------
>
> I was tipped-off by Ulf Wiger on this thread...
>
> http://thread.gmane.org/gmane.comp.lang.erlang.general/48307/focus=48324
>
> ... that the initialisation/coordination done by lane_ldr in 2a above is
> precisely what the start phases feature of included applications is for!
> This requires splitting the application into two, but could make
> things more manageable for larger applications. So one could get rid of
> lane_ldr and modify 2a to get something like...
>
>                  alley_sup
>                      |
>    bowling_app       |
>    - - - - - - - - - | - - - - - - - -
>    lanes_app         |
>                      |
>              ___lanes_sup_____
>             /    |    :       \
>       lane(1) lane(2) ..   lane(N)
>
> * Split everything into two apps: the primary bowling_app and the
> included lanes_app.
> * The primary application would be pretty bare, and would start
> lanes_sup as if it were one of its own modules...
>
> === bowling_app.app ===
> {application, bowling_app,
>  [..
>   {mod, {application_starter,[bowling_app,[]]}},
>   {included_applications, [lanes_app]},
>   {start_phases, [{init,[]}, {go,[]}]}
>   ..
>  ]}.
>
> === bowling_app.erl ===
> -behaviour(application).
> ..
> %% Called on application:start
> start(normal, StartArgs) ->
>     alley_sup:start(StartArgs).
>
> %% Called *after* entire sup tree is initialised
> start_phase(init, normal, []) ->
>     % If there's a DB, initialise it here
>     ok;
> start_phase(go, normal, []) ->
>     ok.
> ..
>
> === alley_sup.erl ===
> -behaviour(supervisor).
> ..
> init([]) ->
>     {ok, {{one_for_one, 1, 30},
>           [{lanes_sup,
>             {lanes_sup, start, []},
>             permanent,
>             infinity,
>             supervisor,
>             [lanes_sup]}]}}. % Mod of included app.
>
> * Nothing else is needed in the primary app.
> * The second application will be responsible for spawning the dynamic
> children on startup...
>
> === lanes_app.app ===
> {application, lanes_app,
>  [..
>   {env,[{no_of_lanes,10}]},
>   {mod,{lanes_app,[]}},
>   {start_phases, [{init,[]}, {go,[]}]}
>   ..
>  ]}.
>
> === lanes_app.erl ===
> -behaviour(application).
> ..
> %% NOT called
> start(normal, StartArgs) ->
>     lanes_sup:start(StartArgs).
>
> %% Called *after* entire sup tree is initialised
> %% and corresponding bowling_app:start_phase
> start_phase(init, normal, []) ->
>     ok;
> start_phase(go, normal, []) ->
>     {ok, No_Of_Lanes} = application:get_env(?MODULE, no_of_lanes),
>     true = No_Of_Lanes >= 1,
>     case start_lanes(No_Of_Lanes, 0) of
>         No_Of_Lanes ->
>             io:format("All lanes failed to init -- quitting application.~n"),
>             {error, all_lanes_failed}; % Cause app to quit abnormally
>         _ ->
>             ok % One or more lanes init'ed, continue.
>     end.
>
> start_lanes(0, E) ->
>     E; % Return no. of lanes that have failed to init
> start_lanes(N, E) ->
>     case supervisor:start_child(lanes_sup, [N]) of
>         {ok, _} ->
>             io:format("Started lane ~w.~n", [N]),
>             start_lanes(N - 1, E);
>         Err ->
>             io:format("Error starting lane ~w: ~p.~n", [N, Err]),
>             start_lanes(N - 1, E + 1)
>     end.
>
> === lanes_sup.erl ===
> Same as in Strategy 2a
>
> === lane.erl ===
> Same as in Strategy 2a
>
> Strategy 2b is cleaner to me than Strategy 2a, even though it requires
> splitting an application into two which many people seem to have a
> problem with.
>
> - Edmond -
>
>
>> FWIW : the code I am using to learn Erlang is at
>> https://github.com/dnene/bowling . It's not particularly interesting at
>> this stage since it is still under development.
>>
>> Thanks
>> Dhananjay
>>
>> PS: Apologies for posting it to erlang-questions after earlier posting
>> it to erlang programming google group. Those monitoring the latter
>> will receive this question twice.
>>