[erlang-questions] (noob-help) Supervision strategies to automatically restart dynamically added children
Edmond Begumisa
ebegumisa@REDACTED
Tue Mar 8 03:16:42 CET 2011
PS:
The disadvantage of 2c is that if lane:init fails in one lane, the
entire application will fail to start, unlike 2a and 2b, which are
tolerant of this.
This is why I personally prefer using a loader process or start-phases.
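To illustrate the failure mode, here is a minimal sketch (an assumption for
demonstration, not code from the strategies below) of a supervisor with a
static child-spec list in which one child cannot start. The module name
static_lanes_demo, the FailIds argument and the bare spawn_link'ed lane
process are all hypothetical stand-ins for lanes_sup and lane; a real lane
would be a gen_server as in 2a.

```erlang
-module(static_lanes_demo).
-behaviour(supervisor).

-export([start_link/1, init/1, start_lane/2]).

%% FailIds is a hypothetical list of lane ids whose start should fail.
%% It exists only to simulate a lane whose init goes wrong.
start_link(FailIds) ->
    supervisor:start_link(?MODULE, FailIds).

%% Static child-spec list, as in Strategy 2c (three lanes for brevity).
init(FailIds) ->
    ChildSpecs = [{Id,
                   {?MODULE, start_lane, [Id, FailIds]},
                   permanent,
                   1000,
                   worker,
                   [?MODULE]}
                  || Id <- lists:seq(1, 3)],
    {ok, {{one_for_one, 1, 30}, ChildSpecs}}.

%% Stand-in for lane:start/1. Runs inside the supervisor process,
%% so spawn_link/1 links the new lane to the supervisor.
start_lane(Id, FailIds) ->
    case lists:member(Id, FailIds) of
        true  -> {error, {lane_init_failed, Id}};
        false -> {ok, spawn_link(fun lane_loop/0)}
    end.

%% A placeholder lane that just waits to be told to stop.
lane_loop() ->
    receive stop -> ok end.
```

With this sketch, static_lanes_demo:start_link([]) should return {ok, Pid}
with three children, while static_lanes_demo:start_link([2]) should return
an error tuple instead, because the supervisor gives up as soon as one child
in its list cannot be started. The caller is linked to the supervisor, so
call process_flag(trap_exit, true) first if you try this from the shell. The
exact shape of the error reason depends on the OTP release.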
- Edmond -
On Tue, 08 Mar 2011 12:58:05 +1100, Edmond Begumisa
<ebegumisa@REDACTED> wrote:
> A third option...
>
> Strategy 2c
> ------------
>
> I've found 2a and 2b useful when you want to use a simple_one_for_one
> sup, but sometimes need to autostart some of its children at startup
> based on some persisted criteria, as per your specific question.
>
> But in the case of eliminating player_game and game processes and having
> only lanes (which I used as an example in 2a and 2b): The lanes are
> always a fixed number from startup, so you could use a one_for_one
> lanes_sup with a child-spec list, and have that at the top-level
> eliminating dynamic children altogether.
>
>            ____lanes_sup____
>           /     |     :     \
>     lane(1)  lane(2) ...  lane(n)
>
>
> === lanes_sup.erl ===
> -behaviour(supervisor).
> ..
> init([]) ->
>     {ok, No_Of_Lanes} = application:get_env(no_of_lanes),
>     ChildSpecs = [{Id,                    % Unique child id per lane
>                    {lane, start, [Id]},   % lane:start/1 takes the lane id
>                    permanent,
>                    10000,
>                    worker,
>                    [lane]}
>                   || Id <- lists:seq(1, No_Of_Lanes)],
>     {ok, {{one_for_one, 1, 30}, ChildSpecs}}.
>
> === lane.erl ===
> Same as 2a
>
> Now the supervisor will start the children instead of you having to do
> it via supervisor:start_child/2. No more need for a loader or start
> phases.
>
> - Edmond -
>
> On Mon, 07 Mar 2011 10:38:09 +1100, Edmond Begumisa
> <ebegumisa@REDACTED> wrote:
>
>> Hi Dhananjay,
>>
>> I too struggled with this exact question for quite some time so I'll
>> chime in here on the two techniques I used to solve it...
>>
>> On Thu, 03 Mar 2011 05:02:06 +1100, Dhananjay Nene
>> <dhananjay.nene@REDACTED> wrote:
>>
>>> While supervisors are meant to automatically restart failed processes,
>>> there is one scenario for which I am as yet unable to figure out the
>>> idiomatic approach to implementing crash recovery under the default OTP
>>> scenarios. I have considered a solution, but being a relative newbie,
>>> I am not sure if it is idiomatic erlang and if there are better
>>> solutions.
>>>
>>> Question in short : If I have a supervisor which has a number of
>>> dynamic children, how do I set up a mechanism where in case of a
>>> complete system crash, all the dynamic children restart at the point
>>> they were when the system (including the supervisor) crashed.
>>>
>>> Question in long :
>>> =============
>>>
>>> Sample Context : A bowling game
>>> -------------------------------------------------
>>>
>>> Let's say I am writing the software necessary
>>> to track various games at a bowling alley. I've set up the following
>>> processes :
>>>
>>> a. Lanes : If there are 10 lanes, there are 10 processes, one for each
>>> lane. These stay fixed for the entire duration of the program
>>> b. Games : A group of players might get together to start a game on a
>>> free lane. A new game will get created to track the game through its
>>> completion. When the game is over, this process shall terminate
>>> c. Players : Each game has a number of players. One process
>>> "player_game" is started per player. Sample state of a player game
>>> would include current score for the player and if the last two rolls
>>> were strike or a spare. For the purpose of brevity, the remainder of
>>> this mail only refers to this process and ignores the others
>>>
>>
>> You could reduce complexity by having each lane process maintain its
>> current game (players and scores) as part of its state. The game and
>> player_game processes appear unnecessarily confusing to me.
>>
>>> Objective :
>>> ---------------
>>>
>>> Assuming this is a single node implementation, if the machine were to
>>> crash, upon machine / node restart, all the player_games should be
>>> restarted and should be at the point where the player_games were when
>>> the machine crashed.
>>>
>>> Possible supervision strategy :
>>> --------------------------------------
>>>
>>> 1. Create a simple_one_for_one supervisor player_game_sup which upon
>>> starting up for the first time would have no children associated with
>>> them. Use supervisor:start_child to start each process
>>> 2. The supervisor creates an entry in a database (say mnesia) every
>>> time it launches a new process
>>> 3. Each player_game updates the entry every time the score gets
>>> modified. Upon termination that entry gets deleted
>>> 4. Post crash, the supervisor is started again (say after an
>>> application restart or via another supervisor)
>>> 5. (Here's the difference). By default the supervisor will not restart
>>> the dynamically added children (all the player_games). However we
>>> modify the init code to inspect the database and launch a player_game
>>> for each record it finds.
>>
>> How? I don't think you can instruct a simple_one_for_one supervisor to
>> create children from its init/1 callback. From the documentation...
>>
>> http://www.erlang.org/doc/man/supervisor.html#Module:init-1
>>
>> "...No child process is then started during the initialization phase,
>> but all children are assumed to be started dynamically using
>> supervisor:start_child/2..."
>>
>> Even if you switched to one_for_one with no child specs, I don't think
>> you'd be able to call supervisor:start_child/2 from init/1 of the same
>> supervisor since this function is called before the supervisor has
>> finished initialising itself and it's the actual supervisor process
>> doing the calling. You're likely to wait forever.
>>
>> AFAIK, creating dynamic children (calling supervisor:start_child/2) has
>> to be done after the supervisor has initialised by a process other than
>> the supervisor process.
>>
>> This is normally not a problem if you are calling start_child/2 during
>> the "normal" operation of the application because the supervisor in
>> question is likely to already be up. But here, you want to call
>> start_child/2 at *startup*. From my experience with this precise
>> matter, this requires some process coordination.
>>
>>> The player_game initialises itself to the
>>> current state as in the database and the game(s) can continue where
>>> it/they left off.
>>>
>>> My questions :
>>> --------------------
>>> a. Does it make sense to move the responsibility to the supervisor to
>>> update the database each time a new player game is started or
>>> completed ?
>>
>> I personally don't see the advantage of doing this. Besides (as per my
>> understanding of OTP design principles), a supervisor's job should be
>> just that -- supervising workers and not doing work itself.
>>
>> Doing this from your worker gen_servers makes more sense to me and
>> seems more natural, i.e. reading the scores from the DB during
>> player_game:init and writing them every time a score gets bumped, or
>> something similar.
>>
>>> b. Is it an idiomatic way to implement crash recovery
>>
>> There is none. It's very application specific as Jesper has indicated.
>>
>> I've come across a couple of wide patterns, but the details of where to
>> put checkpoints can't be generalised. For instance; although you are
>> specifically asking about a single node, multi-node hot take-over with
>> no DB/persistence is another way. I was recently privy to a very
>> interesting discussion on that technique. You might want to check it
>> out for a future project...
>>
>> http://thread.gmane.org/gmane.comp.lang.erlang.general/50258/focus=50269
>>
>>> c. Are there any other perhaps superior ways of implementing this?
>>>
>>
>> I don't know about superior, I just don't think your first suggestion
>> will actually work. I can offer 2 possibilities, each of which I've
>> used...
>>
>> Possible supervision strategy 2a: (Loader version)
>> --------------------------------------------------
>>
>> Rather than separate dynamic children for players and games as in
>> Strategy 1, each lane stores, as part of its state, info on
>> the current game (the players playing on the lane and their
>> state/scores). The supervision tree might look like this...
>>
>>                 alley_sup
>>                /         \
>>        lanes_ldr      ___lanes_sup_____
>>                      /     |    :      \
>>                lane(1)  lane(2) ..  lane(N)
>>
>> * Application has a startup configuration parameter no_of_lanes which
>> comes from a conf file or the .app file and loaded by the alley_sup...
>>
>> === bowling_app.app ===
>> {application, bowling_app,
>> [{..
>> {env,[{no_of_lanes,10}]},
>> ..}]}.
>>
>> === alley_sup.erl ===
>> -behaviour(supervisor).
>> ..
>> init([]) ->
>>     {ok, No_Of_Lanes} = application:get_env(no_of_lanes),
>>     {ok, {{one_for_one, 1, 30},
>>           [{lanes_sup,
>>             {lanes_sup, start, []},
>>             permanent,
>>             infinity,
>>             supervisor,
>>             [lanes_sup]},
>>            {lanes_ldr,
>>             {lanes_ldr, start, [No_Of_Lanes]},
>>             temporary,  % Starts lanes_sup children then disappears
>>             6000,
>>             worker,
>>             [lanes_ldr]}]}}.
>>
>> * lanes_sup is a simple_one_for_one supervisor of any number of lanes
>> but initially has none.
>> * Now here is the trick: lanes_ldr is a gen_server that is initialised
>> with No_Of_Lanes. Its job is to call supervisor:start_child No_Of_Lanes
>> times at startup, then vanish...
>>
>> === lanes_ldr.erl ===
>> -behaviour(gen_server).
>> ..
>> init(No_Of_Lanes) when No_Of_Lanes >= 1 ->
>>     case start_lanes(No_Of_Lanes, 0) of
>>         No_Of_Lanes ->  % Every single lane failed
>>             io:format("All lanes failed to init -- quitting "
>>                       "application.~n"),
>>             % Cause alley_sup to quit abnormally
>>             {stop, all_lanes_failed};
>>         _ ->
>>             io:format("Lane loader exiting.~n"),
>>             ignore  % One or more lanes init'ed; loader's work is done.
>>     end.
>>
>> start_lanes(0, E) ->
>>     E;  % Return no. of lanes that have failed to init
>> start_lanes(N, E) ->
>>     case supervisor:start_child(lanes_sup, [N]) of
>>         {ok, _} ->
>>             io:format("Started lane ~w.~n", [N]),
>>             start_lanes(N - 1, E);
>>         Err ->
>>             io:format("Error starting lane ~w: ~p.~n", [N, Err]),
>>             start_lanes(N - 1, E + 1)
>>     end.
>>
>> %%% These are just placeholders for compiler warnings/dialyzer
>>
>> handle_call(void, _, void) ->
>>     {noreply, void}.
>>
>> handle_cast(void, void) ->
>>     {noreply, void}.
>>
>> handle_info(void, void) ->
>>     {noreply, void}.
>>
>> terminate(_, _) ->
>>     ignore.
>>
>> code_change(_, void, _) ->
>>     {ok, void}.
>>
>> * Whenever a lane is started by the sup, it loads the most recent game
>> from the DB, or just a simple text file (lane_1.game_state,
>> lane_2.game_state, etc -- not a big deal if a text file gets corrupted
>> and a game is lost so a DB might be overkill). Possibly something along
>> the lines of...
>>
>> === lane.erl ===
>> -behaviour(gen_server).
>> ..
>> -record(player_state, {frame = 0,  % NB: Removed player_name
>>                        shot = 1,
>>                        bonus_shot = false,
>>                        last_shot = normal,
>>                        prior_to_last_shot = normal,
>>                        max_pins = 10,
>>                        score = 0}).
>>
>> start(Id) ->
>>     gen_server:start_link(?MODULE, Id, []).
>>
>> init(Id) ->
>>     process_flag(trap_exit, true),
>>     Path = filename:join(code:priv_dir(bowling_app),
>>                          "lane_" ++ integer_to_list(Id) ++ ".game_state"),
>>     % Game state is a proplist of player_state records with the
>>     % player's name as key:
>>     % [{Player_Name1, #player_state{}}, {Player_Name2, #player_state{}}, .. ]
>>     % NB: file:write_file/2 returns ok on success, so each catch
>>     % clause below yields {ok, []} and the match still holds.
>>     {ok, Game_State} =
>>         try read_game_state(Path)
>>         catch
>>             _:{badmatch, {error, enoent}} ->  % File not found
>>                 {file:write_file(Path, "[]."), []};
>>             _:Err ->                          % Discard bad state
>>                 io:format("Zeroing corrupt game file ~s: ~p.~n",
>>                           [Path, Err]),
>>                 {file:write_file(Path, "[]."), []}
>>         end,
>>     {ok, {Game_State, Path, ..maybe some non-persisted state..}}.
>>
>> %% Assert the happy-case for good game state when reloading it
>> read_game_state(Path) ->
>>     {ok, [Game_State]} = file:consult(Path),
>>     true = is_list(Game_State),
>>     lists:foreach(fun({Player_Name, Player_State}) ->
>>                       true = is_list(Player_Name),
>>                       true = is_record(Player_State, player_state),
>>                       % Maybe do some other checks
>>                       ok
>>                   end, Game_State),
>>     {ok, Game_State}.
>> ..
>>
>> NB: You'd probably use error_logger instead of all the io:formats.
>>
>> * Now whenever the score gets bumped, a new game starts, or a
>> game is concluded, the lane process writes the game state to your DB
>> or text file. For the simple text file, you could just keep calling...
>>
>> write_game_state(Path, Game_State) ->
>>     ok = file:write_file(Path, io_lib:format("~p.", [Game_State])).
>>
>> Possible supervision strategy 2b: (Start Phase version)
>> -------------------------------------------------------
>>
>> I was tipped-off by Ulf Wiger on this thread...
>>
>> http://thread.gmane.org/gmane.comp.lang.erlang.general/48307/focus=48324
>>
>> ... that the initialisation/coordination done by lanes_ldr in 2a above
>> is precisely what the start phases feature of included applications is
>> for! This requires splitting the application into two, but could
>> make things more manageable for larger applications. So one could get
>> rid of lanes_ldr and modify 2a to get something like...
>>
>>                  alley_sup
>>                      |
>>    bowling_app       |
>>    - - - - - - - - - | - - - - - - - -
>>    lanes_app         |
>>                      |
>>            ___lanes_sup_____
>>           /     |    :      \
>>     lane(1)  lane(2) ..  lane(N)
>>
>> * Split everything into two apps: the primary bowling_app and the
>> included lanes_app.
>> * The primary application would be pretty bare, and would start
>> lanes_sup as if it were one of its own modules...
>>
>> === bowling_app.app ===
>> {application, bowling_app,
>> [..
>> {mod, {application_starter,[bowling_app,[]]}},
>> {included_applications, [lanes_app]},
>> {start_phases, [{init,[]}, {go,[]}]}
>> ..
>> ]}.
>>
>> === bowling_app.erl ===
>> -behaviour(application).
>> ..
>> %% Called on application:start
>> start(normal, StartArgs) ->
>>     alley_sup:start(StartArgs).
>>
>> %% Called *after* entire sup tree is initialised
>> start_phase(init, normal, []) ->
>>     % If there's a DB, initialise it here
>>     ok;
>> start_phase(go, normal, []) ->
>>     ok.
>> ..
>>
>> === alley_sup.erl ===
>> -behaviour(supervisor).
>> ..
>> init([]) ->
>>     {ok, {{one_for_one, 1, 30},
>>           [{lanes_sup,
>>             {lanes_sup, start, []},
>>             permanent,
>>             infinity,
>>             supervisor,
>>             [lanes_sup]}]}}.  % Mod of included app.
>>
>> * Nothing else is needed in the primary app.
>> * The second application will be responsible for spawning the dynamic
>> children on startup...
>>
>> === lanes_app.app ===
>> {application, lanes_app,
>>  [..
>>   {env,[{no_of_lanes,10}]},
>>   {mod,{lanes_app,[]}},
>>   {start_phases, [{init,[]}, {go,[]}]}
>>   ..
>>  ]}.
>>
>> === lanes_app.erl ===
>> -behaviour(application).
>> ..
>> %% NOT called
>> start(normal, StartArgs) ->
>>     lanes_sup:start(StartArgs).
>>
>> %% Called *after* entire sup tree is initialised
>> %% and corresponding bowling_app:start_phase
>> start_phase(init, normal, []) ->
>>     ok;
>> start_phase(go, normal, []) ->
>>     {ok, No_Of_Lanes} = application:get_env(?MODULE, no_of_lanes),
>>     true = No_Of_Lanes >= 1,
>>     case start_lanes(No_Of_Lanes, 0) of
>>         No_Of_Lanes ->  % Every single lane failed
>>             io:format("All lanes failed to init -- quitting "
>>                       "application.~n"),
>>             {error, all_lanes_failed};  % Cause app to quit abnormally
>>         _ ->
>>             ok  % One or more lanes init'ed, continue.
>>     end.
>>
>> start_lanes(0, E) ->
>>     E;  % Return no. of lanes that have failed to init
>> start_lanes(N, E) ->
>>     case supervisor:start_child(lanes_sup, [N]) of
>>         {ok, _} ->
>>             io:format("Started lane ~w.~n", [N]),
>>             start_lanes(N - 1, E);
>>         Err ->
>>             io:format("Error starting lane ~w: ~p.~n", [N, Err]),
>>             start_lanes(N - 1, E + 1)
>>     end.
>>
>> === lanes_sup.erl ===
>> Same as in Strategy 2a
>>
>> === lane.erl ===
>> Same as in Strategy 2a
>>
>> Strategy 2b is cleaner to me than Strategy 2a, even though it requires
>> splitting an application into two which many people seem to have a
>> problem with.
>>
>> - Edmond -
>>
>>
>>> FWIW : the code I am using to learn erlang is at
>>> https://github.com/dnene/bowling . It's not particularly interesting at
>>> this stage since it is still under development.
>>>
>>> Thanks
>>> Dhananjay
>>>
>>> PS: Apologies for posting it to erlang-questions after earlier posting
>>> it to erlang programming google group. Those monitoring the latter
>>> will receive this question twice.
>>>
>>>
>>
>>
>
>