Using failover
Ulf Wiger
etxuwig@REDACTED
Thu Feb 10 18:11:51 CET 2000
> Date: 10 Feb 2000 17:29:45 +0100
> From: Samuel Tardieu <sam@REDACTED>
>
> Due to major power trouble in my building, I built an application to
> monitor those power failures. Since this application needs to be
> fault tolerant, as its results are used by the technicians working on
> the power outages, I use the "distributed" kernel parameter.
From your later mail it appears as if you'd like to perform state
transfer as well. Here goes:
> It is not clear to me what I should do in takeover mode. My
> application is mainly a globally registered gen_server. How can I
> cleanly shut down the server running on the other node (the one I'm
> taking over) and make sure it is done before starting the application
> locally? Won't this create a race condition where the application
> won't be restarted if the top-priority node dies after stopping the
> application on the remote node and before registering the new process
> locally?
If you use global:re_register_name(Name, NewPid), the new instance of
your process will simply take over the name, and calls to the global
server will be re-routed to the new instance.
The old application instance is shut down automatically when the
new application instance is fully started.
Here's a simple but relatively safe way of doing things:
1. First, add the following attribute to your app file
(see erl -man application):
%% This activates a phased start of the application. Mod:start/2 is
%% always the first function to be called; then the functions in the
%% start_phases list will be called in order. Syntax: [{Fun, Args}]
%% which leads to the call Mod:Fun(Type, Args) (Mod as specified in the
%% 'mod' attribute). Using this attribute, you may also get
%% Type = {failover, Node} (it's done this way for backward
%% compatibility reasons).
{start_phases, [{go, []}]},
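For context, a minimal sketch of where that attribute sits in a complete app file. Only the start_phases entry and the pomonitor_app module name come from this mail; the description, version, module list and registered name are assumptions made for illustration.

```erlang
%% pomonitor.app: hypothetical resource file (all values except 'mod' and
%% 'start_phases' are assumed, not taken from the real application).
{application, pomonitor,
 [{description, "Power outage monitor"},
  {vsn, "1.0"},
  {modules, [pomonitor_app, pomonitor_sup, pomonitor]},
  {registered, [pomonitor]},
  {applications, [kernel, stdlib]},
  {mod, {pomonitor_app, []}},
  %% Phased start: one phase called 'go' with an empty argument list.
  {start_phases, [{go, []}]}]}.
```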
2. Modify pomonitor_app.erl to include a callback for the go/2 phase:
-module (pomonitor_app).
-behaviour (application).
-export ([start/2, go/2, stop/1]).

start (normal, _) ->
    pomonitor_sup:start_link ();
start ({failover, _}, _) ->
    pomonitor_sup:start_link ();
start ({takeover, _}, _) ->
    pomonitor_sup:start_link ().

go({takeover, FromNode}, _) ->
    pomonitor:perform_takeover(FromNode);
go(_, _) -> % Type = normal | {failover, FromNode}
    ok.

stop (_) -> ok.
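A caveat on the callback name: the application behaviour documentation in current OTP releases describes the phase callback as Mod:start_phase(Phase, StartType, PhaseArgs), not Mod:Phase(Type, Args). If go/2 is never called on your release, the following variant may be what is needed. It is a sketch under that assumption, so check the application manual page for your release; pomonitor_sup and pomonitor are the modules from this mail and are not defined here.

```erlang
-module(pomonitor_app).
-behaviour(application).
-export([start/2, start_phase/3, stop/1]).

%% All start types (normal, {failover, Node}, {takeover, Node}) just start
%% the supervision tree; the takeover itself happens in the 'go' phase.
start(_Type, _Args) ->
    pomonitor_sup:start_link().

%% Called once per entry in start_phases, after start/2 has returned.
start_phase(go, {takeover, FromNode}, _PhaseArgs) ->
    pomonitor:perform_takeover(FromNode);
start_phase(go, _Type, _PhaseArgs) ->   % normal | {failover, FromNode}
    ok.

stop(_State) ->
    ok.
```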
3. Write a function to handle the takeover:
pomonitor.erl (assuming this is a globally registered gen_server):
perform_takeover(FromNode) ->
    gen_server:call(pomonitor, {perform_takeover, FromNode}).
...
init(_) ->
    %% Need to check first before registering a global name.
    %% One way to do this is to use application:start_type() to find out
    %% whether the application is starting, or if it's a local process crash
    %% but this is not entirely safe. We could be restarting from a process
    %% crash on the retiring side of a takeover, after having passed on our
    %% state, but before shutting down. If this is the case, we MUST NOT
    %% re-register. Here we use global:safe_whereis_name/1 (not
    %% whereis_name/1, because we must send a message to global, giving it
    %% a chance to unregister me if I just crashed and am restarting).
    case global:safe_whereis_name(pomonitor) of
        undefined ->
            %% this is most likely a local process restart
            global:re_register_name(pomonitor, self());
        _ ->
            %% there is another globally registered instance
            %% most likely a takeover in progress. Wait for takeover msg.
            skip
    end,
    ...
    {ok, #state{}}.
handle_call({perform_takeover, FromNode}, _From, _State) ->
    %% Cute detail of takeover. I first re-register, stealing the name; then
    %% I ask for the state. Pending calls from clients will be serviced
    %% on the other side; new calls (after my re_register_name()) will be
    %% buffered by me until I have the new state; afterwards, all calls will
    %% be serviced by me; the old instance can most likely just sit there and
    %% wait for its application to terminate.
    global:re_register_name(pomonitor, self()),
    %% {pomonitor, FromNode} addresses the locally registered name on FromNode.
    NewState = gen_server:call({pomonitor, FromNode}, takeover_state),
    {reply, ok, NewState};
handle_call(takeover_state, _From, State) ->
    {reply, State, State}.
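To try the "steal the name first, then fetch the state" order without setting up two nodes, here is a small single-node sketch. It is not the pomonitor code: the module name handover_demo, the get_state request and the use of pids instead of {Name, Node} are assumptions made so the example is self-contained. It also assumes a release that provides gen_server:stop/1.

```erlang
-module(handover_demo).
-behaviour(gen_server).
-export([run/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Starts an "old" instance holding some state, then lets a "new" instance
%% take over the global name and the state. Returns the state as seen
%% through the global name after the handover.
run() ->
    {ok, Old} = gen_server:start(?MODULE, [1, 2, 3], []),
    yes = global:register_name(handover_demo, Old),
    {ok, New} = gen_server:start(?MODULE, [], []),
    ok = gen_server:call(New, {perform_takeover, Old}),
    New = global:whereis_name(handover_demo),
    State = gen_server:call({global, handover_demo}, get_state),
    gen_server:stop(Old),
    gen_server:stop(New),
    State.

init(State) ->
    {ok, State}.

handle_call({perform_takeover, OldPid}, _From, _State) ->
    %% Steal the name first, so new client calls queue up here ...
    yes = global:re_register_name(handover_demo, self()),
    %% ... then fetch the state from the retiring instance.
    NewState = gen_server:call(OldPid, takeover_state),
    {reply, ok, NewState};
handle_call(takeover_state, _From, State) ->
    {reply, State, State};
handle_call(get_state, _From, State) ->
    {reply, State, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Calling handover_demo:run() from the shell should hand back the list that the old instance was started with.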
/Uffe
Ulf Wiger, Chief Designer AXD 301 <ulf.wiger@REDACTED>
Ericsson Telecom AB tfn: +46 8 719 81 95
Varuvägen 9, Älvsjö mob: +46 70 519 81 95
S-126 25 Stockholm, Sweden fax: +46 8 719 43 44