How to do a takeover?

Wed Mar 16 09:36:19 CET 2005

One way to handle takeover is to use globally 
registered names, as you suggest.

A reasonably simple way to coordinate transfer
during takeover is to use the following sequence:

(Context: application instance {A, N1} is supposed to 
take over from {A, N2}, where N1 and N2 are node names.)

1) From e.g. a start_phase function, issue a gen_server
   call to processes in {A, N1} that should transfer 
   global names and state.

2) In the handle_call for each process P,
   a) First call global:re_register_name() to move 
      the name from {A,N2} to {A,N1}. This should 
      prevent new requests from coming in to {P,N2}.
   b) Issue a gen_server:call({P,N2}, takeover_state),
      which asks the process locally registered as P
      on node N2 to hand over its state. This request
      should be processed _after_ all external requests
      already queued in {P,N2}, since a gen_server
      handles its messages in FIFO order.
   c) {P,N1} now has the global name and the state, and 
      can return control to the gen_server.
   d) {P,N2} could perhaps relay all further messages
      to {P,N1} until it is terminated, but it shouldn't
      be strictly necessary.
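
The sequence above could be sketched roughly as follows. This is
only a minimal illustration, not code from any real system: the
module name takeover_sketch, the global name
takeover_sketch_global, and the start_old/1, start_new/0,
takeover/2 and get_state/0 functions are all made up for the
example. In a real two-node setup the new instance would normally
also be locally registered as P on N1; here it is left
unregistered so that the sketch can be tried on a single node.

```erlang
-module(takeover_sketch).
-behaviour(gen_server).

-export([start_old/1, start_new/0, takeover/2, get_state/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical global name that clients use to reach the server.
-define(GLOBAL_NAME, takeover_sketch_global).

%% Old instance {P,N2}: locally registered and owner of the global name.
start_old(State) ->
    gen_server:start({local, ?MODULE}, ?MODULE, {old, State}, []).

%% New instance {P,N1}: starts without state and without the global name.
start_new() ->
    gen_server:start(?MODULE, new, []).

%% Step 1: ask the new instance to take over from the one on OldNode.
takeover(NewPid, OldNode) ->
    gen_server:call(NewPid, {takeover_from, OldNode}).

%% An ordinary client request, addressed through the global name.
get_state() ->
    gen_server:call({global, ?GLOBAL_NAME}, get_state).

init({old, State}) ->
    yes = global:register_name(?GLOBAL_NAME, self()),
    {ok, State};
init(new) ->
    {ok, undefined}.

handle_call({takeover_from, OldNode}, _From, _NoState) ->
    %% Step 2a: move the global name so new requests come here.
    yes = global:re_register_name(?GLOBAL_NAME, self()),
    %% Step 2b: fetch the state from the locally registered old process.
    State = gen_server:call({?MODULE, OldNode}, takeover_state),
    %% Step 2c: we now hold both the name and the state.
    {reply, ok, State};
handle_call(takeover_state, _From, State) ->
    %% Old instance hands over its state and keeps running until the
    %% old application instance is terminated.
    {reply, State, State};
handle_call(get_state, _From, State) ->
    {reply, State, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

From a start_phase/3 callback, the takeover would then be
triggered with something like takeover_sketch:takeover(Pid, Node),
where Node comes from the {takeover, Node} start type. Step 2d
(relaying late messages) is left out of the sketch.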

One thing to consider is that each "top application" is 
moved as one entity, and as soon as takeover is finished,
the old instance is terminated. This can cause problems 
in applications that assume that they will always be on 
the same node (e.g. applications using SNMP).

One way to address this is to introduce a wrapper 
application that includes all such applications. This 
has been done for many years in e.g. AXD 301. In AXD 301,
this didn't solve all problems, so we made a special
"distributed application controller" that is able to 
coordinate takeover of several applications in parallel.
For the longest time, though, making one large application
of included O&M applications worked just fine.
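
As a rough illustration of the wrapper approach, an application
resource file along these lines could be used. The names
om_wrapper, om_wrapper_app, om_wrapper_sup, om_alarms and om_stats
are invented for the example and do not come from AXD 301.

```erlang
%% om_wrapper.app (hypothetical wrapper application)
{application, om_wrapper,
 [{description, "Wrapper that moves the O&M applications as one unit"},
  {vsn, "0.1"},
  {modules, [om_wrapper_app, om_wrapper_sup]},
  {registered, []},
  {applications, [kernel, stdlib]},
  %% Included applications are loaded but not started on their own;
  %% the wrapper's supervision tree is expected to start their
  %% top supervisors, so they always live on the same node.
  {included_applications, [om_alarms, om_stats]},
  {mod, {om_wrapper_app, []}}]}.
```

Since only om_wrapper is a "top application", it is the only unit
that gets moved at takeover, and the included applications follow
along with it.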

/Uffe

> -----Original Message-----
> From: owner-erlang-questions@REDACTED
> [mailto:owner-erlang-questions@REDACTED]On Behalf Of Anders Nygren
> Sent: den 16 mars 2005 01:16
> To: erlang-questions@REDACTED
> Subject: How to do a takeover?
> 
> 
> Hi
> I have started looking at how to do a takeover.
> In my tests I discovered that I need to do a global:re_register_name
> to "move" the name of my server to the new node, and that I need to
> move some state information from the old to the new node.
> 
> The way my current design works is that I have a number of
> gen_servers. When one server needs to call another server, it
> spawns a worker that makes the call, so as not to block. This
> gives me the problem that I have a lot of workers that must be
> terminated correctly before the takeover can be finalized.
> 
> But I don't understand how to make a controlled transfer of
> messages in the mailbox, or of linked worker processes.
> 
> My best guess now is that I have to stop using gen_server:call
> to other servers and instead use an interface between my servers
> that always sends messages to processes with globally registered
> names.
> 
> /Anders Nygren
>