Starting a gen_server on a remote node or is it failover?

Wed Nov 5 15:31:23 CET 2003

Hi all,

I have been thinking about a specific problem for a while now. In short,
the scenario is this. Suppose one needs a distributed system where an
application ["The Application"] is configured for failover [using kernel
configuration]. One can thus assume that the application is running
somewhere on a node in the system and will be restarted as configured.
Suppose this application supervises a number of resources [as in
gen_server processes] of which there can be only one instance of each in
the system at any time [think of a single connection to an external
system, etc.] [system = the entire distributed system].
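For concreteness, the kind of kernel failover configuration I mean looks
roughly like the fragment below. This is only a sketch: the application
name the_app, the node names and the timeouts are placeholders I made up
for illustration, not my real setup.

```erlang
%% sys.config fragment for node n1@host1 (sketch; all names are placeholders).
%% the_app normally runs on n1@host1. If that node goes down, it is
%% restarted on n2@host2 or n3@host3 (equal priority). When n1@host1
%% comes back, the application moves back to it.
[{kernel,
  [{distributed, [{the_app, 5000, ['n1@host1', {'n2@host2', 'n3@host3'}]}]},
   %% The other nodes this node waits for at start-up.
   {sync_nodes_mandatory, ['n2@host2', 'n3@host3']},
   %% How long (ms) to wait for them.
   {sync_nodes_timeout, 30000}]}].
```

Each node needs a matching file, with sync_nodes_mandatory listing the
other nodes.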

If "The Application" is a supervisor implementation, it is trivial to
have these resources run on the same node as the supervisor. They will
then move wherever the application is running depending on
configuration. 

Now, suppose I don't want the resource processes to run on the same node
necessarily, but on an arbitrary node. This is primarily for
load-balancing purposes. How should I build this? The one thing I have
to note is that "The Application" is *not* aware of how many or which
resource processes it is supervising - these are determined dynamically
when the application/supervisor starts. I know one can build an
application per resource, but that is not practical for me, as these
things are very dynamic. 

It is obvious to me that Erlang can easily manage the failover of
entire applications, but how does one manage the failover of smaller
entities [such as a gen_server]? Now back to my question... I cannot
seem to figure out how to convince a gen_server to start remotely, as
gen_servers can only be started as:

	start(Module, Args, Options) -> Result
	start(ServerName, Module, Args, Options) -> Result
	start_link(Module, Args, Options) -> Result
	start_link(ServerName, Module, Args, Options) -> Result

[Note the absence of a (Node, ServerName, ...) version.]
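The only workaround I can think of so far is to go through rpc and
monitor the result, roughly as sketched below. The module name
remote_start and the shape of the return value are my own invention,
and I have not tried this in anger. It also assumes the callback module
is loadable on the remote node.

```erlang
-module(remote_start).
-export([start/4]).

%% Start a gen_server on Node and monitor it from the calling process.
%% gen_server:start/3 is used instead of start_link/3 on purpose: a link
%% created inside rpc:call/4 would be to a process on the remote node,
%% not to the caller, so the caller sets up a monitor instead.
start(Node, Module, Args, Options) ->
    case rpc:call(Node, gen_server, start, [Module, Args, Options]) of
        {ok, Pid} ->
            %% Monitors work across nodes; if Node goes down the caller
            %% gets {'DOWN', Ref, process, Pid, noconnection}.
            Ref = erlang:monitor(process, Pid),
            {ok, Pid, Ref};
        {badrpc, Reason} ->
            {error, {badrpc, Reason}};
        Other ->
            %% ignore | {error, Reason} from gen_server:start/3
            Other
    end.
```

The caller then has to handle the 'DOWN' message itself, which is not
what a standard supervisor does, so this does not slot directly into a
supervisor child specification.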

If I could do this without modifying gen_server.erl or gen.erl, it
might be possible to have the application supervisor spawn a gen_server
remotely and manage it. All that then remains is to give the supervisor
the ability to determine where to restart the gen_server on failure [on
the same node as previously or, if that node has failed, on a backup
node].
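The node selection I have in mind could be as simple as the sketch
below. The module name placement, the function names and the idea of an
ordered list of backup nodes are assumptions of mine for illustration.

```erlang
-module(placement).
-export([choose_node/2, choose_node/3]).

%% Pick the node to (re)start a resource on: the preferred node if it
%% is currently reachable, otherwise the first reachable backup.
choose_node(Preferred, Backups) ->
    choose_node(Preferred, Backups, [node() | nodes()]).

%% Same, with the list of live nodes passed in explicitly.
choose_node(Preferred, Backups, Alive) ->
    case [N || N <- [Preferred | Backups], lists:member(N, Alive)] of
        [Node | _] -> {ok, Node};
        []         -> {error, no_node_available}
    end.
```

This only decides where to start; it does nothing to guarantee the
single-instance property if the network partitions.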

Has anyone done something similar, or does anyone know of an alternative
approach which might not be obvious to me?

Thank you in advance,

Rudolph