Process migration, was: Re: Extending arithmetic

Wed Feb 12 04:55:00 CET 2003

Scott Lystig Fritchie <fritchie@REDACTED> wrote:
> ETS tables muck up the works, though it's easy enough to use
> ets:tab2file/2 or ets:tab2list/1 to transfer them to the new
> location.  If you're relying on a public ETS table, then you get moved
> to a node that doesn't have that table (or has one by that name but
> different contents), er, well, don't do that.  :-)

Though if a concept of global ETS tables were introduced, it might be
possible to require that any migratable process depend only on
globally available ETS tables.  Oh wait, that'd be RAM-based Mnesia
tables, only a little lighter-weight.  But it would be hard to do
fully distributed updates without the transaction support already in
Mnesia.  Thus, if you want to migrate, don't use ETS; use RAM-based
Mnesia tables and keep full copies on the necessary nodes.
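
As a rough illustration of the RAM-based Mnesia suggestion above, here
is a minimal sketch.  The module name, the kv record, and the function
names are hypothetical; it assumes Mnesia is already running on every
node in Nodes and that those nodes share a schema.

```erlang
-module(migratable_store).
-export([init/1, store/2, lookup/1]).

%% Hypothetical record for illustration; not from the original post.
-record(kv, {key, value}).

%% Create a RAM-only table with a full copy on every node in Nodes.
%% Assumes Mnesia is already running on those nodes with a shared schema.
init(Nodes) ->
    case mnesia:create_table(kv, [{ram_copies, Nodes},
                                  {attributes, record_info(fields, kv)}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, kv}} -> ok;
        {aborted, Reason} -> {error, Reason}
    end.

%% Write inside a transaction so all replicas are updated together.
store(Key, Value) ->
    F = fun() -> mnesia:write(#kv{key = Key, value = Value}) end,
    case mnesia:transaction(F) of
        {atomic, ok} -> ok;
        {aborted, Reason} -> {error, Reason}
    end.

%% Read from whichever replica is local to the calling process.
lookup(Key) ->
    F = fun() -> mnesia:read(kv, Key) end,
    case mnesia:transaction(F) of
        {atomic, [#kv{value = Value}]} -> {ok, Value};
        {atomic, []} -> not_found;
        {aborted, Reason} -> {error, Reason}
    end.
```

Because every listed node holds a full copy, a process that moves
between those nodes finds the same table contents wherever it lands.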

> Process linking is tricky.  If A is linked to B, and B is migrated
> into a process C, then:
> 
> * if C is simply linked to B, then when C dies, B will die too.
> Hopefully A will not be confused when A receives an EXIT message from
> C (which A knows nothing about).
> 
> * if the VM has support for this kind of thing, then perhaps the link
> from A<->B could be silently moved to A<->C, but if C dies, then the
> VM should translate the Pid in C's EXIT message to use B's Pid.
> 
> Message passing: it's easy enough to have B forward all messages to
> C.  It's extra overhead to do that.  (No, I haven't measured how much,
> but I should if I were ever serious about this sort of thing, right?)
> 
> If A is aware that B has moved ... well, I haven't thought too hard
> about this.  If you're using functions to encapsulate sending all your
> messages to B, then A could use its process dictionary (!) to keep
> track of which Pids have migrated and what their new Pid ... without
> mucking A's source code (much) to keep track of this new state.

This is a huge ball of wax.  If/when the pid changes, a lot of process
state in neighboring processes will get very, very confused.  Thus
you can't migrate a server process, for example, while any client
requests are outstanding, as the clients are most likely matching on
the server pid in their receive.
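
To make the receive problem concrete, here is a small sketch with
hypothetical names throughout.  start_moved/0 only imitates a migrated
server by answering from a different process; it is not a real
migration mechanism.

```erlang
-module(pid_match_demo).
-export([start/0, start_moved/0, call/2]).

%% Ordinary server: each reply is tagged with the server's own pid.
start() ->
    spawn(fun loop/0).

loop() ->
    receive
        {From, {echo, Term}} ->
            From ! {self(), Term},
            loop()
    end.

%% Imitation of a server that has "moved": the reply comes from a
%% different process, so it carries a pid the client has never seen.
start_moved() ->
    spawn(fun moved_loop/0).

moved_loop() ->
    receive
        {From, {echo, Term}} ->
            spawn(fun() -> From ! {self(), Term} end),
            moved_loop()
    end.

%% Typical client call: the receive only matches replies tagged with
%% the pid the request was sent to.
call(Server, Request) ->
    Server ! {self(), Request},
    receive
        {Server, Reply} -> {ok, Reply}
    after 200 ->
        {error, timeout}
    end.
```

A call to the ordinary server returns its reply, while a call to the
"moved" server times out: the answer arrives, but it never matches and
sits unread in the client's mailbox.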

Along the same lines as your idea of using the process dictionary, I
have been thinking over the past few days about a "routing table"
system.  Every process has a module or a fun stuck in its dictionary
which is responsible for forwarding messages, e.g.:

	Dest ! Message

gets translated to a call to:

	% Dest is a pid: deliver directly.
	route(Dest, Message) when is_pid(Dest) ->
		Dest ! Message;
	% Dest is a locally registered name.
	route(Dest, Message) when is_atom(Dest) ->
		DestPid = whereis(Dest),
		DestPid ! Message;
	route({Dest, Node}, Message) ->
		% send to remote node's named process.
		dist ! {sendto, Node, Dest, Message}.

And by plugging in a different router, stranger things can happen:

	route(Dest, Message) when is_pid(Dest) ->
		Dest ! Message;
	route(inet_db, Message) ->
		case get(can_access_inet) of
			true ->
				whereis(inet_db) ! Message;
			false ->
				exit({error, {enoaccess, inet}})
		end;
	route(log, Message) ->
		disk_log:alog(?MY_APPLICATION_LOG, Message),
		Message;
	route(file_server, _) ->
		exit({error, {enoaccess, file_server}});
	route(code_server, Msg) ->
		% just drop
		Msg;
	route(Dest, Message) when is_atom(Dest) ->
		DestPid = global:whereis_name(Dest),
		DestPid ! Message;
	route({Dest, Node}, Message) ->
		% send to remote node's named process.
		dist ! {sendto, Node, Dest, Message}.

My only concerns were performance and the ugly question of how the
Erlang VM knows not to recurse into the current router module/fun
when ! is used within the router itself to send the message out.
Perhaps the ! -> router call translation is always performed unless a
compile-time option is set at the top of the module.
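
Without any VM or compiler support, the same idea can be approximated
as a library, which also sidesteps the recursion question: application
code calls a wrapper, and only the router uses the raw ! operator.
This is a sketch under that assumption; the module name, the msg_router
dictionary key, and all function names are hypothetical.

```erlang
-module(msg_router).
-export([set_router/1, send/2, default_route/2]).

%% Install a router fun in the calling process's dictionary.
set_router(Fun) when is_function(Fun, 2) ->
    put(msg_router, Fun),
    ok.

%% Application code calls send/2 instead of using ! directly.
send(Dest, Message) ->
    Router = case get(msg_router) of
                 undefined -> fun default_route/2;
                 Fun -> Fun
             end,
    Router(Dest, Message).

%% Default router: plain local delivery.  It uses ! itself, which is
%% safe here because nothing translates ! back into a router call.
default_route(Dest, Message) when is_pid(Dest) ->
    Dest ! Message;
default_route(Dest, Message) when is_atom(Dest) ->
    case whereis(Dest) of
        undefined -> exit({noproc, Dest});
        Pid -> Pid ! Message
    end.
```

The cost is that every send goes through a dictionary lookup and a fun
call, and code that still uses ! directly bypasses the router.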

I also thought of a 'firewall' module to accept/reject incoming
messages, in the same way that the router processes outgoing ones.
Perhaps sticking state in the process dictionary and using this
model could help process migration, esp. if the firewall and router
could process each message to rewrite pids.

Talk about slow though!
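
For a sense of what rewriting pids in a message involves, here is a
minimal sketch.  The module name and the Moved map (old pid to new
pid) are assumptions for illustration only.

```erlang
-module(pid_rewrite).
-export([rewrite/2]).

%% Walk a term and replace every pid found in Moved (a map of
%% OldPid => NewPid).  Pids captured inside funs cannot be reached.
rewrite(Term, Moved) when is_pid(Term) ->
    maps:get(Term, Moved, Term);
rewrite(Term, Moved) when is_tuple(Term) ->
    list_to_tuple([rewrite(E, Moved) || E <- tuple_to_list(Term)]);
rewrite([Head | Tail], Moved) ->
    [rewrite(Head, Moved) | rewrite(Tail, Moved)];
rewrite(Term, Moved) when is_map(Term) ->
    maps:from_list([{rewrite(K, Moved), rewrite(V, Moved)}
                    || {K, V} <- maps:to_list(Term)]);
rewrite(Term, _Moved) ->
    Term.
```

Every message has to be traversed in full, which is where the cost
comes from.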

What's worse: how do you handle the vacated pid being reassigned
later on to a new process while existing processes still hold
references to the migrated process under that vacated pid?
Now we have two processes referred to as <0.18.0> and there is
no way to tell them apart.  :-(

The kernel would have to reserve any migrated process's pid for
all time, or at least until the migrated process is dead.  Whoa.

Maybe pids shouldn't be so direct.  Perhaps pids should be replaced
by a reference type which is looked up in a table, much like
IP address to MAC address translation (ARP) in an IP stack.  The MAC
can change (quickly even) and yet the IP remains the same, allowing
transparent communication.
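
A node-local sketch of that indirection, using an ETS table as the
lookup table and a reference as the stable address.  All names are
hypothetical, and since ETS is per-node this shows only the lookup
idea, not a distributed solution.

```erlang
-module(addr_table).
-export([new/0, bind/2, rebind/3, send/3]).

%% Create the lookup table (stable address -> current pid).
new() ->
    ets:new(addr_table, [set, public]).

%% Hand out a stable address for Pid; the address never changes.
bind(Tab, Pid) when is_pid(Pid) ->
    Addr = make_ref(),
    true = ets:insert(Tab, {Addr, Pid}),
    Addr.

%% Point an existing address at a new pid, e.g. after a migration.
rebind(Tab, Addr, NewPid) when is_pid(NewPid) ->
    true = ets:insert(Tab, {Addr, NewPid}),
    ok.

%% Senders hold only the address; the pid is resolved on each send.
send(Tab, Addr, Message) ->
    case ets:lookup(Tab, Addr) of
        [{Addr, Pid}] ->
            Pid ! Message,
            ok;
        [] ->
            {error, unknown_address}
    end.
```

Senders never hold a raw pid, so a vacated pid can be reused without
ambiguity; the price is one table lookup per message.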

-- 
Shawn.