new syntax - a provocation

Tue Sep 30 15:45:44 CEST 2003

Joachim Durchholz <joachim.durchholz@REDACTED> writes:

> Luke Gorrie wrote:
> 
> >>It's interesting to consider the question:  "what would an Erlang
> >>implementation have to be like for us to be able to take a running
> >>process and move it to another processor?"  (IBM's AIX was able to
> >>do this on their mainframes, I believe.  I think Kali Scheme can do it.)
> > 
> > And also the question: why would you want to? Seriously.
> 
> Code upgrading for a set of communicating nodes, for example.

But why would that involve moving a _process_ from one node to
another?

> (I haven't done this yet in Erlang, so the current methods may be
> enough. OTOH I suspect that such an upgrade could make good use of
> some automation, and that the Kali ideas are a step in the right
> direction.)

In Erlang the most basic way to load a new module definition on all
nodes is something like:

  (b@REDACTED)3> F = "/home/luke/devel/erlang/foo.beam".
  "/home/luke/devel/erlang/foo.beam"
  (b@REDACTED)4> {ok, Code} = file:read_file(F).
  {ok,<<70,79,82,49,0,0,14,...>>}
  (b@REDACTED)5> rpc:multicall([node()|nodes()], code, load_binary, [foo,F,Code]).
  {[{module,foo},{module,foo}],[]}

That loads foo.beam on every connected node, including the current
one. As I understand it, the OTP release/upgrade management system is
driven by a small scripting language whose scripts interleave loading
new modules, notifying servers that their code has changed (or
restarting them), and so on.
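
The same shell session can be wrapped in a function. This is only a
sketch: the module name load_everywhere is made up for illustration,
and it assumes the .beam file can be found in the local code path so
that code:get_object_code/1 can read it.

```erlang
-module(load_everywhere).
-export([load/1]).

%% Push the local object code for Mod to this node and to every
%% connected (visible) node. On success the result is whatever
%% rpc:multicall/4 returns: {Replies, BadNodes}.
load(Mod) when is_atom(Mod) ->
    case code:get_object_code(Mod) of
        {Mod, Bin, File} ->
            rpc:multicall([node() | nodes()], code, load_binary,
                          [Mod, File, Bin]);
        error ->
            %% No .beam file for Mod was found in the local code path.
            {error, {no_object_code, Mod}}
    end.
```

Note that this does nothing about processes still running the old
code; that is the part the release handling scripts take care of.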

In Erlang usage today I think most of us assume that all connected
nodes will be running the same code.

> You can do all this either lazily or eagerly. Both would have their
> place even in Erlang - as a very fast-and-loose application, I'd copy
> everything within an application eagerly and the connections between
> applications lazily (because failure is an option there).
> 
> Anyway, I think Richard was more interested in code migration than in
> lazy copying :-)

Interesting. My interpretation was entirely different: taking a
process - its continuation, its private heap, its identity (pid) - and
moving it to another node. Some sort of magic move_process(Pid,Node)
BIF.

I assume that all nodes are running the same code -- or at least that
handling code consistency/loading issues is a separate problem.

The tricky part seems to be moving the identity. Is there a simple way
to do this that would be generally useful? If not, I'm more inclined
to just write some code to kill the process on one node and start the
new one on the other by hand.
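
To make the by-hand version concrete, here is a rough sketch. The
module handmove and its message protocol are invented for this
example, and it assumes the module is already loaded on the target
node. The process hands its state to a fresh process on the other
node and then exits, so the caller gets back a _different_ pid, which
is exactly the identity problem.

```erlang
-module(handmove).
-export([start/1, state/1, move/2, loop/1]).

%% Start a process on the local node holding State.
start(State) ->
    spawn(?MODULE, loop, [State]).

%% Read the state held by Pid.
state(Pid) ->
    Ref = make_ref(),
    Pid ! {state, self(), Ref},
    receive
        {Ref, State} -> State
    after 5000 ->
        {error, timeout}
    end.

%% Ask Pid to restart itself on Node. Returns the pid of the new
%% process; the old pid is dead once this returns {ok, NewPid}.
move(Pid, Node) ->
    Ref = make_ref(),
    Pid ! {move, self(), Ref, Node},
    receive
        {Ref, NewPid} -> {ok, NewPid}
    after 5000 ->
        {error, timeout}
    end.

loop(State) ->
    receive
        {state, From, Ref} ->
            From ! {Ref, State},
            loop(State);
        {move, From, Ref, Node} ->
            %% Start the replacement with the current state, report its
            %% pid, then fall off the end so the old process terminates.
            %% Messages still queued here, and any links or monitors on
            %% the old pid, are not carried over.
            NewPid = spawn(Node, ?MODULE, loop, [State]),
            From ! {Ref, NewPid}
    end.
```

Everyone who held the old pid has to be told about the new one, which
is why a registered name is usually easier to live with than a raw
pid here.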

I'm racking my brain for examples in our system where we more-or-less
move a process from one node to another. The closest I can think of is
with servers that are global to the cluster - if their node goes down
the process should be restarted on another, which is a bit like moving
it. Trouble is, you can't really move a process after its machine has
crashed or been isolated from the network :-). We use the 'global'
module to manage these 'one-per-cluster' server processes.
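
For illustration, the pattern looks roughly like this. It is a sketch
and not our actual code: the module one_per_cluster and the name
cluster_server are hypothetical. Any node may call ensure_started/0,
for instance after noticing that the node hosting the server went
down, and at most one of them wins the registration.

```erlang
-module(one_per_cluster).
-export([ensure_started/0, ping/0, loop/0]).

-define(NAME, cluster_server).   % hypothetical cluster-wide name

%% Return the cluster-wide server, starting it on this node if no
%% process is currently registered under ?NAME.
ensure_started() ->
    case global:whereis_name(?NAME) of
        undefined ->
            Pid = spawn(?MODULE, loop, []),
            case global:register_name(?NAME, Pid) of
                yes ->
                    {ok, Pid};
                no ->
                    %% Another node registered first; drop our copy
                    %% and look the name up again.
                    exit(Pid, kill),
                    ensure_started()
            end;
        Pid when is_pid(Pid) ->
            {ok, Pid}
    end.

%% Ask the server, wherever it runs, which node it is on.
ping() ->
    case global:whereis_name(?NAME) of
        undefined ->
            {error, not_running};
        Pid ->
            Ref = make_ref(),
            Pid ! {ping, self(), Ref},
            receive
                {Ref, pong, Node} -> {pong, Node}
            after 5000 ->
                {error, timeout}
            end
    end.

loop() ->
    receive
        {ping, From, Ref} ->
            From ! {Ref, pong, node()},
            loop();
        stop ->
            ok
    end.
```

Clients only ever use the name, so a restart on another node looks
like a move to them, minus whatever state died with the old process.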

Cheers,
Luke