[erlang-questions] Ideas for a new Erlang

Thu Jun 26 20:12:10 CEST 2008

Ulf Wiger wrote:
> 2008/6/26 Darren New <dnew@REDACTED>:
>> Ulf Wiger wrote:
>>> I think we can stipulate that when serializing a pid, perhaps storing it
>>> on disk, then re-creating it and trying to use it, all bets are off
>>> anyway.*
>> I would certainly hope not. One has to serialize a pid to send it to a
>> different node. I would hope that creating a process on my node and then
>> sending you the only copy of the pid returned by spawn() doesn't cause the
>> process to exit if it's waiting for your node to send it a message.
> 
> In that case, the remote holder of the pid cannot rely on the pid referring
> to the same process if it's used some time later.

How long is "some time"?

We're conflating "reusing the PID" and "GCing a process which we don't 
think anyone will ever wake up". If I GC the process as soon as there's 
no local or remote reference to the PID, I might wind up GCing the 
process while the data representing the PID is on the wire on its way to 
the node going to use it.

> It should monitor the process in order to detect whether it dies.

Again, not what I'm talking about. Erlang gives me ways to do this, 
but they are not fully reliable. (If Erlang had a reliable way to do it, 
we wouldn't need mnesia:set_master_nodes, for example.)

However, I'm talking about the case where there's no failure at all. You 
can't GC a process just because there are no unpacked/unserialized copies 
of a PID for the process, because serialization is an intrinsic part of 
how Erlang works with pids. Even without node or link failures, and 
without any use of term_to_binary, you might still GC a process while 
other processes "have" references to it, simply because of the timing of 
deserialization in the internode protocol.

> The process doesn't exit just because there are no known references
> to it, 

Right. That was what I was talking about. I was pointing out that 
programming as if this feature may be right around the corner is 
probably unwise.

> but once a process /has/ died, and all known (local) references are
> gone, some other process may reuse that pid. This is why storing pids
> persistently is a very bad idea.

Except that storing pids "persistently" in variables is how one uses 
them. No matter where you store them, in variables or in disk files, you 
have to get rid of them when you get a monitor or link message saying 
the process exited, and you have to deal with the fact that you might 
get those messages when the process has not, in fact, exited and 
doesn't even know those messages have been sent. I'm not sure that 
storing a pid in a table is particularly more difficult than storing a 
pid in the variables of a process that's not expected to ever exit.

-- 
Darren New / San Diego, CA, USA (PST)