[erlang-questions] Ideas for a new Erlang
Darren New
dnew@REDACTED
Thu Jun 26 19:41:49 CEST 2008
Ulf Wiger wrote:
> I think we can stipulate that when serializing a pid, perhaps storing it
> on disk, then re-creating it and trying to use it, all bets are off anyway.*
I would certainly hope not. One has to serialize a pid to send it to a
different node. I would hope that creating a process on my node and then
sending you the only copy of the pid returned by spawn() doesn't cause
the process to exit if it's waiting for your node to send it a message.
> Still, it would of course be quite difficult to introduce the semantics
> that a process exits if it's in a blocking receive and there are no
> references to it - even if it would turn out to have been a good idea
> in the first place.
That's a good idea. You just have to define what "references to it"
means more carefully. It's too easy for Erlang to have references that
aren't "known" by any of the runtimes. There are hidden nodes (which may
or may not affect things), for example, even if you discount serialized
values.
If I'm linked to a process that is blocked in such a receive, does that
blocked process terminate? Does it get GCed without terminating? Does
that stop other processes for which it had references transitively?
If I lose a connection to the only node that has a copy of my pid,
should I exit? What if the connection comes back?
> * I haven't checked the GC implementation, but I would assume that
> a pid is available to be reused when the process has exited and there
> are no known references to the pid (serialized references don't count).
I would think that's a bad idea, actually. Especially since it's so easy
to have an "unknown reference" in Erlang. Anything in a TCP buffer is
going to be an "unknown reference".
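To make the "unknown reference" point concrete, here is a minimal sketch of a pid round-tripped through the external term format. The module and function names are made up for illustration. Once the pid is in the binary, it is just bytes as far as the runtime is concerned, and those bytes could equally sit in a file or a socket buffer.

```erlang
-module(unknown_ref).
-export([demo/0]).

%% Round-trip a pid through the external term format.
demo() ->
    %% A process blocked in a receive, waiting for someone to send it 'stop'.
    Pid = spawn(fun() -> receive stop -> ok end end),
    %% Serialize the pid. Bin is an ordinary binary; nothing marks it
    %% as holding a reference to the process.
    Bin = term_to_binary(Pid),
    true = is_binary(Bin),
    %% Decoding on the same running node gives back a usable, equal pid.
    Pid2 = binary_to_term(Bin),
    true = (Pid2 =:= Pid),
    %% The recovered pid still reaches the blocked process.
    Pid2 ! stop,
    ok.
```

If the process had been collected while only Bin existed, the final send would silently go nowhere.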
You could, of course, answer all these questions. I'm just saying that
GCing processes in Erlang that haven't exited seems unusually difficult
to get right.
> * I haven't checked the GC implementation, but I would assume that
> a pid is available to be reused when the process has exited and there
> are no known references to the pid (serialized references don't count).
Well, it looks like you can only have 2^20 unique references per node
(or three crashes) before you start having problems, and the
documentation for EXT_PID isn't complete enough to actually say how many
PIDs you can have per node. (Hint to authors: documenting some field as
"this field is included to make things better" isn't informative.
Similarly, stating how big a field is without saying why you selected a
"new" representation of something isn't informative. :-)
Of course, since sending to a non-existent PID is undetectable by the
sender without going through all kinds of monitor() dances, making
things reliable in the face of network partitions would seem to need a
fair amount of code. I don't think I'd use the distribution primitives
over the public internet to talk between nodes. I have enough trouble
keeping my stuff running in the face of ISPs disappearing for dozens of
minutes at a time without the software assuming lack of error is
success. :-)
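For reference, the monitor() dance looks roughly like the sketch below. The module name, the function names, and the {request, From, Ref, Request} message shape are assumptions for illustration, not a fixed protocol. The point is that the caller has to monitor the target, wait for a tagged reply, a 'DOWN' message, or a timeout, and treat anything but an explicit reply as failure.

```erlang
-module(mon_call).
-export([call/3, demo/0]).

%% Send Request to Pid and wait for {Ref, Reply}, or find out that Pid is gone.
%% The message shapes used here are an assumption of this sketch.
call(Pid, Request, Timeout) ->
    Ref = erlang:monitor(process, Pid),
    Pid ! {request, self(), Ref, Request},
    receive
        {Ref, Reply} ->
            %% Got an answer; drop the monitor and any queued 'DOWN' message.
            erlang:demonitor(Ref, [flush]),
            {ok, Reply};
        {'DOWN', Ref, process, Pid, Reason} ->
            %% The process is dead, never existed, or its node is unreachable.
            {error, {down, Reason}}
    after Timeout ->
        %% No reply and no 'DOWN' yet: do not assume success.
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.

%% Exercise both outcomes on the local node.
demo() ->
    Echo = spawn(fun() ->
                     receive
                         {request, From, Ref, Msg} -> From ! {Ref, Msg}
                     end
                 end),
    {ok, ping} = call(Echo, ping, 1000),
    %% This process exits at once, so the call ends with a 'DOWN' message.
    Dead = spawn(fun() -> ok end),
    {error, {down, _Reason}} = call(Dead, ping, 1000),
    ok.
```

For a pid on another node, a lost connection arrives as a 'DOWN' message with reason noconnection, which still does not tell the caller whether the request was processed before the partition.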
--
Darren New / San Diego, CA, USA (PST)
Helpful housekeeping hints:
Check your feather pillows for holes
before putting them in the washing machine.