[erlang-questions] data sharing is outside the semantics of Erlang, but it sure is useful

Tue Sep 15 10:30:23 CEST 2009

Mikael Pettersson wrote:
>  
> Erlang's default memory model doesn't allow same-node processes to
> share memory(*), so you will lose sharing in message sends.

Yes, but there are two levels of sharing here.

1. The sharing of terms across processes, as with large binaries.
2. The sharing of subterms within a single term.

The latter could be preserved using a sharing-preserving copy.
As this is invariably more expensive than the current copying
algorithm when there is no sharing to preserve (likely a very
common case), it is reasonable that this isn't the default.

The problem today is that you cannot make a sharing-preserving
copy between processes at the Erlang level even if your life
depended on it, and in some cases, figuratively speaking, it
does. Some data structures simply cannot be
passed in a message or in a spawn, since the loss of sharing
leads to a memory explosion.
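
To make the explosion concrete, here is a minimal sketch. The module
and function names (sharing_demo, nested/1 and so on) are made up for
illustration, and it relies on the undocumented erts_debug:size/1 and
erts_debug:flat_size/1, which report the size of a term in words with
and without internal sharing. It assumes a VM that uses the ordinary
flattening copy for message sends.

```erlang
-module(sharing_demo).
-export([nested/1, local_sizes/1, size_after_send/1]).

%% Build a term in which every level refers to the previous level
%% twice. Only one new cons cell is allocated per level.
nested(0) -> [];
nested(N) when N > 0 ->
    T = nested(N - 1),
    [T | T].

%% Returns {SharedWords, FlatWords}: the size with shared subterms
%% counted once, and the size if every reference were expanded.
local_sizes(N) ->
    T = nested(N),
    {erts_debug:size(T), erts_debug:flat_size(T)}.

%% Send the term to a fresh process and measure the copy there.
size_after_send(N) ->
    T = nested(N),
    Parent = self(),
    Pid = spawn(fun() ->
        receive
            {term, X} -> Parent ! {self(), erts_debug:size(X)}
        end
    end),
    Pid ! {term, T},
    receive
        {Pid, Words} -> Words
    end.
```

The shared size grows linearly with N, while the flat size roughly
doubles for every level. With a flattening copy, size_after_send(N)
should report the flat size, so keep N modest (20 or so) when trying
this out.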

A related problem is, of course, that if a process holding such
a term crashes, and the EXIT messages that are propagated contain
the term, the EXIT propagation will kill the node. Even
without loss of sharing, this can happen, of course. I did
it once by spawn-linking 100,000 processes from the shell
and then mis-spelling length(processes()). This led to an
EXIT message, containing a list of 100,000 pids, that was
copied 100,000 times. My workstation was frozen for 10
minutes while the VM was trying to find a way to cope.

No mystery there, of course, and no sharing either.
All I could do was laugh at my stupidity and take an
extra coffee break.
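
For the curious, a scaled-down sketch of that fan-out follows. The
module name exit_fanout and its functions are hypothetical, and unlike
my shell session the children here trap exits, so that each one can
measure its own copy of the exit reason instead of simply dying. It
again uses the undocumented erts_debug:flat_size/1.

```erlang
-module(exit_fanout).
-export([run/1]).

%% Returns the total number of words copied to N linked children
%% when their parent exits with a reason containing all N pids.
run(N) ->
    Collector = self(),
    spawn(fun() -> parent(N, Collector) end),
    lists:sum([receive {reason_words, W} -> W end
               || _ <- lists:seq(1, N)]).

parent(N, Collector) ->
    Self = self(),
    Pids = [spawn_link(fun() -> child(Self, Collector) end)
            || _ <- lists:seq(1, N)],
    %% Wait until every child is trapping exits before crashing.
    [receive ready -> ok end || _ <- Pids],
    exit({oops, Pids}).

child(Parent, Collector) ->
    process_flag(trap_exit, true),
    Parent ! ready,
    receive
        {'EXIT', Parent, Reason} ->
            %% Each child holds its own copy of the reason.
            Collector ! {reason_words, erts_debug:flat_size(Reason)}
    end.
```

Since every one of the N children gets a reason holding N pids, the
total grows roughly as N squared. Try exit_fanout:run(1000) before
attempting anything near 100,000.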

BR,
Ulf W
-- 
Ulf Wiger
CTO, Erlang Training & Consulting Ltd
http://www.erlang-consulting.com