Getting locks and sharing: was RE: Getting concurrency

Tue Jun 21 14:08:26 CEST 2005

Richard A. O'Keefe wrote:

> "Ulf Wiger (AL/EAB)" <ulf.wiger@REDACTED> wrote:
>
> 	However, it is commonly assumed that if a process is alive,
> 	a local message will reach it.
> 
> There's that terminally cute saying that "assume" makes an ASS
> out of U and ME.

In this case, the assumption is fairly deeply rooted in actual
understanding of the Ericsson implementation of Erlang.

As you noted elsewhere, there are extreme circumstances under
which a message simply cannot be delivered. The one that springs
to mind is when there isn't enough memory available.

> 
> 	It would be difficult to imagine the system not being able to
> 	deliver messages to a healthy process using local communication.
> 
> What's a healthy process?

I can't think of any definition other than:
- the Erlang node is healthy (e.g. not out of memory), and
- the process is not dead.

> How can the message delivery code 
> tell whether a process is healthy quickly enough for
> knowing to be useful?

It can quickly determine whether the process is alive.
It will eventually discover when the node runs out of
memory. Handling the out-of-memory situation gracefully
is a dilemma. I think steps have been taken to improve
the behaviour.
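
As a minimal sketch of the liveness check mentioned above (the
module name alive_demo and the function run/0 are made up for
illustration), the standard BIFs is_process_alive/1 and
erlang:monitor/2 can be used like this on a local node. It also
shows that a send to a dead pid gives the sender no indication
that the message went nowhere:

```erlang
-module(alive_demo).
-export([run/0]).

%% Hypothetical demo: observe one local process while it is
%% alive and after it has died.
run() ->
    Pid = spawn(fun() -> receive stop -> ok end end),
    %% Cheap, synchronous check for a local pid.
    true = is_process_alive(Pid),
    %% A monitor reports the death asynchronously.
    Ref = erlang:monitor(process, Pid),
    Pid ! stop,
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            false = is_process_alive(Pid),
            %% The send expression still just returns the message.
            hello = (Pid ! hello),
            {dead, Reason}
    after 1000 ->
        timeout
    end.
```

Calling alive_demo:run() should return {dead, normal}. The
out-of-memory case is not covered; the sketch only addresses
the "is it alive" half of the question.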

> 	Would this
> 	also extend to EXIT messages sent to supervisor processes?
> 	If so, supervision cannot be relied upon.
> 	
> I don't need to tell you that *automatically* generated
> EXIT messages might be more reliable than programmer-generated 
> ones.

One might think so, yes, but the one occasion I can remember
where attempted message delivery failed actually involved
EXIT messages. I had a node with ~100,000 processes, all
linked to the shell process. The shell process died when I
misspelled length(processes()), which made the system try
to deliver an EXIT message containing, among other things, 
a list of >100,000 pids. Since the standard Erlang VM copies
messages on send, it ended up trying to create 100,000 copies
of the rather large EXIT message. This kept my workstation
busy for the better part of 10 minutes, and then the entire
VM died.
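
A scaled-down sketch of that situation, under stated assumptions:
the module exit_fanout and all function names are hypothetical,
the workers trap exits so that the signal shows up as a message,
and the exit reason is simply the list of worker pids rather than
whatever the shell actually produced. With N linked workers and a
reason holding N pids, the workers receive N*N list elements in
total:

```erlang
-module(exit_fanout).
-export([run/1]).

%% Hypothetical demo: N workers linked to one parent that exits
%% with a reason containing N pids. Returns the total number of
%% list elements received across all workers.
run(N) ->
    Collector = self(),
    spawn(fun() -> parent(N, Collector) end),
    collect(N, 0).

parent(N, Collector) ->
    Self = self(),
    Pids = [spawn_link(fun() -> worker(Self, Collector) end)
            || _ <- lists:seq(1, N)],
    %% Wait until every worker traps exits before dying.
    lists:foreach(fun(P) -> receive {ready, P} -> ok end end,
                  Pids),
    %% Every linked worker gets its own copy of this reason.
    exit({oops, Pids}).

worker(Parent, Collector) ->
    process_flag(trap_exit, true),
    Parent ! {ready, self()},
    receive
        {'EXIT', Parent, {oops, Pids}} ->
            Collector ! {copy, length(Pids)}
    end.

collect(0, Total) ->
    Total;
collect(K, Total) ->
    receive
        {copy, Len} -> collect(K - 1, Total + Len)
    after 5000 ->
        {timeout, Total}
    end.
```

exit_fanout:run(100) should return 10000. The quadratic growth
is the point: at 100,000 processes the same pattern means on the
order of 10^10 list elements to copy.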

If one were to formulate a guarantee, I guess it would be 
something along the lines of "if a message is sent to a 
local process which is alive, the runtime system will either
deliver the message or die trying." One could imagine other
strategies for handling exceptional conditions.

> If there is a rule that prevents supervision across
> node boundaries, I have failed to notice or understand
> it, in which case supervision does have trouble.

There is no such rule. Supervision across node boundaries
is indeed problematic. I would hesitate to use the 
supervisor behaviour in such a way that child processes
run on a node different from that of the supervisor.

> 	> to the detriment of other processes UNLESS at some point 
> 	> messages are discarded. 
> 	
> 	I may be wrong, but I don't think the Erlang VM does that.
> 	
> There is more than one Erlang implementation, although the
> "real" one does have a habit of advancing faster than the
> others could keep up. (E2S, GERL, ...)  If it's not promised
> explicitly somewhere, we can't rely on future Erlang
> implementations happening to miss the same opportunities.

I think the idea of discarding messages might be a way forward,
but I think one must go further than just randomly
discarding them. It is very useful to know for sure that if
s1 and s2 have been sent, in that order, from P1 to P2, then
if s2 arrives, s1 is guaranteed to have arrived before it.
The spec doesn't state this, but it's what I think it should
state.
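
The ordering property can be observed with a small sketch like
the following (order_demo, run/0 and next/1 are names invented
for the example). Here the calling process plays P2 and a
spawned process plays P1. Observing the order on one VM is of
course not the same as having it promised:

```erlang
-module(order_demo).
-export([run/0]).

%% Hypothetical demo: P1 sends s1 then s2 to P2 (the caller).
run() ->
    P2 = self(),
    %% A unique tag keeps unrelated mailbox contents out.
    Tag = make_ref(),
    spawn(fun() -> P2 ! {Tag, s1}, P2 ! {Tag, s2} end),
    First = next(Tag),
    Second = next(Tag),
    {First, Second}.

%% Take the next tagged message in arrival order.
next(Tag) ->
    receive
        {Tag, Msg} -> Msg
    after 1000 ->
        timeout
    end.
```

On the standard VM, order_demo:run() should return {s1, s2}.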

That would mean that if you start discarding messages to 
a process, you had better continue to discard them until
the program somehow takes action. An appropriate action
might be to restart the process whose mailbox has been
effectively disabled.
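
To make the "keep discarding" rule concrete, here is a pure-data
model of a bounded mailbox. It is only a sketch: the module
sticky_drop, its functions and the {Limit, Pending, Dropping}
representation are all assumptions made for illustration, and
nothing here describes what any Erlang VM actually does.

```erlang
-module(sticky_drop).
-export([new/1, push/2, pop/1, reset/1, demo/0]).

%% State: {Limit, Pending, Dropping}, Pending oldest first.
new(Limit) -> {Limit, [], false}.

%% Once discarding has started, every later message is dropped.
push(_Msg, {Limit, Q, true}) ->
    {Limit, Q, true};
%% Mailbox full: drop this message and start discarding.
push(_Msg, {Limit, Q, false}) when length(Q) >= Limit ->
    {Limit, Q, true};
push(Msg, {Limit, Q, false}) ->
    {Limit, Q ++ [Msg], false}.

%% Remove the oldest pending message.
pop({Limit, [Msg | Rest], Dropping}) ->
    {Msg, {Limit, Rest, Dropping}}.

%% Stands in for "restart the process": empty mailbox, flag off.
reset({Limit, _Q, _Dropping}) -> new(Limit).

demo() ->
    S0 = new(2),
    S1 = push(s1, S0),
    S2 = push(s2, S1),
    S3 = push(s3, S2),      % full: s3 dropped, discarding starts
    {s1, S4} = pop(S3),     % there is room again...
    S5 = push(s4, S4),      % ...but s4 is still discarded
    {_Limit, Pending, Dropping} = S5,
    {Pending, Dropping}.
```

sticky_drop:demo() should return {[s2], true}: the receiver sees
s1 and s2 and nothing after the gap. Without the sticky flag it
would see s1, s2, s4, which breaks the "if s2 arrives, s1 arrived
before it" style of reasoning for s3 and s4.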

/Uffe