Getting locks and sharing: was RE: Getting concurrency

Mon Jun 20 17:56:28 CEST 2005

> "Ulf Wiger" <ulf@REDACTED> wrote:
> > > RPC1 - between two processes in the same Erlang node
> >
> > Here, we are 'guaranteed' that messages cannot be dropped.
>
> Where is that promised?

I don't know. That's why I put it in quotes. (:

In the Erlang Reference Manual, it says:
"Sending a message to a pid never fails, even if the 
pid identifies a non-existing process."

So you see, the message is guaranteed to get there,
even if 'there' doesn't exist. That's quite some service
guarantee!  (:

(I read it as "the send operation will not raise an 
exception, but whether the message is actually delivered
is another matter entirely.")
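
A small sketch of that reading, in case it helps. The module
and function names are made up for illustration; only the
BIFs (spawn_monitor/1, is_process_alive/1, the ! operator)
are real. It waits until a process is known to be dead and
then sends to it anyway:

```erlang
-module(send_demo).
-export([send_to_dead/0]).

%% Hypothetical demo: send a message to a pid whose process has
%% already terminated, and show that the send itself succeeds.
send_to_dead() ->
    %% Start a process that exits immediately, and monitor it.
    {Pid, Ref} = spawn_monitor(fun() -> ok end),
    %% Wait for the 'DOWN' message so we know it is really gone.
    receive
        {'DOWN', Ref, process, Pid, _Reason} -> ok
    end,
    false = is_process_alive(Pid),
    %% The send raises no exception; the value of the expression
    %% is simply the message. Nobody will ever receive it.
    hello = (Pid ! hello),
    ok.
```

So "never fails" is a statement about the sender, not about
delivery.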

In the Erlang 4.7 specification, 10.6.2 (pg 158), it says:

"It is guaranteed that if a process P1 dispatches two 
signals s1 and s2 to the same process P2, in that order, 
then signal s1 will never arrive after s2 at P2. It is 
ensured that whenever possible, a signal dispatched to a 
process should eventually arrive at it. There are situations 
when it is not reasonable to require that all signals arrive 
at their destination, in particular when a signal is sent to 
a process on a different node and communication between
the nodes is temporarily lost."

You don't have to tell me that this text carefully avoids
guaranteeing that a message, once sent, will actually arrive
under any specific circumstances. It doesn't even guarantee
that s1 will have arrived by the time s2 does, only that s1
will never arrive after s2. This was, however, the
'guarantee' that I referred to.

In practice, if you really need to make sure that a message
has arrived, you must wait for an explicit acknowledgement.
However, it is commonly assumed that if a process is alive,
a local message will reach it. It is difficult to imagine
the system being unable to deliver messages to a healthy
process using local communication. And if local delivery
could fail, would that also apply to EXIT messages sent to
supervisor processes? If so, supervision could not be relied
upon.
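
To make the "explicit acknowledgement" point concrete, here
is a minimal sketch. The module name, the message shapes
({From, Ref, Msg} and {ack, Ref}) and the function names are
my own assumptions, not an OTP API. The sender monitors the
receiver, so it can tell "the process is dead" apart from
"no answer within the timeout":

```erlang
-module(ack_demo).
-export([start/0, send_with_ack/3]).

%% Hypothetical receiver: acknowledges every message it takes
%% out of its mailbox.
start() ->
    spawn(fun loop/0).

loop() ->
    receive
        {From, Ref, _Msg} ->
            From ! {ack, Ref},
            loop()
    end.

%% Send Msg to Pid and wait up to Timeout ms for the ack.
send_with_ack(Pid, Msg, Timeout) ->
    %% The monitor reference doubles as a unique tag for the ack.
    MRef = erlang:monitor(process, Pid),
    Pid ! {self(), MRef, Msg},
    receive
        {ack, MRef} ->
            erlang:demonitor(MRef, [flush]),
            ok;
        {'DOWN', MRef, process, Pid, Reason} ->
            %% Receiver died, or never existed (Reason = noproc).
            {error, {down, Reason}}
    after Timeout ->
        erlang:demonitor(MRef, [flush]),
        {error, timeout}
    end.
```

Note that even the ack only tells you that the receiver took
the message out of its mailbox, and a timeout tells you
nothing about whether the message arrived.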

> I can think of all sorts of reasons why RPC1 might be 
> unreliable.  I didn't like the idea at first, but then
> it dawned on me that if there are lots of messages being
> sent to a process which, though live, isn't bothering to
> listen (or is listening for the wrong thing), the mailbox
> will eventually take over all available memory

But if it's an RPC, the client will not send additional
messages until it times out waiting for the response to
the previous one. Eventually, one may assume, all
potential clients will be stuck waiting for that bad
server.

What happens when a client times out waiting for a server
is of course application-specific. One may simply retry,
but in that case, the situation you describe may arise.
One can also exit, which is the recommended (by way of
implementation) procedure. Following the OTP guidelines,
the client would normally die and be restarted by its
supervisor. This may trigger a new RPC1 to the server,
which may also hang. Eventually, the configurable restart
limit may kick in, and the supervisor will terminate,
escalating the restart. When the escalation reaches
the top (the application controller), the Erlang VM is
terminated (and possibly restarted by some HEART program).
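
The restart limit mentioned above is set in the supervisor's
init/1 callback. A rough sketch, written with the map-based
supervisor flags of current OTP releases; the `client` module
and its start_link/0 are hypothetical, and the numbers are
arbitrary illustration values, not recommendations:

```erlang
-module(client_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Allow at most 3 restarts within any 10-second window.
    %% If the limit is exceeded, this supervisor terminates its
    %% children and itself, so the failure escalates to its own
    %% supervisor.
    SupFlags = #{strategy => one_for_one,
                 intensity => 3,
                 period => 10},
    %% `client` is a hypothetical module assumed to export
    %% start_link/0; it stands for the process making the RPC1.
    Child = #{id => client,
              start => {client, start_link, []},
              restart => permanent},
    {ok, {SupFlags, [Child]}}.
```

Picking the intensity and period for each level of the tree
is much of the tuning I refer to below.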

Getting this to work smoothly and reliably takes quite 
a bit of thinking and tuning. But it's not rocket science.

> to the detriment of other processes UNLESS at some point 
> messages are discarded. 

I may be wrong, but I don't think the Erlang VM does that.

/Uffe