Getting locks and sharing: was RE: Getting concurrency
Ulf Wiger (AL/EAB)
ulf.wiger@REDACTED
Mon Jun 20 17:56:28 CEST 2005
> "Ulf Wiger" <ulf@REDACTED> wrote:
> > RPC1 - between two processes in the same Erlang node
> >
> > Here, we are 'guaranteed' that messages cannot be dropped.
>
> Where is that promised?
I don't know. That's why I put it in quotes. (:
In the Erlang Reference Manual, it says:
"Sending a message to a pid never fails, even if the
pid identifies a non-existing process."
So you see, the message is guaranteed to get there,
even if 'there' doesn't exist. That's quite some service
guarantee! (:
(I read it as "the send operation will not raise an
exception, but whether the message is actually delivered
is another matter entirely.")
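
A minimal sketch of that reading, assuming a throwaway module name
(send_dead) that is not from the original discussion. It sends to a pid
whose process has already terminated and shows that the send expression
simply evaluates to the message:

```erlang
-module(send_dead).
-export([run/0]).

%% Sends a message to a pid whose process has already terminated and
%% returns whatever the send expression evaluated to.
run() ->
    {Pid, Ref} = spawn_monitor(fun() -> ok end),
    %% Wait for the 'DOWN' message so we know the process is gone.
    receive
        {'DOWN', Ref, process, Pid, _Reason} -> ok
    end,
    false = is_process_alive(Pid),
    %% No exception is raised here; the message is silently dropped
    %% and the expression evaluates to the message itself.
    Pid ! hello.
```

Note that this applies to pids. Sending to a registered name that does
not exist (Name ! Msg with an unregistered atom) raises badarg instead.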
In the Erlang 4.7 specification, 10.6.2 (pg 158), it says:
"It is guaranteed that if a process P1 dispatches two
signals s1 and s2 to the same process P2, in that order,
then signal s1 will never arrive after s2 at P2. It is
ensured that whenever possible, a signal dispatched to a
process should eventually arrive at it. There are situations
when it is not reasonable to require that all signals arrive
at their destination, in particular when a signal is sent to
a process on a different node and communication between
the nodes is temporarily lost."
You don't have to tell me that this text carefully avoids
giving any sort of guarantee that a message sent will
actually arrive under any specific circumstances. It even
avoids guaranteeing that s1 will have arrived before s2; it
only says that s1 will never arrive after s2. This was
however the 'guarantee' that I referred to.
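
The ordering part can be observed with a small experiment. This is only
a sketch: the module name order_demo and the {seq, I} message format are
made up for illustration, and a passing run demonstrates the behaviour
without proving the guarantee.

```erlang
-module(order_demo).
-export([run/1]).

%% The calling process plays P1 and sends {seq, 1} .. {seq, N} to a
%% freshly spawned P2, which reports the order in which it received them.
run(N) ->
    Parent = self(),
    Receiver = spawn(fun() -> Parent ! {self(), collect(N, [])} end),
    [Receiver ! {seq, I} || I <- lists:seq(1, N)],
    receive
        {Receiver, Got} -> Got =:= lists:seq(1, N)
    after 5000 ->
        timeout
    end.

%% Take messages in mailbox order; I is unbound, so the first
%% {seq, _} message in the queue always matches.
collect(0, Acc) ->
    lists:reverse(Acc);
collect(K, Acc) ->
    receive
        {seq, I} -> collect(K - 1, [I | Acc])
    end.
```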
In practice, if you really need to make sure that a message
has arrived, you must wait for explicit acknowledgement.
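
One common way to wait for an explicit acknowledgement is to tag the
request with a monitor reference and wait for either a reply, a 'DOWN'
message, or a timeout. The sketch below assumes a hypothetical
{request, From, Ref, Req} / {reply, Ref, Reply} protocol and a module
name (ack_demo) invented for this example:

```erlang
-module(ack_demo).
-export([call/3, echo_server/0, run/0]).

%% Send Request to Pid and wait for an explicit reply.
call(Pid, Request, Timeout) ->
    Ref = erlang:monitor(process, Pid),
    Pid ! {request, self(), Ref, Request},
    receive
        {reply, Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            {ok, Reply};
        {'DOWN', Ref, process, Pid, Reason} ->
            %% The server died (or never existed) before replying.
            {error, {server_down, Reason}}
    after Timeout ->
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.

%% Hypothetical server that acknowledges every request by echoing it.
echo_server() ->
    receive
        {request, From, Ref, Request} ->
            From ! {reply, Ref, Request},
            echo_server()
    end.

%% Try the call against a live server, a dead pid, and a process that
%% is alive but never looks at requests.
run() ->
    Live = spawn(fun echo_server/0),
    {Dead, MRef} = spawn_monitor(fun() -> ok end),
    receive {'DOWN', MRef, process, Dead, _} -> ok end,
    Deaf = spawn(fun() -> receive stop -> ok end end),
    Result = {call(Live, ping, 1000),
              call(Dead, ping, 1000),
              call(Deaf, ping, 100)},
    exit(Live, kill),
    exit(Deaf, kill),
    Result.
```

The three cases in run/0 correspond to an acknowledged request, a
server that is gone, and a server that is alive but not listening. Only
the first one tells the client that the message actually arrived.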
However, it is commonly assumed that if a process is alive,
a local message will reach it. It would be difficult to
imagine the system not being able to deliver messages to
a healthy process using local communication. Would this
also extend to EXIT messages sent to supervisor processes?
If so, supervision cannot be relied upon.
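
For reference, this is roughly what a supervisor-like process relies on:
it traps exits, links to the child, and expects an {'EXIT', Pid, Reason}
message when the child dies. A minimal sketch, with exit_demo as an
invented module name:

```erlang
-module(exit_demo).
-export([run/0]).

%% Trap exits, start a linked child that crashes, and wait for the
%% resulting 'EXIT' message, the way a supervisor would.
run() ->
    Old = process_flag(trap_exit, true),
    Child = spawn_link(fun() -> exit(crashed) end),
    Result =
        receive
            {'EXIT', Child, Reason} -> {child_exited, Reason}
        after 1000 ->
            no_exit_message
        end,
    %% Restore the caller's previous trap_exit setting.
    process_flag(trap_exit, Old),
    Result.
```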
> I can think of all sorts of reasons why RPC1 might be
> unreliable. I didn't like the idea at first, but then
> it dawned on me that if there are lots of messages being
> sent to a process which, though live, isn't bothering to
> listen (or is listening for the wrong thing), the mailbox
> will eventually take over all available memory
But if it's an RPC, the client will not send additional
messages until it times out waiting for the response to
the previous one. Eventually, one may assume that all
potential clients will be stuck waiting for that bad
server.
What happens when a client times out waiting for a server
is of course application specific. One may simply retry,
but in that case, the situation you describe may arise.
One can also exit, which is the recommended (by way of
implementation) procedure. Following the OTP guidelines,
the client would normally die, and be restarted by its
supervisor. This may trigger a new RPC1 to the server,
which may also hang. Eventually, the configurable restart
limit may kick in, and the supervisor will terminate,
escalating the restart. When the escalation reaches
the top (the application controller), the Erlang VM is
terminated (and possibly restarted by some HEART program).
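
The restart limit can be sketched with a supervisor whose only child
starts successfully but exits shortly afterwards, standing in for a
client that gives up on a hung server. Everything named here
(restart_demo, start_child/0, the exit reason, the limit of 2 restarts
in 5 seconds) is an assumption chosen for illustration, and the map
syntax for supervisor flags requires a reasonably recent OTP release:

```erlang
-module(restart_demo).
-behaviour(supervisor).
-export([run/0, start_child/0, init/1]).

%% Hypothetical worker: starts fine, then exits after a short delay.
start_child() ->
    Pid = spawn_link(fun() ->
                             timer:sleep(10),
                             exit(server_not_responding)
                     end),
    {ok, Pid}.

init([]) ->
    %% Allow at most 2 restarts within 5 seconds before giving up.
    SupFlags = #{strategy => one_for_one, intensity => 2, period => 5},
    Child = #{id => client,
              start => {?MODULE, start_child, []},
              restart => permanent},
    {ok, {SupFlags, [Child]}}.

%% Start the supervisor and wait for it to terminate once the restart
%% limit is exceeded. The caller traps exits so it survives the link.
run() ->
    Old = process_flag(trap_exit, true),
    {ok, Sup} = supervisor:start_link(?MODULE, []),
    Result =
        receive
            {'EXIT', Sup, Reason} -> {supervisor_terminated, Reason}
        after 5000 ->
            still_running
        end,
    process_flag(trap_exit, Old),
    Result.
```

In a real supervision tree, the process receiving that exit would be
the parent supervisor, which then applies its own restart limit. That
is the escalation described above.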
Getting this to work smoothly and reliably takes quite
a bit of thinking and tuning. But it's not rocket science.
> to the detriment of other processes UNLESS at some point
> messages are discarded.
I may be wrong, but I don't think the Erlang VM does that.
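
A quick way to check on a given system is to flood a process that is
alive but not reading its mailbox, and then look at its queue length.
This sketch uses an invented module name (mailbox_demo) and only shows
that the messages are still queued at the time of the check:

```erlang
-module(mailbox_demo).
-export([run/1]).

%% Send N messages to a live process that never receives them and
%% return the length of its message queue afterwards.
run(N) ->
    %% Alive, but only ever waits for 'stop', which we never send.
    Deaf = spawn(fun() -> receive stop -> ok end end),
    [Deaf ! {junk, I} || I <- lists:seq(1, N)],
    {message_queue_len, Len} = process_info(Deaf, message_queue_len),
    exit(Deaf, kill),
    Len.
```

If the returned length equals N, nothing was discarded, and the memory
held by the queue grows with the number of unread messages.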
/Uffe