[erlang-questions] question re. message delivery

Tue Sep 26 18:58:16 CEST 2017

See in-line...

On 9/26/17 9:39 AM, Matthias Lang wrote:

> Hi,
>
> I'm more than a bit surprised by what I'm reading here and maybe part
> of it has to do with people meaning different things by "message
> passing protocol".
>
>   MF> I think that your examples essentially demonstrate that, for a lot of
>   MF> applications, one pretty much has to implement one's own message passing
>   MF> protocol on top of Erlang's - to guarantee that all messages are
>   MF> delivered, and delivered in order.  Some applications can tolerate
>   MF> missed messages, a lot can't.
>
> I like the 20 year old advice from Per Hedeland which I quote in the
> FAQ (10.8 and 10.9)
>
>    http://erlang.org/faq/academic.html#idp33047120
>
> If this advice is wrong, then I should update it, but convincing arguments
> and some sort of consensus would be required for a change.
>
> The situations I'm aware of where messages can disappear are:
>
>    1. When the receiving process disappears, for instance because it
>       crashed. This applies to both single-node and distributed Erlang.
>
>    2. When the communication between nodes breaks. This applies to
>       distributed Erlang only.

Which is what I'm curious about.  Of course, with multi-processor 
architectures, one must also consider communications between processors 
on the same node.
>
>    3. Quite a few years ago (2005? 2007?), Hans Svensson demonstrated
>       some cases where if you restarted nodes in a distributed Erlang
>       system in particular ways, then things could get strange with
>       message passing.

Exactly.
>
>    4. Hardware errors, compiler bugs, etc.
>
> For #1 and #2, I don't think it's good to describe the solution as
> "implement one's own message passing protocol on top of Erlang's".
> The failure is quite specific, you get all messages up to the crash
> and then you get none after that. It's not the message passing that's
> the problem.

Of course it is.  If one wants reliable packet delivery, one implements 
TCP (or equivalent) above raw IP.  If one wants reliable email, one 
implements a return receipt function.  Etc.
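
To make the return-receipt idea concrete, here is a minimal sketch of
an acknowledged send.  The module name, the function names, and the
{msg, ...} / {ack, ...} tuple shapes are made up for illustration;
only erlang:monitor/2 and erlang:demonitor/2 are real BIFs.  It is
roughly the request/reply-with-monitor pattern, not a claim about how
any particular library does it.

```erlang
-module(receipt).
-export([send_with_receipt/3, echo_acker/0]).

%% Send Msg to Pid and wait up to Timeout ms for a receipt.
%% Returns ok, {error, {receiver_down, Reason}} or {error, timeout}.
%% The {msg, ...} and {ack, ...} tuples are assumptions of this sketch.
send_with_receipt(Pid, Msg, Timeout) ->
    %% The monitor reference doubles as a unique tag for this request.
    Ref = erlang:monitor(process, Pid),
    Pid ! {msg, self(), Ref, Msg},
    receive
        {ack, Ref} ->
            erlang:demonitor(Ref, [flush]),
            ok;
        {'DOWN', Ref, process, Pid, Reason} ->
            %% Reason is noproc if Pid was already dead, and
            %% noconnection if the link to its node was lost.
            {error, {receiver_down, Reason}}
    after Timeout ->
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.

%% Toy receiver that acknowledges every message it gets.
echo_acker() ->
    receive
        {msg, From, Ref, _Msg} ->
            From ! {ack, Ref},
            echo_acker()
    end.
```

Note what the receipt does and does not say: ok means the receiver
got the message, but {error, timeout} does not mean it didn't.  A
late ack can still arrive after the timeout and will sit in the
sender's mailbox, so a real protocol needs to decide what to do about
retries and duplicates.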

>
> For #3, my unreliable recollection was that this was a situation where
> the implementation was unexpectedly weak. It may go deeper than that
> and it may be that the implementation is better today. I don't know.
>
> #4 seems irrelevant. If you're worried that just the right combination
> of flipped bits or compiler errors, no matter how unlikely, can cause
> a message to disappear, then putting "one's own message passing
> protocol on top of Erlang's" isn't going to eliminate that. There
> will be some combination of flipped bits that will defeat it.
>
> Miles, do you have some concrete examples of situations where you're
> worried about messages disappearing? Here's one from me: process 1
> sends two messages to process 2. The messages are A and B,
> respectively. Process 2 sends an ACK for message B back to process
> 1. For single-node Erlang, if message A disappears then that is a
> bug. I'll let others reason about distributed Erlang.

Sure.  Bank transactions.  Edits to a document.  Dispatch commands to a 
vehicle.

Both order and missing messages matter.
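
One way to handle both at the application level is to number the
messages and have the receiver release them strictly in sequence.
The sketch below is a pure function over a small state, under the
assumption that the sender numbers messages 1, 2, 3, ...; the module
and function names are hypothetical.

```erlang
-module(reorder).
-export([new/0, accept/3]).

%% State is {NextExpectedSeq, Buffer}, where Buffer maps Seq => Msg
%% for messages that arrived ahead of a gap.
new() ->
    {1, #{}}.

%% accept(Seq, Msg, State) -> {Deliverable, NewState}
%% Deliverable is the list of messages that can now be handed to the
%% application, in sequence order.
accept(Seq, _Msg, {Next, _Buf} = State) when Seq < Next ->
    {[], State};                          % duplicate, already delivered
accept(Seq, Msg, {Next, Buf}) ->
    drain(Next, maps:put(Seq, Msg, Buf), []).

%% Release buffered messages for as long as the next number is present.
drain(Next, Buf, Acc) ->
    case maps:take(Next, Buf) of
        {Msg, Buf1} -> drain(Next + 1, Buf1, [Msg | Acc]);
        error       -> {lists:reverse(Acc), {Next, Buf}}
    end.
```

In the A-then-B example above: if B (number 2) arrives and A (number
1) never does, accept/3 returns an empty list and keeps B buffered,
so the receiver knows there is a gap and can withhold its ACK or ask
for a resend rather than silently acting on B.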

The question remains: how does the actual Erlang run-time system respond
to the various kinds of failures?  And will those behaviors remain
consistent in future releases?

-- 
In theory, there is no difference between theory and practice.
In practice, there is.  .... Yogi Berra