[erlang-questions] question re. message delivery

Matthias Lang matthias@REDACTED
Tue Sep 26 18:39:48 CEST 2017


Hi,

I'm more than a bit surprised by what I'm reading here and maybe part
of it has to do with people meaning different things by "message
passing protocol".

 MF> I think that your examples essentially demonstrate that, for a lot of
 MF> applications, one pretty much has to implement one's own message passing
 MF> protocol on top of Erlang's - to guarantee that all messages are
 MF> delivered, and delivered in order.  Some applications can tolerate
 MF> missed messages, a lot can't.

I like the 20-year-old advice from Per Hedeland which I quote in the
FAQ (10.8 and 10.9):

  http://erlang.org/faq/academic.html#idp33047120

If this advice is wrong, then I should update it, but convincing arguments
and some sort of consensus would be required for a change.

The situations I'm aware of where messages can disappear are:

  1. When the receiving process disappears, for instance because it
     crashed. This applies to both single-node and distributed Erlang.

  2. When the communication between nodes breaks. This applies to
     distributed Erlang only.

  3. Quite a few years ago (2005? 2007?), Hans Svensson demonstrated
     some cases where if you restarted nodes in a distributed Erlang
     system in particular ways, then things could get strange with
     message passing.

  4. Hardware errors, compiler bugs, etc.

For #1 and #2, I don't think it's good to describe the solution as
"implement one's own message passing protocol on top of Erlang's".
The failure is quite specific: you get all messages up to the crash
and none after that. It's not the message passing that's
the problem.

For #3, my unreliable recollection was that this was a situation where
the implementation was unexpectedly weak. It may go deeper than that
and it may be that the implementation is better today. I don't know.

#4 seems irrelevant. If you're worried that just the right combination
of flipped bits or compiler errors, no matter how unlikely, can cause
a message to disappear, then putting "one's own message passing
protocol on top of Erlang's" isn't going to eliminate that. There
will be some combination of flipped bits that will defeat it.

Miles, do you have some concrete examples of situations where you're
worried about messages disappearing? Here's one from me: process 1
sends two messages to process 2. The messages are A and B,
respectively. Process 2 sends an ACK for message B back to process
1. For single-node Erlang, if message A disappears then that is a
bug. I'll let others reason about distributed Erlang.
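
To make that scenario concrete, here is a minimal sketch in Python. It is
a toy model, not Erlang and not a description of how BEAM works. It assumes
only what is described above for cases 1 and 2: a link between one pair of
processes keeps send order and, once it breaks, drops everything sent
afterwards. The names Link, run and break_after are hypothetical and exist
only for this illustration.

```python
from collections import deque


class Link:
    """Toy one-way channel between a fixed pair of processes.

    Assumption: it preserves send order and, once broken, silently
    drops everything sent afterwards (failure cases 1 and 2 only).
    """

    def __init__(self):
        self.queue = deque()
        self.broken = False

    def send(self, msg):
        # The sender gets no error either way; a broken link just loses it.
        if not self.broken:
            self.queue.append(msg)

    def break_link(self):
        self.broken = True

    def drain(self):
        """Deliver everything queued so far, in send order."""
        delivered = list(self.queue)
        self.queue.clear()
        return delivered


def run(break_after):
    """Process 1 sends A then B; process 2 acks each message it receives.

    break_after is how many messages are sent before the link breaks
    (a hypothetical knob for this sketch).
    """
    to_p2, to_p1 = Link(), Link()
    for i, msg in enumerate(["A", "B"]):
        if i == break_after:
            to_p2.break_link()
        to_p2.send(msg)
    received = to_p2.drain()
    for msg in received:
        to_p1.send(("ack", msg))
    acks = to_p1.drain()
    return received, acks


if __name__ == "__main__":
    for k in range(3):
        print(k, run(k))
```

In this model process 2 only ever sees a prefix of what was sent, so an ACK
for B can never arrive unless A was delivered first. A lost A followed by a
delivered B is exactly the outcome the model cannot produce.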

Matthias

--------

Date: 26. September 2017
From: Raimo Niskanen <raimo+erlang-questions@REDACTED>
To: erlang-questions@REDACTED
Subject: Re: [erlang-questions] question re. message delivery


> Since this seems to be about a thesis by Joe, not about the implementation,
> Joe can defend his own thesis.
>
> / Raimo
>
>
> On Mon, Sep 25, 2017 at 09:02:39AM -0700, Miles Fidelman wrote:
> > On 9/24/17 11:53 PM, Raimo Niskanen wrote:
> >
> > > On Sun, Sep 24, 2017 at 11:24:31PM -0700, Miles Fidelman wrote:
> > >> See below....
> > >>
> > >>
> > >> On 9/24/17 6:10 PM, zxq9 wrote:
> > >>> On Sunday, 2017-09-24 16:50:45 Miles Fidelman wrote:
> > >>>> Folks,
> > >>>>
> > >>>> I've just been re-reading Joe Armstrong's thesis, and I'm reminded of a
> > >>>> question that's been nagging me.
> > >>>>
> > >>>> As I understand it, message delivery is not guaranteed, but message
> > >>>> order IS. So how, exactly, does that work?  What's the underlying
> > >>>> mechanism that imposes sequencing, but allows messages to get lost?
> > >>>> (Particularly across a network.)  What are the various scenarios at play?
> > >>> This is sort of backwards.
> > >>>
> > >>> Message delivery is guaranteed, assuming the process you are sending a
> > >>> message to exists and is available, BUT from the perspective of the
> > >>> sender there is no way to tell whether the receiver actually got it,
> > >>> has crashed, disappeared, fell into a network blackhole, or whatever.
> > >>> Monitoring can tell you whether the process you are trying to reach
> > >>> is available right at that moment, but that's it.
> > >>>
> > >>> The point is, though, that whether the receiver is unreachable, has
> > >>> crashed, got the message and did its work but was unable to report
> > >>> back about it, or whatever -- it's all the same reality from the
> > >>> perspective of the sender. "Unavailable" means "unavailable", no matter
> > >>> what the cause -- because the cause cannot be determined from the
> > >>> perspective of the sender. You can only know this with an out of
> > >>> context check of some sort, and that is basically the role the runtime
> > >>> plays for you with regard to monitors and links.
> > >>>
> > >>> The OTP synchronous "call" mechanism is actually a complex procedure
> > >>> built from asynchronous messages, unique reference tags, and monitors.
> > >>>
> > >> Note that I didn't ask about the synchronous calls, I asked about raw
> > >> interprocess messages.
> > >>
> > >>> What IS guaranteed is the ordering of messages *relative to two processes*
> > >>>
> > >>> If A sends B the messages 1, 2 and 3 in that order, they will certainly
> > >>> arrive in that order (assuming they arrive at all -- meaning that B is
> > >>> available from the perspective of A).
> > >> But that's the question.  Particularly when sent via network, 1, 2, 3
> > >> may be sent in that order, but, at the protocol level, they may not
> > >> arrive in that order.
> > > What protocol level?
> > >
> > > Erlang distribution has to use or implement a reliable protocol.  Today
> > > TCP, but anything is possible.  Note that this protocol is between two
> > > nodes, both containing many processes.  But the emulator relies on the
> > > transport protocol being reliable.
> >
> > No.  It doesn't.  It could simply send UDP packets.  I'm asking about
> > implementation details.  In Joe's thesis, he says that the behavior is a
> > "design choice."  I'm asking about the implementation details.  How does
> > BEAM actually handle message delivery - locally, via network?
> >
> > >> With a reliable transport protocol - say TCP - if the message-containing
> > >> packets arrived as 1, 3, 2, the protocol engine would wait for 2 to
> > >> arrive and deliver 1,2,3 in that order.  If it received 1 & 3, but 2 got
> > >> lost, it would request a re-transmit, wait for it to arrive, and again,
> > >> deliver in that order.
> > >>
> > >> But the implication of Erlang's stated rules is that an unreliable
> > >> transport protocol is being used, if you send 1, 2, 3, and what arrives
> > > What?  What is stated?
> >  From Joe Armstrong's Thesis:
> >
> > "Message passing is assumed to be unreliable with no guarantee of
> > delivery."
> >
> > "Since we made no assumptions about reliable message passing, and must
> > write our application so that it works in the presence of unreliable
> > message passing it should indeed work in the presence of message passing
> > errors. The initial effort involved will reward us when we try to scale
> > up our systems."
> >
> > "2. Message passing between a pair of processes is assumed to be ordered
> > meaning that if a sequence of messages is sent and received between any
> > pair of processes then the messages will be received in the same order
> > they were sent."
> >
> > "Note that point two is a design decision, and does not reflect any
> > underlying semantics in the network used to transmit messages. The
> > underlying network might reorder the messages, but between any pair of
> > processes these messages can be buffered, and re-assembled into the
> > correct order before delivery. This assumption makes programming message
> > passing applications much easier than if we had to always allow for out
> > of order messages."
> >
> > ---
> > I read this as saying, messages will be delivered in order, but some may
> > be missing.
> >
> > I'm really interested in this design decision, and how it's
> > implemented.  (I'm also interested in the logic of why it's easier to
> > program around missing messages than out-of-order messages.)
> >
> > Miles
> >
> > --
> > In theory, there is no difference between theory and practice.
> > In practice, there is.  .... Yogi Berra
> >
>
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://erlang.org/mailman/listinfo/erlang-questions
>
>
> --
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB


