[erlang-questions] question re. message delivery

Wed Sep 27 18:10:45 CEST 2017

Raimo,

Thanks for the details - exactly what I was looking for!

A few questions - inline, near the end...

On 9/27/17 7:42 AM, Raimo Niskanen wrote:
> I'll try to summarize what I know on the topic, acting a bit as a ghost
> writer for the VM Team.
>
> See also the old, outdated language specification, which is the best we have.
> It is still the soother that the VM Team sucks on when they do not know
> what else to do, and an updated version is in the pipeline.
> See especially 10.6.2 Order of signals:
>
>      http://erlang.org/download/erl_spec47.ps.gz
>
>
> Also see, especially 2.1 Passing of Signals "signals can be lost":
>
>      http://erlang.org/doc/apps/erts/communication.html
>
>
> Message order between a pair of processes is guaranteed.  I.e. a message
> sent after another message will not be received before that other message.
>
> Messages may be dropped, in particular when a communication link
> between nodes goes down and comes back up.
>
> If you set a monitor (or a link) on a process you will get a 'DOWN'
> message if the other process vanishes, i.e. dies or the communication link
> is lost.
> That 'DOWN' message is guaranteed to be delivered (the same applies to
> links and 'EXIT' messages).

Ahh.. now that's important.  (One might quibble about the semantics of 
"DOWN" vs. "not reachable," but that's another topic.)

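A minimal sketch of that guarantee, for reference.  The module and
function names (monitor_sketch, watch/1) are made up for illustration;
only erlang:monitor/2 and the shape of the 'DOWN' message come from the
runtime.  The Reason field is what separates "died" from "not reachable":

```erlang
-module(monitor_sketch).
-export([watch/1]).

%% Monitor Pid and block until the matching 'DOWN' message arrives.
%% Returns the reason reported by the runtime.
watch(Pid) when is_pid(Pid) ->
    MRef = erlang:monitor(process, Pid),
    receive
        {'DOWN', MRef, process, Pid, Reason} ->
            %% Reason is noconnection when the connection to the remote
            %% node was lost, noproc when Pid was already gone at the time
            %% the monitor was set, and the exit reason otherwise.
            {down, Reason}
    end.
```
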
>
> An example: suppose process P1 first sets a monitor on process P2 and then
> sends messages M1, M2 and M3 to P2.  P2 acknowledges M1 with M1a and M3
> with M3a.  Then if P1 gets M1a it knows that P2 has seen M1 and P1 is
> guaranteed to eventually get either M3a or 'DOWN'.  If it gets M3a then
> it knows P2 has seen M2 and M3.  If it gets 'DOWN' then M2 may have been
> either dropped or seen by P2, and the same applies to M3.  P1 may still
> eventually get M3a, in which case it knows that P2 has seen M3 but cannot
> know whether it has seen M2.
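
The P1/P2 exchange above might be sketched as follows.  This is only an
illustration under assumptions: the module name, the message shapes
({m1, From, Tag} and so on) and the return values are invented here, and
the acks are tagged with the monitor reference so they cannot be confused
with other traffic.

```erlang
-module(ack_sketch).
-export([p2/0, p1/1]).

%% P2: acknowledges M1 and M3, but not M2.
p2() ->
    receive
        {m1, From, Tag}   -> From ! {m1a, Tag}, p2();
        {m2, _From, _Tag} -> p2();
        {m3, From, Tag}   -> From ! {m3a, Tag}, p2()
    end.

%% P1: set the monitor first, then send, then reason from what comes back.
p1(P2) when is_pid(P2) ->
    MRef = erlang:monitor(process, P2),
    P2 ! {m1, self(), MRef},
    P2 ! {m2, self(), MRef},
    P2 ! {m3, self(), MRef},
    receive
        {m3a, MRef} ->
            %% Order is kept between one pair of processes, so P2 saw
            %% M1 and M2 before it saw M3.
            erlang:demonitor(MRef, [flush]),
            _ = saw_m1a(MRef),          % drain the earlier ack
            {seen, [m1, m2, m3]};
        {'DOWN', MRef, process, P2, Reason} ->
            %% M2 and M3 may or may not have been seen.  A late M3a can
            %% still show up after this point.
            {down, Reason, saw_m1a(MRef)}
    end.

%% True if the ack for M1 is already in the mailbox.
saw_m1a(MRef) ->
    receive {m1a, MRef} -> true after 0 -> false end.
```
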
>
> Another example: gen_server:call first sets a monitor on the server process,
> then sends the query.  By that it knows it will eventually either get
> the reply or 'DOWN'.  If it gets 'DOWN' it actually may get a late reply
> (e.g. network down-up), which is often overlooked.
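
Roughly the pattern described above, as a sketch.  This is not OTP's
actual gen_server:call code; the module name, the {call, ...} and
{reply, ...} message shapes and the echo server are assumptions made for
the example.

```erlang
-module(call_sketch).
-export([call/3, server/0]).

%% Monitor first, then send the request, then wait for reply or 'DOWN'.
call(Server, Request, Timeout) when is_pid(Server) ->
    MRef = erlang:monitor(process, Server),
    Server ! {call, self(), MRef, Request},
    receive
        {reply, MRef, Reply} ->
            erlang:demonitor(MRef, [flush]),
            {ok, Reply};
        {'DOWN', MRef, process, _, Reason} ->
            %% A late {reply, MRef, _} may still arrive after this, e.g.
            %% after a network down-up.  The caller's receive loop should
            %% be prepared to discard it.
            {error, {down, Reason}}
    after Timeout ->
        erlang:demonitor(MRef, [flush]),
        {error, timeout}
    end.

%% Toy server that echoes every request back, tagged with the caller's ref.
server() ->
    receive
        {call, From, MRef, Request} ->
            From ! {reply, MRef, {echo, Request}},
            server()
    end.
```

Tagging the reply with the monitor reference is what lets a caller
recognise and drop a stale reply later on.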
>
> The distribution communication is by default implemented with TCP links
> between nodes.  The VM relies on the distribution transport to deliver
> messages in order, or to die meaning that the link has failed and that any
> number of messages at the end of the sequence may have been dropped.
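
To make the ordered-but-lossy case concrete: if the application numbers
its own messages, a receiver can spot a gap such as 1, 2, 3, 5.  The
sketch below assumes such application-level sequence numbers (nothing the
VM provides) and uses made-up names.  It cannot detect a lost tail, since
nothing arrives after it to reveal the gap.

```erlang
-module(gap_sketch).
-export([missing/1]).

%% Given sequence numbers in arrival order (strictly increasing), return
%% the numbers that are missing between the first and the last one.
missing([]) -> [];
missing([First | Rest]) -> missing(First, Rest, []).

missing(_Prev, [], Acc) ->
    lists:reverse(Acc);
missing(Prev, [N | Rest], Acc) when N > Prev ->
    %% lists:seq/2 returns [] when N is the direct successor of Prev.
    Gap = lists:seq(Prev + 1, N - 1),
    missing(N, Rest, lists:reverse(Gap) ++ Acc).
```
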

Are those TCP links created on the fly, between processes, or are they 
kept running between nodes?

I ask, because there's an obvious n^2 issue in a system with lots of 
nodes.  (I used to work on protocols linking distributed simulators - 
we'd either use IP multicast to distribute state updates, or establish 
a sparser network "above" TCP - n^2 links between servers on each local 
network, broadcast on LANs).
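
For scale, a quick count.  This assumes a full mesh with one connection
per pair of nodes, which is exactly the thing I'm asking about; the
module and function names are made up.

```erlang
-module(mesh_sketch).
-export([connections/1]).

%% Number of node-to-node connections in a fully connected mesh of
%% N nodes, counting one connection per pair: N * (N - 1) / 2.
connections(N) when is_integer(N), N >= 0 ->
    N * (N - 1) div 2.
```

So 10 nodes need 45 connections and 100 nodes need 4950.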

>
> Process links and monitors towards processes on remote nodes are registered
> in the local node on the distribution channel entry for that remote node,
> so the VM can trigger 'DOWN' and 'EXIT' messages for all links and monitors
> when a communication link goes down.  These messages are guaranteed to be
> delivered (if their owner exists).
>
> I hope this clears things up.
> / Raimo Niskanen

Yes.  Thanks!

Miles

>
>
>
> On Tue, Sep 26, 2017 at 05:16:56PM -0700, Miles Fidelman wrote:
>> Hi Joe,
>>
>> Hmmm....
>>
>> Joe Armstrong wrote:
>>
>>> What I said was "message passing is assumed to be reliable"
>>> The key word here is *assumed*: my assumption is that if I open a TCP
>>> socket and send it five messages numbered 1 to 5, then if I successfully
>>> read message 5 and have seen no error indicators, I can *assume* that
>>> messages 1 to 4 also arrived in order.
>>>
>> Well yes, but with TCP one has sequence numbers, buffering, and
>> retransmission - and GUARANTEES, by design, that if you (say a socket
>> connection) receive packet 5, then you've also received packets 1-4, in
>> order.
>>
>> My understanding is that Erlang does NOT make that guarantee.  As stated:
>>
>> - message delivery is assumed to be UNRELIABLE
>>
>> - ordering is guaranteed to be maintained
>>
>> The implication being that one might well receive packets 1, 2, 3, 5 -
>> and not know that 4 is missing.
>>
>>> Actually I have no idea if this is true - but it does seem to be a
>>> reasonable assumption.
>>>
>>> Messages 1 to 4 might have arrived, got put in a buffer prior to my
>>> reading them and accidentally reordered due to a software bug. An alpha
>>> particle might have hit the data in message 3 and changed it -- who knows?
>>
>> More likely, a TCP connection has dropped, taking a message or two with
>> it, and once the connection is re-established, stuff starts flowing
>> after a gap.
>>
>> With UDP, packets could arrive out of order as well as get dropped.
>>
>> There are ways to extend TCP, or write a higher level protocol that will
>> detect dropped connections, and packets, reconnect, request
>> retransmission - with the result that both the sender & receiver are
>> guaranteed both delivery & order.
>>
>> Which brings us back to implementation.
>>
>>> Having assumed that message passing is reliable I build code based on
>>> this assumption.
>> But, for Erlang, we can't make this assumption - the documentation
>> specifically says so.
>>
>>> I'm not, of course, saying that the assumption is true, just that I
>>> trust the implementers of the system have done a good job to try and
>>> make it true.
>>> Certainly any repeatable counter examples should have been investigated
>>> to see if there were any errors in the system.
>>>
>>> All this builds on layers of trust. I trust that erlang message passing is
>>> ordered and reliable in the absence of errors.
>>>
>>> The Erlang implementers trust that TCP is reliable.
>>
>> Well, that is the question, isn't it.  Lots of things cause TCP to drop
>> connections.  So the question remains - how are dropped connections
>> handled?  And, if after a connection is dropped and restored, how are
>> dropped messages and/or messages received out of order handled?
>>
>> Actually, there's another design question in there - in a multi-node
>> Erlang system, maintaining n^2 TCP connections seems just a tad
>> unwieldy.  Personally, I'd be more likely to use a connectionless
>> protocol, maybe even broadcast.
>>
>>
>>> The TCP implementors trust that the OS is reliable.
>>>
>>> The OS implementors trust that the processor is reliable.
>>>
>>> The processor implementors trust that the VLSI compilers are correct.
>>>
>>> Software runs on physical machines - so really the laws of physics
>>> apply, not maths. Physics takes into account space and time, and the
>>> concept of simultaneity does not exist; not so in maths.
>>>
>>> It seems to me that software is built upon chains of trust, not upon
>>> mathematical chains of proof.
>>>
>>> I've just been saying "what we want to achieve" and not "how we can
>>> achieve it".
>> Which brings us back to:
>>
>> stated goals:  unreliable delivery, ordered delivery
>>
>> The BEAM Book details how this works within a node, but is silent on how
>> distributed Erlang is implemented.  I'm really interested in some details.
>>
>>> The statements that people make about the system should be in terms
>>> of belief rather than proof.
>>>
>>> I'd say "I believe we have reliable message passing"
>>> It would be plain daft to say "we have reliable message passing" or
>>> "we can prove it to be correct" since there is no way of validating this.
>> Sure there is.  The state machine model of TCP is very clearly defined,
>> including its various error conditions.  And one can test an
>> implementation for adherence to the state machine model.  (In some
>> cases, one can also demonstrate that software is provably correct - but
>> let's not go there).
>>
>>
>>> Call me old fashioned but I think that claims that, for example,
>>> "we have unlimited storage" and so on are just nuts ...
>> Agreed.  But claims like "when allocated storage reaches 80% use,
>> additional storage is allocated by <mechanism>" are not just reasonable,
>> but mandatory when designing systems that have to scale under uncertain
>> load.
>>
>> Which brings us back to - how is message passing implemented between
>> Erlang nodes?
>>
>> Cheers,
>>
>> Miles
>>
>> -- 
>> In theory, there is no difference between theory and practice.
>> In practice, there is.  .... Yogi Berra
>>
>

-- 
In theory, there is no difference between theory and practice.
In practice, there is.  .... Yogi Berra