[erlang-questions] question re. message delivery
Peer Stritzinger
peer@REDACTED
Thu Sep 28 12:52:25 CEST 2017
IIRC there is one guarantee in newer versions of Erlang (for a pretty conservative definition of new ;-)
All messages from another node arrive between its nodeup and nodedown message.
This means that you can always detect if there is a possible message loss between two nodes
by monitoring the node.
Or as I understand monitoring a remote process also does this implicitly since a link-down triggers
DOWN messages on all monitored processes across this link and a nodedown
Please correct me if this is not true (modulo the Hans Svensson/Lars-Åke Fredlund paper).
Cheers,
-- Peer
> On 27.09.2017, at 16:42, Raimo Niskanen <raimo+erlang-questions@REDACTED> wrote:
>
> I'll try to summarize what I know on the topic, acting a bit as a ghost
> writer for the VM Team.
>
> See also the old, outdated language specification, which is the best we have.
> It is still the soother that the VM Team sucks on when they do not know
> what else to do, and an updated version is in the pipeline.
> See especially 10.6.2 Order of signals:
>
> http://erlang.org/download/erl_spec47.ps.gz
>
>
> Also see, especially 2.1 Passing of Signals "signals can be lost":
>
> http://erlang.org/doc/apps/erts/communication.html
>
>
> Message order between a pair of processes is guaranteed. I.e. a message
> sent after another message will not be received before that other message.
>
> Messages may be dropped. In particular due to a communication link
> between nodes going down and up.
>
> If you set a monitor (or a link) on a process you will get a 'DOWN'
> message if the other process vanishes i.e. dies or communication link lost.
> That 'DOWN' message is guaranteed to be delivered (the same applies to
> links and 'EXIT' messages).
>
> An example: if process P1 first sets a monitor on process P2 and then
> sends messages M1, M2 and M3 to P2. P2 acknowledges M1 with M1a and M3
> with M3a. Then if P1 gets M1a it knows that P2 has seen M1 and P1 is
> guaranteed to eventually get either M3a or 'DOWN'. If it gets M3a then
> it knows P2 have seen M2 and M3. If it gets 'DOWN' then M2 may have been
> either dropped or seen by P2, the same applies to M3, and P1 may eventually
> get M3a knowing that P2 has seen M3, but can not know if it has seen M2.
>
> Another example: gen_server:call first sets a monitor on the server process,
> then sends the query. By that it knows it will eventually either get
> the reply or 'DOWN'. If it gets 'DOWN' it actually may get a late reply
> (e.g. network down-up), which is often overlooked.
>
> The distribution communication is is per default implemented with TCP links
> between nodes. The VM relies on the distribution transport to deliver
> messages in order, or to die meaning that the link has failed and that any
> number of messages at the end of the sequence may have been dropped.
>
> Process links and monitors towards processes on remote nodes are registered
> in the local node on the distribution channel entry for that remote node,
> so the VM can trigger 'DOWN' and 'EXIT' messages for all links and monitors
> when a communication link goes down. These messages are guaranteed to be
> delivered (if their owner exists).
>
> I hope this clears things up.
> / Raimo Niskanen
>
>
>
> On Tue, Sep 26, 2017 at 05:16:56PM -0700, Miles Fidelman wrote:
>> Hi Joe,
>>
>> Hmmm....
>>
>> Joe Armstrong wrote:
>>
>>> What I said was "message passing is assumed to be reliable"
>>
>>>
>>> The key word here is *assumed* my assumption is that if I open a TCP
>>> socket
>>> and send it five messages numbered 1 to 5 then If I successfully read
>>> message
>>> 5 and have seen no error indicators then I can *assume* that messages 1 to
>>> 4 also arrived in order.
>>>
>>
>> Well yes, but with TCP one has sequence numbers, buffering, and
>> retransmission - and GUARANTEES, by design, that if you (say a socket
>> connection) receive packet 5, then you've also received packets 1-4, in
>> order.
>>
>> My understanding is that Erlang does NOT make that guarantee. As stated:
>>
>> - message delivery is assumed to be UNRELIABLE
>>
>> - ordering is guaranteed to be maintained
>>
>> The implication being that one might well receive packets 1, 2, 3, 5 -
>> and not know that 4 is missing.
>>
>>> Actually I have no idea if this is true - but it does seem to be a
>>> reasonable
>>> assumption.
>>>
>>> Messages 1 to 4 might have arrived got put in a buffer prior to my reading
>>> them and accidentally reordered due to a software bug. An alpha particle
>>> might have hit the data in message 3 and changed it -- who knows?
>>
>>
>> More likely, a TCP connection has dropped, taking a message or two with
>> it, and once the connection is re-established, stuff starts flowing
>> after a gap.
>>
>> With UDP, packets could arrive out of order as well as get dropped.
>>
>> There are ways to extend TCP, or write a higher level protocol that will
>> detect dropped connections, and packets, reconnect, request
>> retransmission - with the result that both the sender & receiver are
>> guaranteed both delivery & order.
>>
>> Which brings us back to implementation.
>>
>>>
>>> Having assumed that message passing is reliable I build code based on
>>> this assumption.
>>
>> But, for Erlang, we can't make this assumption - the documentation
>> specifically says so.
>>
>>>
>>> I'm not, of course, saying that the assumption is true, just that I trust
>>> the
>>> implementers of the system have done a good job to try and make it true.
>>> Certainly any repeatable counter examples should have been investigated
>>> to see if there were any errors in the system.
>>>
>>> All this builds on layers of trust. I trust that erlang message passing is
>>> ordered and reliable in the absence of errors.
>>>
>>> The Erlang implementers trust that TCP is reliable.
>>
>>
>> Well, that is the question, isn't it. Lots of things cause TCP to drop
>> connections. So the question remains - how are dropped connections
>> handled? And, if after a connection is dropped and restored, how are
>> dropped messages and/or messages received out of order handled?
>>
>> Actually, there's another design question in there - in a multi-node
>> Erlang system, maintaining n2 TCP connections seems just a tad
>> unwieldy. Personally, I'd be more likely to use a connectionless
>> protocol, maybe even broadcast.
>>
>>
>>>
>>> The TCP implementors trust that the OS is reliable.
>>>
>>> The OS implementors trust that the processor is reliable.
>>>
>>> The processor implementors trust that the VLSI compilers are correct.
>>>
>>> Software runs on physical machines - so really the laws of physics
>>> apply not
>>> maths. Physics takes into account space and time, and the concept of
>>> simultaneity does not exist, no so in maths.
>>>
>>> It seems to me that software is built upon chains of trust, not upon
>>> mathematical chains of proof.
>>>
>>> I've just been saying "what we want to achieve" and not "how we can
>>> achieve
>>> it".
>>
>> Which brings us back to:
>>
>> stated goals: unreliable delivery, ordered delivery
>>
>> The BEAM Book details how this works within a node, but is silent on how
>> distributed Erlang is implemented. I'm really interested in some details.
>>
>>>
>>> The statements that people make about the system should be in terms
>>> of belief rather than proof.
>>>
>>> I'd say "I believe we have reliable message passing"
>>> It would be plain daft to say "we have reliable message passing" or
>>> "we can prove it be correct" since there is no way of validating this.
>>
>> Sure there is. The state machine model of TCP is very clearly defined,
>> including its various error conditions. And one can test an
>> implementation for adherence to the state machine model. (In some
>> cases, one can also demonstrate that software is provably correct - but
>> let's not go there).
>>
>>
>>>
>>> Call me old fashioned but I think that claims that, for example,
>>> "we have unlimited storage" and so on are just nuts ...
>>
>> Agreed. But claims like "when allocated storage reaches 80% use,
>> additional storage is allocated by <mechanism>" are not just reasonable,
>> but mandatory when designing systems that have to scale under uncertain
>> load.
>>
>> Which brings us back to - how is message passing implemented between
>> Erlang nodes?
>>
>> Cheers,
>>
>> Miles
>>
>> --
>> In theory, there is no difference between theory and practice.
>> In practice, there is. .... Yogi Berra
>>
>
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>
>
> --
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
More information about the erlang-questions
mailing list