[erlang-questions] question re. message delivery
Wed Sep 27 02:16:56 CEST 2017
Joe Armstrong wrote:
> What I said was "message passing is assumed to be reliable"
> The key word here is *assumed*: my assumption is that if I open a TCP
> and send it five messages numbered 1 to 5, then if I successfully read
> 5 and have seen no error indicators then I can *assume* that messages 1 to
> 4 also arrived in order.
Well yes, but TCP has sequence numbers, buffering, and
retransmission - and it GUARANTEES, by design, that if you (say, a socket
connection) receive packet 5, then you've also received packets 1-4, in order.
My understanding is that Erlang does NOT make that guarantee. As stated:
- message delivery is assumed to be UNRELIABLE
- ordering is guaranteed to be maintained
The implication being that one might well receive packets 1, 2, 3, 5 -
and not know that 4 is missing.
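To make the 1, 2, 3, 5 case concrete, here is a minimal sketch of how a receiver could at least notice the gap if every message carried a sequence number. It is written in Python purely for illustration and says nothing about how Erlang or the BEAM actually behave; the function name find_gaps and the assumption that numbering starts at 1 are hypothetical.

```python
def find_gaps(received_seqs, first=1):
    """Return the sequence numbers skipped in an ordered stream.

    Assumes (hypothetically) that the sender numbers messages
    first, first+1, ... and that ordering is preserved, so any
    jump forward means the skipped numbers were lost.
    """
    missing = []
    expected = first
    for seq in received_seqs:
        if seq > expected:
            # Everything between what we expected and what arrived is missing.
            missing.extend(range(expected, seq))
        expected = seq + 1
    return missing


print(find_gaps([1, 2, 3, 5]))  # [4]
```

Without the numbers, the receiver sees four messages in order and has no way to tell that one is missing. Even with them, a loss at the very end of the stream goes unnoticed until something later arrives.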
> Actually I have no idea if this is true - but it does seem to be a
> Messages 1 to 4 might have arrived got put in a buffer prior to my reading
> them and accidentally reordered due to a software bug. An alpha particle
> might have hit the data in message 3 and changed it -- who knows?
More likely, a TCP connection has dropped, taking a message or two with
it, and once the connection is re-established, stuff starts flowing
after a gap.
With UDP, packets could arrive out of order as well as get dropped.
There are ways to extend TCP, or to write a higher-level protocol, that will
detect dropped connections and lost packets, reconnect, and request
retransmission - with the result that both sender & receiver are
guaranteed delivery & order.
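As a rough illustration of such a higher-level protocol, the sketch below simulates stop-and-wait retransmission over a channel that randomly loses both data and acknowledgements. It is a toy model under stated assumptions, not a description of distributed Erlang or of any real TCP extension: the function deliver_reliably, the loss_rate parameter, and the simulated channel are all hypothetical.

```python
import random


def deliver_reliably(messages, loss_rate=0.3, max_tries=50, seed=0):
    """Simulate stop-and-wait delivery over a lossy channel.

    The sender resends each numbered message until it sees an ack.
    The receiver accepts each sequence number once and ignores
    duplicates, so the application sees every message, in order.
    """
    rng = random.Random(seed)   # seeded so the simulation is repeatable
    delivered = []              # what the receiving application sees
    next_expected = 1           # receiver state: next sequence number to accept
    for seq, payload in enumerate(messages, start=1):
        for _ in range(max_tries):
            if rng.random() < loss_rate:
                continue        # data lost: sender times out and resends
            if seq == next_expected:
                delivered.append(payload)
                next_expected += 1
            # else: duplicate of an already accepted message, ignore it
            if rng.random() < loss_rate:
                continue        # ack lost: sender resends the same message
            break               # ack received: move on to the next message
        else:
            # Retries exhausted: the sender knows delivery is unconfirmed.
            raise ConnectionError(f"gave up on message {seq}")
    return delivered


print(deliver_reliably(["a", "b", "c", "d", "e"]))
```

The "guarantee" here is conditional: either every message arrives exactly once and in order, or the sender gets an explicit error. What it rules out is the silent 1, 2, 3, 5 case.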
Which brings us back to implementation.
> Having assumed that message passing is reliable I build code based on
> this assumption.
But, for Erlang, we can't make this assumption - the documentation
specifically says so.
> I'm not, of course, saying that the assumption is true, just that I trust
> implementers of the system have done a good job to try and make it true.
> Certainly any repeatable counter examples should have been investigated
> to see if there were any errors in the system.
> All this builds on layers of trust. I trust that erlang message passing is
> ordered and reliable in the absence of errors.
> The Erlang implementers trust that TCP is reliable.
Well, that is the question, isn't it. Lots of things cause TCP to drop
connections. So the question remains - how are dropped connections
handled? And, after a connection is dropped and restored, how are
dropped messages and/or messages received out of order handled?
Actually, there's another design question in there - in a multi-node
Erlang system, maintaining n^2 TCP connections seems just a tad
unwieldy. Personally, I'd be more likely to use a connectionless
protocol, maybe even broadcast.
> The TCP implementors trust that the OS is reliable.
> The OS implementors trust that the processor is reliable.
> The processor implementors trust that the VLSI compilers are correct.
> Software runs on physical machines - so really the laws of physics
> apply, not maths. Physics takes into account space and time, and the
> concept of simultaneity does not exist; not so in maths.
> It seems to me that software is built upon chains of trust, not upon
> mathematical chains of proof.
> I've just been saying "what we want to achieve" and not "how we can
Which brings us back to:
stated goals: unreliable delivery, ordered delivery
The BEAM Book details how this works within a node, but is silent on how
distributed Erlang is implemented. I'm really interested in some details.
> The statements that people make about the system should be in terms
> of belief rather than proof.
> I'd say "I believe we have reliable message passing"
> It would be plain daft to say "we have reliable message passing" or
> "we can prove it be correct" since there is no way of validating this.
Sure there is. The state machine model of TCP is very clearly defined,
including its various error conditions. And one can test an
implementation for adherence to the state machine model. (In some
cases, one can also demonstrate that software is provably correct - but
let's not go there).
> Call me old fashioned but I think that claims that, for example,
> "we have unlimited storage" and so on are just nuts ...
Agreed. But claims like "when allocated storage reaches 80% use,
additional storage is allocated by <mechanism>" are not just reasonable,
but mandatory when designing systems that have to scale under uncertain
Which brings us back to - how is message passing implemented between nodes?
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra