[erlang-questions] question re. message delivery

Miles Fidelman mfidelman@REDACTED
Wed Sep 27 02:16:56 CEST 2017


Hi Joe,

Hmmm....

Joe Armstrong wrote:

> What I said was "message passing is assumed to be reliable"

>
> The key word here is *assumed* my assumption is that if I open a TCP 
> socket
> and send it five messages numbered 1 to 5 then If I successfully read
> message
> 5 and have seen no error indicators then I can *assume* that messages 1 to
> 4 also arrived in order.
>

Well yes, but with TCP one has sequence numbers, buffering, and 
retransmission - and GUARANTEES, by design, that if you (say a socket 
connection) receive packet 5, then you've also received packets 1-4, in 
order.

My understanding is that Erlang does NOT make that guarantee.  As stated:

- message delivery is assumed to be UNRELIABLE

- ordering is guaranteed to be maintained

The implication being that one might well receive packets 1, 2, 3, 5 - 
and not know that 4 is missing.

> Actually I have no idea if this is true - but it does seem to be a
> reasonable
> assumption.
>
> Messages 1 to 4 might have arrived got put in a buffer prior to my reading
> them and accidentally reordered due to a software bug. An alpha particle
> might have hit the data in message 3 and changed it -- who knows?


More likely, a TCP connection has dropped, taking a message or two with 
it, and once the connection is re-established, stuff starts flowing 
after a gap.

With UDP, packets could arrive out of order as well as get dropped.

There are ways to extend TCP, or write a higher level protocol that will 
detect dropped connections, and packets, reconnect, request 
retransmission - with the result that both the sender & receiver are 
guaranteed both delivery & order.

Which brings us back to implementation.

>
> Having assumed that message passing is reliable I build code based on
> this assumption.

But, for Erlang, we can't make this assumption - the documentation 
specifically says so.

>
> I'm not, of course, saying that the assumption is true, just that I trust
> the
> implementers of the system have done a good job to try and make it true.
> Certainly any repeatable counter examples should have been investigated
> to see if there were any errors in the system.
>
> All this builds on layers of trust. I trust that erlang message passing is
> ordered and reliable in the absence of errors.
>
> The Erlang implementers trust that TCP is reliable.


Well, that is the question, isn't it.  Lots of things cause TCP to drop 
connections.  So the question remains - how are dropped connections 
handled?  And, if after a connection is dropped and restored, how are 
dropped messages and/or messages received out of order handled?

Actually, there's another design question in there - in a multi-node 
Erlang system, maintaining n2 TCP connections seems just a tad 
unwieldy.  Personally, I'd be more likely to use a connectionless 
protocol, maybe even broadcast.


>
> The TCP implementors trust that the OS is reliable.
>
> The OS implementors trust that the processor is reliable.
>
> The processor implementors trust that the VLSI compilers are correct.
>
> Software runs on physical machines - so really the laws of physics 
> apply not
> maths. Physics takes into account space and time, and the concept of
> simultaneity does not exist, no so in maths.
>
> It seems to me that software is built upon chains of trust, not upon
> mathematical chains of proof.
>
> I've just been saying "what we want to achieve" and not "how we can 
> achieve
> it".

Which brings us back to:

stated goals:  unreliable delivery, ordered delivery

The BEAM Book details how this works within a node, but is silent on how 
distributed Erlang is implemented.  I'm really interested in some details.

>
> The statements that people make about the system should be in terms
> of belief rather than proof.
>
> I'd say "I believe we have reliable message passing"
> It would be plain daft to say "we have reliable message passing" or
> "we can prove it be correct" since there is no way of validating this.

Sure there is.  The state machine model of TCP is very clearly defined, 
including its various error conditions.  And one can test an 
implementation for adherence to the state machine model.  (In some 
cases, one can also demonstrate that software is provably correct - but 
let's not go there).


>
> Call me old fashioned but I think that claims that, for example,
> "we have unlimited storage" and so on are just nuts ...

Agreed.  But claims like "when allocated storage reaches 80% use, 
additional storage is allocated by <mechanism>" are not just reasonable, 
but mandatory when designing systems that have to scale under uncertain 
load.

Which brings us back to - how is message passing implemented between 
Erlang nodes?

Cheers,

Miles

-- 
In theory, there is no difference between theory and practice.
In practice, there is.  .... Yogi Berra

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20170926/2d99b1a2/attachment.htm>


More information about the erlang-questions mailing list