[erlang-questions] question re. message delivery

Thu Sep 28 10:02:35 CEST 2017

Miles; great to hear!  Replies inline.

On Wed, Sep 27, 2017 at 09:10:45AM -0700, Miles Fidelman wrote:
> Raimo,
> 
> Thanks for the details - exactly what I was looking for!
> 
> A few questions - inline, near the end...
> 
> 
> On 9/27/17 7:42 AM, Raimo Niskanen wrote:
> > I'll try to summarize what I know on the topic, acting a bit as a ghost
> > writer for the VM Team.
> >
> > See also the old, outdated language specification, which is the best we have.
> > It is still the soother that the VM Team sucks on when they do not know
> > what else to do, and an updated version is in the pipeline.
> > See especially 10.6.2 Order of signals:
> >
> >      http://erlang.org/download/erl_spec47.ps.gz
> >
> >
> > Also see, especially 2.1 Passing of Signals "signals can be lost":
> >
> >      http://erlang.org/doc/apps/erts/communication.html
> >
> >
> > Message order between a pair of processes is guaranteed.  I.e. a message
> > sent after another message will not be received before that other message.
> >
> > Messages may be dropped.  In particular due to a communication link
> > between nodes going down and up.
> >
> > If you set a monitor (or a link) on a process you will get a 'DOWN'
> > message if the other process vanishes i.e. dies or communication link lost.
> > That 'DOWN' message is guaranteed to be delivered (the same applies to
> > links and 'EXIT' messages).
> 
> Ahh.. now that's important.  (One might quibble about the semantics of 
> "DOWN" vs. "not reachable," but that's another topic.)

"not reachable" here means link down, which for TCP means socket error or
closed, or a link tick timeout.  The VM sends ticks on every link if no
other data is sent to ensure that data flows, and if not the link is closed.

If the process dies in the remote VM, the latter produces a 'DOWN' message.
That message may be dropped, but if so it is due to link down and then
the local VM produces a different 'DOWN' message because there is a monitor
on a process in the node at the other end of the link that went down.

> 
> >
> > An example: if process P1 first sets a monitor on process P2 and then
> > sends messages M1, M2 and M3 to P2.  P2 acknowledges M1 with M1a and M3
> > with M3a.  Then if P1 gets M1a it knows that P2 has seen M1 and P1 is
> > guaranteed to eventually get either M3a or 'DOWN'.  If it gets M3a then
> > it knows P2 have seen M2 and M3.  If it gets 'DOWN' then M2 may have been
> > either dropped or seen by P2, the same applies to M3, and P1 may eventually
> > get M3a knowing that P2 has seen M3, but can not know if it has seen M2.
> >
> > Another example: gen_server:call first sets a monitor on the server process,
> > then sends the query.  By that it knows it will eventually either get
> > the reply or 'DOWN'.  If it gets 'DOWN' it actually may get a late reply
> > (e.g. network down-up), which is often overlooked.
> >
> > The distribution communication is is per default implemented with TCP links
> > between nodes.  The VM relies on the distribution transport to deliver
> > messages in order, or to die meaning that the link has failed and that any
> > number of messages at the end of the sequence may have been dropped.
> 
> Are those TCP links created on the fly, between processes, or are they 
> kept running between nodes?

Today a link is between nodes, created on the fly and kept running.
If you bang a process on another node the link is brought up, and then it is
kept up forever, until net_adm:disconnect(Node), until it fails or is closed
from the other end.

I do not think we can have one link per process or process pair - that
would exhaust OS resources.  Possibly multiple links between nodes, but
that would also be pushing it.

Closing idle links could be useful to save resources, though.

> 
> I ask, because there's an obvious n2 issue in a system with lots of 
> nodes.  (I used to work on protocols linking distributed simulators - 
> we'd either use IP multicast to distributed state updates, or establish 
> sparser network "above" TCP - n2 links between servers on each local 
> network, broadcast on LANs).

Yes there is an n^2 problem.  We are contemplating mitigations.  But today
global (the global name registry) does what it can to create a fully
connected network.

We are looking into replacing global with some DHT based registry to avoid
the fully connected network, and into closing idle links.

Routing of links through other nodes has also been discussed.

There is also the option to attach as a "hidden" node that will not cause
secondary connections to be set up.

This topic often comes up at Erlang conferences since it is a limitation of
how many nodes that are feasible in a network, and results in all kinds of
interesting solutions, workarounds and suggestions.

> 
> 
> >
> > Process links and monitors towards processes on remote nodes are registered
> > in the local node on the distribution channel entry for that remote node,
> > so the VM can trigger 'DOWN' and 'EXIT' messages for all links and monitors
> > when a communication link goes down.  These messages are guaranteed to be
> > delivered (if their owner exists).
> >
> > I hope this clears things up.
> > / Raimo Niskanen
> 
> Yes.  Thanks!
> 
> Miles

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB