[erlang-questions] What happens in Erlang if return receipt never arrives?

Tim Watson watson.timothy@REDACTED
Fri Jul 5 15:27:21 CEST 2013

What we're talking about here are byzantine failures, and despite a huge body of research on these topics, there are no fool-proof answers or solutions.

Erlang does not use ACKs implicitly - sending a message to a process is an asynchronous (fire and forget) action and no implicit reply channel exists. This is a deliberate design decision, since making various levels of guarantee about delivery is not a core requirement, but rather, an application specific need.

You ask what "happens if the receiver is not able to notify the world about its death", and this is handled in two ways by erlang's runtime system. If the receiver is a local process (i.e., resides on the same physical emulator) then as soon as it exits, any monitors that have been set up (by other processes) will be triggered and a message sent to the subscribers. If the receiver resides on another node (whether physically on the same host or not) then monitors are managed at both ends and when the remote process dies, monitor notifications are sent across the wire to subscribers. If the receiver resides on another node and the network link dies, then the net_kernel's tick timer will detect the disconnect (after a default timeout) and once it considers the node to have been disconnected, will fire monitor notifications to all (local) subscribers.

In terms of guarantees about delivery, in practise these are impossible to make without certain caveats. Guaranteed "exactly once" delivery, for example, is a promise that cannot be made without theoretically infinite storage capacity being available for all participants. In practise one can "get away with" finite storage capacity, but that means under certain circumstances, the promise/guarantee won't hold.

The way to 'ensure' a message has arrived across a distribution link then, is to expect an ACK. You must wait a potentially infinite time until that ACK arrives before considering the message delivered, unless you see a monitor notification indicating the receiver's death, at which point you might re-send once the receiver comes back online. Even then, you cannot be sure if the receiver did in fact see the message *and* sent an ACK, but the network link died before the ACK was delivered. In this case, you'll have to re-send anyway - since you've no way of knowing message was actually delivered, having not seen the lost ACK - and the receiver will have to de-dup the message(s) at his end.

Other configurations are possible. Introducing a ring overlay, you can achieve atomic broadcast with two round trips, one for delivery and another for ACK, though you'd have to either have each node take responsibility for its neighbours messages (and ACKs) during group membership changes, or make group membership changes atomic by using distributed transactions (e.g., paxos, etc). There is a definite cost to adding these kind of guarantees.

Viz your observation about wanting to make delivery atomic, in practise many solutions go the asynchronous route in order to avoid the performance (and latency) overheads of distributed transactions.

If you want to use ACKs in your erlang code, you can either implement this yourself or use gen_server:call to make a synchronous invocation that waits on a response (and monitors the server). This will not only throttle the producer, but will also make the server a bottleneck in the system, since it can only handle one message at once.


On 5 Jul 2013, at 12:23, jeti789@REDACTED wrote:

> I see. What happens if the receiver is not able to notify the world about its death? The sender after a timeout sends a notification out about a possible death? The supervisor detects that there is no more heart beat? Not wanting to be pedantic. Just trying to understand how this works. Is there some document/book where you can read about those things. I like that kind of problems ;-).
> Regards, Oliver
> In practice you just subscribe to notification of receivers death. It
> does not solve a case when your receiver is overflowed with work, but
> it is solved by careful planning of flow of messages in your
> application.
> On Fri, Jul 5, 2013 at 10:27 AM, <jeti789@REDACTED> wrote:
> > I just happened to read the thesis of Joe Armstrong and don't have much
> > prior knowledge of Erlang. I wonder what happens if a delivery receipt for
> > some message never arrives. What does the sending actor do? It sends the
> > message another time? This could confuse the recipient actor when it
> > receives the same message another time. It has to be able to tell that its
> > receipt was not received and therefore the second message is void.
> >
> > That kind of problems always kept me away from solutions where message
> > delivery is not transactional. I think I know the answer: the sending actor
> > tells its supervising actor that something must be wrong when it didn't
> > obtain a receipt in reasonable time causing the supervisor to take some
> > action (like restarting the involed actors or something). Is this correct? I
> > see no other solution that doesn't result in theroretically possible
> > infinite message sends.
> >
> > Thanks for any answer,
> >
> > Bienlein
> >
> >
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://erlang.org/mailman/listinfo/erlang-questions
> >
> --
> Best regards,
> Paul Peregud
> +48602112091
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20130705/28d7cdba/attachment.htm>

More information about the erlang-questions mailing list