[erlang-questions] Why should sending sometimes fail

Mon Mar 1 11:33:07 CET 2010

Back in the late Cretaceous Period, when dinosaurs like the
mighty AXD 301 roamed the earth, this change was actually
attempted, for the very reason you suggest. It was rolled back,
in part because it hadn't been properly communicated (even within
the OTP team), but also because changing this could cause code
to hang indefinitely rather than crashing in certain circumstances.

I recall the discussion well. While it was argued that it was
fundamentally a race condition anyway, and the badarg if the
process wasn't registered was just one of many possibilities -
the receiving process could be alive when the message arrived
but die before handling it, etc., the counter-argument was
that the change made things much worse for some pretty common
error cases.
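
For reference, the asymmetry under discussion can be reproduced in a
few lines. This is only a sketch: the module name send_demo and the
atom demo_unregistered_name are made up for illustration, and the
atom is assumed not to be registered on the node.

```erlang
-module(send_demo).
-export([run/0]).

%% Compare sending to a dead pid with sending to an unregistered name.
run() ->
    Pid = spawn(fun() -> ok end),
    Ref = erlang:monitor(process, Pid),
    %% Wait for the 'DOWN' message so we know Pid is really dead.
    receive {'DOWN', Ref, process, Pid, _} -> ok end,
    %% Sending to a dead pid succeeds silently and returns the message.
    DeadPid = (catch Pid ! hello),
    %% Sending to an atom that is not registered raises badarg.
    NoName = (catch demo_unregistered_name ! hello),
    {DeadPid, NoName}.
```

The first element of the result is the message itself (hello), while
the second is an 'EXIT' tuple carrying badarg.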

The clincher was that supervisor:which_children/1 was implemented
using gen_server:call(..., infinity), and it could hang forever
with the new semantics. Given that this behaviour was manifest
even in core OTP code, it was agreed that something else was
needed before this sort of change could possibly be attempted.

That something else was monitors. Back then, Erlang had only
2-way links, which also had the unfortunate characteristic
that calling link/1 several times would still result in only
one link, but calling unlink/1 even once would remove it.
This made links very difficult to use for temporary monitoring
of an inter-process dialogue, as you had to introduce some
reference-counting wrapper around link/unlink to be safe.
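
The non-stacking behaviour of links described above can be checked
with a small sketch like the following (link_demo is a hypothetical
module name; it inspects the caller's own link list via
process_info/2):

```erlang
-module(link_demo).
-export([run/0]).

%% Link twice, unlink once, and look at what is left.
run() ->
    Pid = spawn(fun() -> receive stop -> ok end end),
    true = link(Pid),
    true = link(Pid),                 % second call adds nothing
    {links, Before} = process_info(self(), links),
    true = unlink(Pid),               % a single unlink removes the link
    {links, After} = process_info(self(), links),
    Pid ! stop,
    %% How many times Pid was listed before, and whether it is still
    %% listed after the single unlink.
    {length([P || P <- Before, P =:= Pid]), lists:member(Pid, After)}.
```

Pid shows up only once in the link list despite the two link/1
calls, and it is gone after one unlink/1, which is why two
independent pieces of code cannot safely share a link.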

One of the touted advantages of letting send crash on an
unregistered name was that it provided fast error
detection for at least the most common cases, rather than
having to always fall back on a long timeout*. Monitors
offer the same advantage, while being stackable and one-
way, which is exactly what you want in this case.
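
As a rough illustration of how a monitor gives the same fast failure,
here is a minimal sketch of a call to a locally registered name. The
module safe_call, the {call, From, Ref, Request} request shape and
the {Ref, Reply} reply shape are assumptions made for this example,
not how gen_server or any OTP module is implemented.

```erlang
-module(safe_call).
-export([call/3]).

%% Hypothetical helper: send Request to the process registered as Name
%% and wait for a {Ref, Reply} answer.
call(Name, Request, Timeout) when is_atom(Name) ->
    %% Monitoring a name with no process behind it is not an error;
    %% a 'DOWN' message with reason noproc is delivered right away.
    Ref = erlang:monitor(process, Name),
    %% The send itself may still raise badarg today, so ignore that;
    %% the monitor is what reports the failure.
    catch Name ! {call, self(), Ref, Request},
    receive
        {Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            {ok, Reply};
        {'DOWN', Ref, process, _, Reason} ->
            %% Server missing or died before replying: fail fast.
            {error, Reason}
    after Timeout ->
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.
```

With this pattern the timeout only has to cover a slow but living
server, since a missing or crashed one is reported through the
'DOWN' message.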

It would seem as if it might be possible to have another
stab at this change, but this time, it would need to be
done with an EEP and a very careful study of the worst-
case consequences and perhaps some tutorial that describes
how to find code that needs to be rewritten, and how to
fix it.

* The unfortunate consequence would then be that the timeout
would have to be long enough not to give false positives,
and short enough not to give ridiculously long error detection
time if the server side had crashed.

BR,
Ulf W

Johan Montelius wrote:
> 
> One thing that I do like in the operational semantics of Erlang is that 
> sending a message will not raise an exception even if the receiver is no 
> longer alive. It is also very convenient to send directly to a 
> registered process by using the atom. However, sending to an atom will 
> now raise an exception if there is no process registered under that 
> name. Sending to a remote registered process does however not raise an 
> exception.
> 
> Could it not be a good idea to skip the exception even if there is no 
> process currently registered under the name? In doing so one would have 
> the same operational semantics for all four cases: local pid, remote 
> pid, local registered, remote registered.
> 
>   Johan
> 

-- 
Ulf Wiger
CTO, Erlang Solutions Ltd, formerly Erlang Training & Consulting Ltd
http://www.erlang-solutions.com