some basic questions

Mon Oct 11 11:33:27 CEST 1999

Ulf Wiger wrote:
>Yes, but unfortunately, link/1 hasn't been that useful in monitoring
>process communication. In fact, the one thing link does very well is
>cascading exits (that is, if one process dies, all processes linked to it
>will die.)

Right, it was part of the original (intentionally) "simplistic" Erlang
error handling philosophy: Trying to do a lot of fine-grained error
handling is usually counterproductive, at least in the typical
applications Erlang was intended for - the important thing is not to
find out exactly what the problem was (with a lot of code that may
actually *introduce* errors, especially since it rarely gets tested
well), but to recover gracefully (abandoning/restarting some predefined
"minimal" part of the system) and keep on running.

But of course real-world requirements are sometimes stricter or more
complex than that, and I didn't really mean to imply that monitor/2 was
unneeded - rather to point out to anyone that might have missed it that
there already *is* *one* mechanism to do "super-safe" communication,
even if it may have some drawbacks.

>There are a couple of things "wrong" with link:

And I can add one (it too is "wrong", not wrong:-):

- It produces EXIT signals, rather than normal messages. I.e. if you
  don't want the issuing process to die when the communication fails, it
  has to trap exits - and trapping exits in a library function (that
  e.g. makes a "call" to a server) is problematic or at least error-
  prone - if the calling process wasn't trapping exits, you have to deal
  correctly with any *other* EXIT signals that happen to arrive while
  you are waiting for the reply (and hope that you can differentiate
  them from the one you were looking for:-).
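
To make the pitfalls concrete, here is a minimal sketch of such a
library "call" built on link/1. The module name, the helper names and
the {call, From, Ref, Request} / {reply, Ref, Reply} message shapes are
assumptions made up for illustration, not an existing API.

```erlang
-module(link_call).
-export([call/2]).

%% Hypothetical request/reply helper built on link/1. The message shapes
%% {call, From, Ref, Request} and {reply, Ref, Reply} are assumptions.
call(Server, Request) ->
    %% Trap exits for the duration of the call, remembering the old flag.
    WasTrapping = process_flag(trap_exit, true),
    Ref = make_ref(),
    link(Server),                    % a dead Server yields an 'EXIT' message
    Server ! {call, self(), Ref, Request},
    Result =
        receive
            {reply, Ref, Reply}      -> {ok, Reply};
            {'EXIT', Server, Reason} -> {error, Reason}
        end,
    %% Pitfall: this also removes a link the caller may have had already.
    unlink(Server),
    %% Drop a late 'EXIT' from Server that raced with the reply.
    receive {'EXIT', Server, _} -> ok after 0 -> ok end,
    process_flag(trap_exit, WasTrapping),
    %% EXIT signals from *other* processes were turned into messages while
    %% we trapped. If the caller was not trapping, act on them now.
    case WasTrapping of
        true  -> ok;
        false -> reraise_pending_exits()
    end,
    Result.

%% Best effort only: an ordinary message that merely looks like an 'EXIT'
%% tuple cannot be told apart from a trapped exit signal here.
reraise_pending_exits() ->
    receive
        {'EXIT', _Pid, normal} -> reraise_pending_exits();
        {'EXIT', _Pid, Reason} -> exit(Reason)
    after 0 -> ok
    end.
```

Even this small version needs three separate clean-up steps after the
reply arrives, which is roughly the kind of rarely-tested code the
bullet above warns about.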

>- It's two-way, which means that if you link to another process,
>  you may cause trouble for that process if "you" die -- typically,
>  a server may not have to trap exits, but if clients start 
>  linking to it, it may have to in order not to die when they die.

Well, maybe a server should always trap exits, to make sure it never
dies - but not every process you want "guaranteed communication" with is
necessarily a server, of course.

>The problem in real-time systems is that even if the server process dies
>before reading the message, you must wait for the length of the timeout to
>find out that something went wrong. Using monitor/2, you will find out
>immediately.

As with link/1. :-) However, I think perhaps that this "library function
making a call to a server" is the actual problem, and I'm not absolutely
convinced that monitor fully solves it. The intent in this case is to
monitor just a single query/reply exchange (as opposed to two processes
having a long-standing link/monitor "relationship"). To be truly useful
for this, monitor has to work across nodes, of course (it seems it does
in "R6"), which means that one message must be sent to the remote node
to set up the monitor and another for demonitor - and this is
comparatively expensive, as you suggested (link has the same problem,
of course).
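
For comparison, a single monitored query/reply exchange could look
roughly like the sketch below. It is written against the present-day
API (erlang:monitor/2 and erlang:demonitor/2 with the flush option,
which is newer than the release discussed here); the module name and
the message shapes are again assumptions for illustration.

```erlang
-module(monitor_call).
-export([call/3]).

%% Hypothetical request/reply helper built on erlang:monitor/2.
%% The {call, From, Ref, Request} / {reply, Ref, Reply} shapes are assumed.
call(Server, Request, Timeout) ->
    %% The monitor reference doubles as a unique tag for the reply.
    MRef = erlang:monitor(process, Server),
    Server ! {call, self(), MRef, Request},
    receive
        {reply, MRef, Reply} ->
            %% Cancel the monitor and drop a 'DOWN' that may have raced in.
            erlang:demonitor(MRef, [flush]),
            {ok, Reply};
        {'DOWN', MRef, process, _Pid, Reason} ->
            %% Server died (or never existed: Reason is then noproc).
            {error, Reason}
    after Timeout ->
        erlang:demonitor(MRef, [flush]),
        {error, timeout}
    end.
```

The caller never touches trap_exit and the server is unaffected if the
caller dies, but for a remote Server the monitor and the demonitor are
still two extra signals per call.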

So perhaps there really is a need for something like monitor_send() -
semantics could be (after ca. 5 minutes of thinking about it:-) that the
sending of the message itself started the monitoring, which was canceled
as soon as the receiver sent a message back to the same process. But I'm
sure those actually working on the implementation (or actually writing
complex Erlang programs:-) have thought longer and harder about this
than I have - comments appreciated...:-)
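
As far as I know, much later OTP releases (24 and up) gained something
in this direction: erlang:monitor/3 with the {alias, reply_demonitor}
option, where receiving a reply sent via the alias removes the monitor
automatically. A hedged sketch, with the module name and message shapes
made up for illustration, and assuming the server replies by sending to
the reference it was given:

```erlang
-module(alias_call).
-export([call/3]).

%% Hypothetical helper; requires an OTP release with process aliases.
%% The server is assumed to answer with  Ref ! {reply, Ref, Reply}.
call(Server, Request, Timeout) ->
    %% One reference serves as monitor, reply address and reply tag.
    Ref = erlang:monitor(process, Server, [{alias, reply_demonitor}]),
    Server ! {call, Ref, Request},
    receive
        {reply, Ref, Reply} ->
            %% The reply came in via the alias, so the monitor is
            %% already gone; no explicit demonitor is needed.
            {ok, Reply};
        {'DOWN', Ref, process, _Pid, Reason} ->
            {error, Reason}
    after Timeout ->
        %% Removes the monitor and deactivates the alias, so a late
        %% reply is dropped instead of cluttering the mailbox.
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.
```

This is not quite the monitor_send() idea above: setting up the monitor
is still a separate signal rather than part of the request message, so
only the explicit demonitor step goes away.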

--Per