some basic questions

Tue Oct 12 11:48:41 CEST 1999

On Mon, 11 Oct 1999, Per Hedeland wrote:

per> As with link/1.:-) However I think perhaps that this "library function
per> making a call to a server" is the actual problem, and I'm not absolutely
per> convinced that monitor fully solves it: The intent in this case is to
per> just monitor a single query/reply exchange (as opposed to two processes
per> having a long-standing link/monitor "relationship"), and to be truly
per> useful for this, monitor has to work across nodes of course (seems it
per> does in "R6"), which means that a message must be sent to the remote
per> node, and another for demonitor - and this is comparatively expensive,
per> as you suggested (link has the same problem, of course).
per>
per> So perhaps there really is a need for something like monitor_send() -
per> semantics could be (after ca 5 minutes of thinking about it:-) that the
per> sending of the message itself started the monitoring, which was canceled
per> as soon as the receiver sent a message back to the same process. But I'm
per> sure those actually working on the implementation (or actually writing
per> complex Erlang programs:-) have thought longer and harder about this
per> than I have - comments appreciated...:-)

This seems like a good approach if it is implemented correctly.
For processes with registered names, it is essential that the
name is resolved to the same process identifier both for the 
delivery of the message and for monitoring of the process.
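
A minimal sketch of that requirement, assuming a current Erlang/OTP
release and a locally registered name. The module safe_call, the
{call, ...}/{reply, ...} message shapes and the echo server are
hypothetical and only serve as illustration; this is not how
gen_server or Mnesia implement it. The name is resolved exactly once,
and the resulting pid is used both for the monitor and for the send:

```erlang
-module(safe_call).
-export([call/3, demo/0]).

%% call(Name, Request, Timeout) -> {ok, Reply} | {error, Reason}
%% Hypothetical helper: resolve the registered name ONCE, then use
%% the same pid for monitoring and for message delivery.
call(Name, Request, Timeout) when is_atom(Name) ->
    case whereis(Name) of
        undefined -> {error, noproc};
        Pid       -> call_pid(Pid, Request, Timeout)
    end.

call_pid(Pid, Request, Timeout) ->
    Ref = erlang:monitor(process, Pid),
    %% Send to the pid, never to the name, so a restarted server
    %% registered under the same name cannot receive this request.
    Pid ! {call, self(), Ref, Request},
    receive
        {reply, Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            {ok, Reply};
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, Reason}
    after Timeout ->
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.

%% Assumed server side of the protocol: echo the request back.
echo_loop() ->
    receive
        {call, From, Ref, Request} ->
            From ! {reply, Ref, Request},
            echo_loop()
    end.

demo() ->
    Pid = spawn(fun echo_loop/0),
    true = register(echo_demo, Pid),
    {ok, hello} = call(echo_demo, hello, 1000),
    exit(Pid, kill),
    %% The server is dead or dying: either the name is already gone
    %% or the monitor reports the death. No hang in either case.
    {error, _} = call(echo_demo, hello, 1000),
    ok.
```

Calling safe_call:demo() exercises both the successful exchange and
the case where the server dies. For a name registered on another node
the name would have to be resolved on that node first, which this
sketch does not attempt.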

Some time ago I had lots of trouble with "unsafe" calls between named
servers on different nodes. The problem turned out to be gen_server,
which naively sent the message to the named process, waited for a while
and then resolved the name again to see if the process was alive. If
it was alive, it waited forever for the reply (Timeout=infinity).
Unfortunately my named process died and a new one was started under
the same name. This led to a situation where my message was delivered
to one process but the client was monitoring another one.

Internally in Mnesia, named processes are used a lot and in order to
make the "unsafe" gen_server calls "safe" we have used two different
approaches depending on the circumstances:

  o At startup of Mnesia (after negotiating about which communication
    protocol to use) a local monitor process is linked to monitor
    processes on all other database nodes. The aim of the monitor
    process is to monitor various resources such as Mnesia on remote
    nodes, local dets and disk_log processes etc. (Luckily Mnesia
    has no need for monitoring of other applications.)

    When an inter-monitor-link breaks, the monitor forwards internal
    {mnesia_down, Node} messages to the processes that need to be
    informed about this. Besides Mnesia's internal processes the message
    is forwarded to user processes, such as transaction coordinators.

    This mechanism works very well for communication protocols that
    involve longer dialogs. It is both reliable and fast. The tricky
    part has been to avoid sending spurious {mnesia_down, Node} messages
    in the cases when the server first replies to the message and then
    dies shortly afterwards.

    In http://www.ericsson.se/cslab/~hakan/mnesia_internals_slides.pdf you
    will find some architectural pictures covering these issues.

  o Especially during the startup of Mnesia there are various
    interesting concurrency situations that may occur. The startup
    may be performed simultaneously on several nodes, possibly at
    different paces and with different degrees of success. Named
    servers may crash and be restarted under the same name.

    In order to make the startup safe (despite gen_server) the
    gen_server calls are sometimes wrapped by explicitly linking
    to the process BEFORE message delivery and unlinking it when
    the reply has arrived. It is reliable but costly.
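
The link-before-send wrapping in the second item could look roughly
like the sketch below. It assumes a current Erlang/OTP release; the
module linked_call, its call/3 wrapper and the tiny echo gen_server
used in demo/0 are hypothetical and are not Mnesia's actual code. The
wrapper takes an already resolved pid, so the link and the request
refer to the same process. It also assumes the caller has no other
link to the server, since unlink/1 would remove that one too.

```erlang
-module(linked_call).
-behaviour(gen_server).
-export([call/3, demo/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% call(Pid, Request, Timeout) -> {ok, Reply} | {error, Reason}
%% Hypothetical wrapper: link BEFORE the request is delivered and
%% unlink once the reply (or a failure) has arrived.
call(Pid, Request, Timeout) when is_pid(Pid) ->
    OldTrap = process_flag(trap_exit, true),
    link(Pid),
    Result =
        try gen_server:call(Pid, Request, Timeout) of
            Reply -> {ok, Reply}
        catch
            exit:Reason -> {error, Reason}
        end,
    unlink(Pid),
    %% An exit message from the link may already be in the mailbox;
    %% drop it so the caller does not see it later.
    receive {'EXIT', Pid, _} -> ok after 0 -> ok end,
    process_flag(trap_exit, OldTrap),
    Result.

%% Minimal gen_server used only by demo/0.
init([]) -> {ok, nostate}.

handle_call(die, _From, State) ->
    {stop, normal, State};              % stop without replying
handle_call(Request, _From, State) ->
    {reply, {echo, Request}, State}.

handle_cast(_Msg, State) -> {noreply, State}.

demo() ->
    {ok, Pid} = gen_server:start(?MODULE, [], []),
    {ok, {echo, ping}} = call(Pid, ping, 1000),
    %% The server terminates instead of replying; the caller gets
    %% an error tuple rather than waiting forever.
    {error, _} = call(Pid, die, 1000),
    ok.
```

Every call pays for a link and an unlink on top of the request and
the reply, and across nodes each of them is a signal of its own. That
is the cost mentioned above.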

/Håkan