some basic questions
Hakan Mattsson
hakan@REDACTED
Tue Oct 12 11:48:41 CEST 1999
On Mon, 11 Oct 1999, Per Hedeland wrote:
per> As with link/1.:-) However I think perhaps that this "library function
per> making a call to a server" is the actual problem, and I'm not absolutely
per> convinced that monitor fully solves it: The intent in this case is to
per> just monitor a single query/reply exchange (as opposed to two processes
per> having a long-standing link/monitor "relationship"), and to be truly
per> useful for this, monitor has to work across nodes of course (seems it
per> does in "R6"), which means that a message must be sent to the remote
per> node, and another for demonitor - and this is comparatively expensive,
per> as you suggested (link has the same problem, of course).
per>
per> So perhaps there really is a need for something like monitor_send() -
per> semantics could be (after ca 5 minutes of thinking about it:-) that the
per> sending of the message itself started the monitoring, which was canceled
per> as soon as the receiver sent a message back to the same process. But I'm
per> sure those actually working on the implementation (or actually writing
per> complex Erlang programs:-) have thought longer and harder about this
per> than I have - comments appreciated...:-)
This seems like a good approach if it is implemented correctly.
For processes with registered names, it is essential that the
name is resolved to the same process identifier both for the
delivery of the message and for monitoring of the process.
Some time ago I had lots of trouble with "unsafe" calls between named
servers on different nodes. The problem turned out to be gen_server,
which naively sent the message to the named process, waited for a while
and then resolved the name again to see if the process was alive. If
it was alive, it waited forever for the reply (Timeout=infinity).
Unfortunately my named process died and a new one was restarted with
the same name. This led to a situation where my message was delivered
to one process but the client was monitoring another one.
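To illustrate the point about resolving the name only once, here is a minimal sketch of a call that looks up a locally registered name a single time and then uses the resulting pid both for monitoring and for delivery. It assumes a current Erlang/OTP with erlang:monitor/2 and erlang:demonitor/2. The module name safe_call and the {call, From, Ref, Request} / {reply, Ref, Reply} message protocol are hypothetical and chosen only for this example; this is not the gen_server implementation.

```erlang
-module(safe_call).
-export([call/3]).

%% Hypothetical helper, not part of OTP or Mnesia.
%% Resolves Name exactly once; the same pid is monitored and messaged,
%% so a restarted server under the same name cannot be confused with
%% the process that actually received the request.
call(Name, Request, Timeout) ->
    case whereis(Name) of
        undefined ->
            {error, noproc};
        Pid ->
            Ref = erlang:monitor(process, Pid),
            %% Assumed protocol: the server answers {reply, Ref, Reply}.
            Pid ! {call, self(), Ref, Request},
            receive
                {reply, Ref, Reply} ->
                    %% Drop the monitor and any 'DOWN' already queued.
                    erlang:demonitor(Ref, [flush]),
                    {ok, Reply};
                {'DOWN', Ref, process, Pid, Reason} ->
                    {error, Reason}
            after Timeout ->
                erlang:demonitor(Ref, [flush]),
                {error, timeout}
            end
    end.
```

Note that whereis/1 only resolves names on the local node, so the sketch does not cover the cross-node case discussed in this thread.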
Internally in Mnesia, named processes are used a lot and in order to
make the "unsafe" gen_server calls "safe" we have used two different
approaches depending on the circumstances:
o At startup of Mnesia (after negotiating about which communication
protocol to use) a local monitor process is linked to monitor
processes on all other database nodes. The aim of the monitor
process is to monitor various resources such as Mnesia on remote
nodes, local dets and disk_log processes etc. (Luckily Mnesia
has no need for monitoring of other applications.)
When an inter-monitor-link breaks, the monitor forwards internal
{mnesia_down, Node} messages to the processes that need to be
informed about this. Besides Mnesia's internal processes, the message
is forwarded to user processes, such as transaction coordinators.
This mechanism works very well for communication protocols that
involve longer dialogs. It is both reliable and fast. The tricky
part has been to avoid sending spurious {mnesia_down, Node} messages
in the cases when the server first replies to the message and then
dies shortly afterwards.
In http://www.ericsson.se/cslab/~hakan/mnesia_internals_slides.pdf you
will find some architectural pictures covering these issues.
o Especially during the startup of Mnesia there are a variety
of interesting concurrency situations that may occur. The startup
may be performed simultaneously on several nodes, possibly at a
different pace and with a different degree of success. Named servers may
crash and be restarted under the same name.
In order to make the startup safe (despite gen_server) the
gen_server calls are sometimes wrapped by explicitly linking
to the process BEFORE message delivery and unlinking from it when
the reply has arrived. It is reliable but costly.
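The link-before-send wrapping described above could look roughly like the following sketch. It is not the actual Mnesia code; the module name linked_call and the {call, From, Ref, Request} / {reply, Ref, Reply} protocol are assumptions made for illustration. The caller must already hold the pid, so the name is resolved only once, outside this function.

```erlang
-module(linked_call).
-export([call/3]).

%% Hypothetical helper, not taken from Mnesia.
%% Links to Pid BEFORE sending, so a crash of the server at any point
%% after the link shows up as an 'EXIT' message instead of a hang.
call(Pid, Request, Timeout) ->
    %% Trap exits so the server's death arrives as a message.
    OldTrap = process_flag(trap_exit, true),
    link(Pid),
    Ref = make_ref(),
    Pid ! {call, self(), Ref, Request},
    Result =
        receive
            {reply, Ref, Reply} -> {ok, Reply};
            {'EXIT', Pid, Reason} -> {error, Reason}
        after Timeout ->
            {error, timeout}
        end,
    unlink(Pid),
    %% Flush an 'EXIT' that may have arrived just before the unlink.
    receive {'EXIT', Pid, _} -> ok after 0 -> ok end,
    process_flag(trap_exit, OldTrap),
    Result.
```

One caveat of this sketch: the final unlink/1 also removes any link that already existed between the two processes before the call.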
/Håkan