[erlang-questions] Restarting processes

Thu Mar 29 11:42:06 CEST 2007

In my application I have a demultiplexing process and a few worker
processes. The workers subscribe to events; the demultiplexor gets
requests from the network and distributes them to the workers, which
handle them as appropriate.

The demultiplexor monitors (erlang:monitor) the workers, so when a
worker terminates without unsubscribing, the demultiplexor cleans up
its subscription data.
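
A minimal sketch of this arrangement, for illustration only. The module
name demux_sketch and the message shapes ({subscribe, Pid} and
{subscribers, From}) are hypothetical and not taken from the real
application; only erlang:monitor/2 and the 'DOWN' message format are
standard.

```erlang
-module(demux_sketch).
-export([start/0, subscribe/2, subscribers/1]).

%% Start a bare demux process with an empty subscription list.
start() ->
    spawn(fun() -> loop([]) end).

%% Ask the demux to subscribe (and monitor) Worker.
subscribe(Demux, Worker) ->
    Demux ! {subscribe, Worker},
    ok.

%% Return the pids currently subscribed (helper for inspection).
subscribers(Demux) ->
    Demux ! {subscribers, self()},
    receive
        {subscribers, Pids} -> Pids
    after 1000 ->
        timeout
    end.

%% Subs is a list of {MonitorRef, WorkerPid}.
loop(Subs) ->
    receive
        {subscribe, Pid} ->
            Ref = erlang:monitor(process, Pid),
            loop([{Ref, Pid} | Subs]);
        {'DOWN', Ref, process, _Pid, _Reason} ->
            %% Worker died without unsubscribing: drop its entry.
            loop(lists:keydelete(Ref, 1, Subs));
        {subscribers, From} ->
            From ! {subscribers, [P || {_R, P} <- Subs]},
            loop(Subs)
    end.
```

Note that the subscription list lives only in the demux process, which
is why it is lost when the demux itself dies.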

Now, what if the demultiplexor itself dies? OK, the demultiplexor's
supervisor will restart it. But what about the workers? I don't want to
terminate and restart them, because the workers carry quite a lot of
state and I don't want that state to be lost. So I need my workers to
resubscribe to the restarted demux. How can I handle this nicely?

I can have the workers monitor the demux. That solves part of the
problem: the workers would be notified if the old demux dies and could
try to resubscribe to the new demux, getting its pid via
whereis(demux_registered_name). But what if a worker gets the
{'DOWN', ...} message and tries to resubscribe, but the demux hasn't
been restarted yet or hasn't finished its init stage? Clearly a race
condition. I can introduce a small delay between the {'DOWN', ...}
message and the attempted resubscription, but this approach seems a bit
ugly.
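
The delay-and-retry approach could be sketched roughly as follows. This
is an illustration under assumptions, not the actual worker code: the
module name worker_sketch, the 100 ms RETRY_MS value, and the
{subscribe, Pid} and {event, Payload} messages are all made up for the
example. It only handles the case where the demux name is not registered
yet; it does not address a demux that is registered but still
initialising.

```erlang
-module(worker_sketch).
-export([start/1]).

-define(RETRY_MS, 100).  %% hypothetical retry delay

%% Start a worker that subscribes to the process registered as DemuxName.
start(DemuxName) ->
    spawn(fun() -> resubscribe(DemuxName) end).

resubscribe(DemuxName) ->
    case whereis(DemuxName) of
        undefined ->
            %% Demux not (re)started yet: try again after a short delay.
            erlang:send_after(?RETRY_MS, self(), retry),
            loop(DemuxName, undefined);
        Pid ->
            %% If Pid has died since whereis/1, the monitor delivers
            %% a 'DOWN' message at once and we end up retrying.
            Ref = erlang:monitor(process, Pid),
            Pid ! {subscribe, self()},
            loop(DemuxName, Ref)
    end.

loop(DemuxName, Ref) ->
    receive
        retry when Ref =:= undefined ->
            resubscribe(DemuxName);
        {'DOWN', Ref, process, _Pid, _Reason} ->
            resubscribe(DemuxName);
        {event, _Payload} ->
            %% Real event handling (and the worker's state) would go here.
            loop(DemuxName, Ref);
        _Other ->
            loop(DemuxName, Ref)
    end.
```

The worker keeps running across demux restarts, so its state survives,
but the polling interval is arbitrary, which is the ugliness mentioned
above.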

What I want is some way for the demux to announce "Hey, I've
just restarted and forgot all my subscribers. Now I'm operational again.
Whoever is interested may resubscribe". Supervisors seem to be a natural
choice to relay that kind of message, since they are always there. But
it appears that OTP supervisors cannot do that. How do you handle this
kind of problem in your applications?

-- 
dg