[erlang-questions] Distributed application takeover

Wed Mar 30 14:11:59 CEST 2011

Hi,

I'm trying to find out how the distributed application controller works
internally. I'm especially interested in the implementation of an
application takeover.

If an application is running on node A and is taken over by node B, it
should first be started on node B, so that two instances of the
application run simultaneously for a brief period of time, and only then
be stopped on node A.
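For context, this kind of takeover is normally driven by the kernel
`distributed` parameter. Below is a minimal sketch of a sys.config for
node A. The application name `myapp` and the node names `a@host` and
`b@host` are hypothetical. Because `b@host` is listed first, it has the
higher priority: if `myapp` is running on `a@host` when `b@host` comes up,
`b@host` takes the application over.

```erlang
%% sys.config for node a@host (myapp, a@host, b@host are hypothetical).
[{kernel,
  [%% myapp runs on the highest-priority live node in the list.
   %% b@host is listed first, so it takes myapp over when it joins.
   %% 5000 is the time in ms to wait for a crashed node to come back
   %% before myapp is restarted elsewhere (failover).
   {distributed, [{myapp, 5000, [b@host, a@host]}]},
   %% Wait up to 5 s for b@host at boot, but start anyway if it is absent.
   {sync_nodes_optional, [b@host]},
   {sync_nodes_timeout, 5000}]}].
```

On the node taking over, the application callback module's start/2
receives `{takeover, Node}` as its start type. A takeover can also be
requested explicitly on the target node with
`application:takeover(myapp, permanent)`.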

However, I cannot figure out where this stopping happens in dist_ac.erl. If
I understand correctly, it should happen in response to an
ac_application_run message from the application_controller. This message is
received by the dist_ac on node B, which then broadcasts a
dist_ac_app_started message to the dist_acs on all connected nodes. The
dist_ac on node A receives this message, notices that the application is
still running locally, and decides to shut down the application on its own
node; at least, that is what the comments say (dist_ac.erl, line 529):

%% Another node tookover from me; stop my application
%% and update the running list.

But all I can see is that the dist_ac's list of applications is updated to
indicate that the application is no longer running locally; I cannot find
where the application_controller is actually instructed to shut down the
application.

Can anyone point me in the right direction?

Thanks,

Jeroen