Mnesia - mnesia_subscr and force_load_table
Sat Aug 27 11:20:31 CEST 2005
> - Why mnesia_subscr process is not restarted when it's killed and
> mnesia goes down after that?
> try: exit(erlang:whereis(mnesia_subscr), kill).
I will let someone else comment on the specific case of
mnesia_subscr, but in general, you will find that some
processes cannot be killed without bringing down the
application and/or the whole node. This is to be seen
as a reasonable tradeoff, as it can be very difficult to
figure out how to recover gently from some errors.
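To see this behaviour for a particular process, one could kill it under a monitor and then check whether the application is still running. What follows is only a sketch. The module name kill_probe, the function names, and the one-second settling delay are made up for illustration, and it should only be run on a node you can afford to lose.

```erlang
-module(kill_probe).
-export([probe/2, app_running/1]).

%% Kill the process registered as Name, then report what is left.
%% Returns {ExitReason, AppStillRunning, PidOrUndefined}, or an error
%% tuple if the process was not registered or no 'DOWN' arrived.
probe(Name, App) ->
    case erlang:whereis(Name) of
        undefined ->
            {error, not_registered};
        Pid ->
            Ref = erlang:monitor(process, Pid),
            exit(Pid, kill),
            receive
                {'DOWN', Ref, process, Pid, Reason} ->
                    %% Arbitrary delay (an assumption) to let any
                    %% supervisor restart or shutdown take effect.
                    timer:sleep(1000),
                    {Reason, app_running(App), erlang:whereis(Name)}
            after 5000 ->
                erlang:demonitor(Ref, [flush]),
                {error, timeout}
            end
    end.

%% True if App appears in the list of currently running applications.
app_running(App) ->
    lists:keymember(App, 1, application:which_applications()).
```

For the case in the question, the call would be kill_probe:probe(mnesia_subscr, mnesia).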
> force_load_table(Tab) -> yes | ErrorDescription
> The Mnesia algorithm for table load might lead to a situation where a
> table cannot be loaded. This situation occurs when a node is started and
> Mnesia concludes, or suspects, that another copy of the table was active
> after this local copy became inactive due to a system crash.
> - How do I detect the situations when force_load_table function has to be used?
It's not trivial.
In AXD 301, I wrote a set of programs to monitor mnesia's
- One part that started before mnesia (this can be done
by sorting the list of applications in the .rel file
-- as long as the order doesn't violate application
dependencies, it will be kept.) This application would
check whether the restart was due to a partitioned network
and make sure that master_nodes were set accordingly.
- One part that started right after mnesia, and called
mnesia:wait_for_tables(AllMyTabs, Timeout). After Timeout,
a loop analysis was performed in a wait-for graph. This
graph was built using a hello protocol between the waiters
on all nodes. If no cyclical wait was detected, another
call to wait_for_tables/2 was made, and so on. If, at
the point of timeout, there were no other waiters, the
tables were loaded by force.
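The second part can be sketched roughly as follows. This is not the AXD 301 code, only an illustration of the loop described above. The module name tab_loader and the WaitersFun callback are hypothetical. WaitersFun stands in for the hello protocol and is assumed to return a list of {WaitingNode, AwaitedNode} edges, with an empty list meaning there are no other waiters. The text does not say what was done when a cyclical wait was found, so the sketch simply reports it to the caller.

```erlang
-module(tab_loader).
-export([load/3, has_cycle/1]).

%% Wait for Tabs. On each timeout, collect wait-for edges through the
%% hypothetical WaitersFun and decide whether to wait again, force the
%% load, or report a cyclical wait.
load(Tabs, Timeout, WaitersFun) ->
    case mnesia:wait_for_tables(Tabs, Timeout) of
        ok ->
            ok;
        {error, Reason} ->
            {error, Reason};
        {timeout, Missing} ->
            case WaitersFun(Missing) of
                [] ->
                    %% No other waiters: load by force.
                    force_load(Missing);
                Edges ->
                    case has_cycle(Edges) of
                        %% Resolution of a cycle is left to the caller.
                        true  -> {cyclic_wait, Missing};
                        false -> load(Tabs, Timeout, WaitersFun)
                    end
            end
    end.

%% force_load_table/1 returns yes on success, anything else is an error.
force_load(Tabs) ->
    Results = [{T, mnesia:force_load_table(T)} || T <- Tabs],
    case [R || {_, Res} = R <- Results, Res =/= yes] of
        []     -> ok;
        Failed -> {error, {force_load_failed, Failed}}
    end.

%% Loop analysis: build a directed graph from the edges and check it
%% for cycles using the stdlib digraph modules.
has_cycle(Edges) ->
    G = digraph:new(),
    try
        lists:foreach(
          fun({From, To}) ->
                  digraph:add_vertex(G, From),
                  digraph:add_vertex(G, To),
                  digraph:add_edge(G, From, To)
          end, Edges),
        not digraph_utils:is_acyclic(G)
    after
        digraph:delete(G)
    end.
```

The loop analysis can be tried on its own in the shell: tab_loader:has_cycle([{a, b}, {b, a}]) describes two nodes waiting for each other.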
I once tried to get a research project started to try to
assess the correctness of the algorithm and the code, but
this fell through. One of the questions I also wanted answered
was "what additional information is needed from mnesia in
order to make this easier?", because it does feel as if
mnesia doesn't help as much as it could.
I can't give more details about the solution, since I don't
have it available, and it's been years since I last looked
at it. Given its operational track record, at least the code
isn't obviously broken, though. (: