Mnesia - mnesia_subscr and force_load_table

ulf@REDACTED
Sat Aug 27 11:20:31 CEST 2005


> - Why mnesia_subscr process is not restarted when it's killed and
> mnesia goes down after that?
> try: exit(erlang:whereis(mnesia_subscr), kill).

I will let someone else comment on the specific case of
mnesia_subscr, but in general, you will find that some
processes cannot be killed without bringing down the
application and/or the whole node. This should be seen
as a reasonable tradeoff, since it can be very difficult to
figure out how to recover gracefully from some errors.


> force_load_table(Tab) -> yes | ErrorDescription
>
>
> The Mnesia algorithm for table load might lead to a situation where a
> table cannot be loaded. This situation occurs when a node is started and
> Mnesia concludes, or suspects, that another copy of the table was active
> after this local copy became inactive due to a system crash.
>
> - How do I detect the situations when force_load_table function has to be
> executed?

It's not trivial.

In AXD 301, I wrote a set of programs to monitor mnesia's
boot process:

- One part that started before mnesia (this can be done
  by ordering the list of applications in the .rel file
  -- as long as the order doesn't violate application
  dependencies, it will be kept). This application would
  check whether the restart was due to a partitioned network
  and make sure that master_nodes were set accordingly.

- One part that started right after mnesia, and called
  mnesia:wait_for_tables(AllMyTabs, Timeout). If that call
  timed out, a cycle analysis was performed on a wait-for
  graph. This graph was built using a hello protocol between
  the waiters on all nodes. If no cyclical wait was detected,
  another call to wait_for_tables/2 was made, and so on. If,
  at the point of timeout, there were no other waiters, the
  tables were loaded by force.
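
To make the second part a bit more concrete, here is a rough
sketch of what such a wait loop could look like. It is not the
AXD 301 code. The module name, the EdgesFun callback (standing
in for the hello protocol, which is not shown) and the return
tuples are all made up for illustration. Only
mnesia:wait_for_tables/2, mnesia:force_load_table/1 and the
digraph/digraph_utils calls are real OTP functions. What to do
once a cyclical wait is found is not covered above, so the
sketch just reports it to the caller.

```erlang
-module(boot_wait_sketch).
-export([load_tables/3, has_cycle/1]).

%% Hypothetical sketch, not the original implementation.
%% EdgesFun(PendingTabs) is assumed to return the wait-for edges
%% collected from the other waiting nodes as [{Waiter, WaitedFor}].
%% An empty list is taken to mean "no other waiters".
load_tables(Tabs, Timeout, EdgesFun) ->
    case mnesia:wait_for_tables(Tabs, Timeout) of
        ok ->
            ok;
        {timeout, Pending} ->
            case EdgesFun(Pending) of
                [] ->
                    %% Nobody else is waiting: load by force.
                    %% force_load_table/1 returns yes or an error term.
                    {forced, [{T, mnesia:force_load_table(T)}
                              || T <- Pending]};
                Edges ->
                    case has_cycle(Edges) of
                        false ->
                            %% No cyclical wait: wait another round.
                            load_tables(Tabs, Timeout, EdgesFun);
                        true ->
                            %% Policy is left to the caller (assumption).
                            {cyclic_wait, Pending}
                    end
            end;
        {error, Reason} ->
            {error, Reason}
    end.

%% True if the wait-for graph described by Edges contains a cycle.
has_cycle(Edges) ->
    G = digraph:new(),
    try
        lists:foreach(
          fun({Waiter, WaitedFor}) ->
                  digraph:add_vertex(G, Waiter),
                  digraph:add_vertex(G, WaitedFor),
                  digraph:add_edge(G, Waiter, WaitedFor)
          end, Edges),
        not digraph_utils:is_acyclic(G)
    after
        %% digraph is ETS-backed, so free it explicitly.
        digraph:delete(G)
    end.
```

A caller would pass its own table list, a timeout in
milliseconds and a fun that asks the other nodes who they are
waiting for, e.g.
boot_wait_sketch:load_tables(AllMyTabs, 30000, EdgesFun).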

I once tried to get a research project started to
assess the correctness of the algorithm and the code, but
this fell through. One of the questions I also wanted answered
was "what additional information is needed from mnesia in
order to make this easier?", because it does feel as if
mnesia doesn't help as much as it could.

I can't give more details about the solution, since I don't
have it available, and it's been years since I last looked
at it. Given its operational track record, at least the code
isn't obviously broken, though.  (:

/Uffe




More information about the erlang-questions mailing list