[erlang-questions] why might mnesia:start() hang?

Rick Pettit rpettit@REDACTED
Wed Oct 17 06:05:36 CEST 2007


I seem to have encountered a situation in which I am unable to start mnesia.
Attempts to start mnesia (via mnesia:start/0) hang the erlang shell.

In the scenario below there are two physical servers, each running one instance
of foo_rel and one of bar_rel. The second physical server, someother.somedomain,
was halted before the nodes on somebox.somedomain were started.

The foo_rel instances contain disc_copies tables; the bar_rel instances contain
ram_copies tables only.

(foo_rel@REDACTED)1> application:which_applications().
[{sasl,"SASL  CXC 138 11","2.1.5.1"},
 {stdlib,"ERTS  CXC 138 10","1.14.5"},
 {kernel,"ERTS  CXC 138 10","2.11.5"}]


   NOTE: there are other applications in this release which *should* be running
         but are not, almost certainly due to the fact that mnesia is refusing
         to start


(foo_rel@REDACTED)2> mnesia:info().
===> System info in version "4.3.5", debug level = none <===
opt_disc. Directory "/u1/otp/db/foo_rel" is used.
use fallback at restart = false
running db nodes   = ['foo_rel@REDACTED']
stopped db nodes   = ['foo_rel@REDACTED','bar_rel@REDACTED','bar_rel@REDACTED'] 
ok


(foo_rel@REDACTED)3> mnesia:stop().
stopped


(foo_rel@REDACTED)4> mnesia:start().
...shell hangs forever...


Shell back into the node, try again:


(foo_rel@REDACTED)1> mnesia:info().
===> System info in version "4.3.5", debug level = none <===
opt_disc. Directory "/u1/otp/db/foo_rel" is used.
use fallback at restart = false
running db nodes   = ['foo_rel@REDACTED']
stopped db nodes   = ['foo_rel@REDACTED','bar_rel@REDACTED','bar_rel@REDACTED'] 
ok


(foo_rel@REDACTED)2> application:which_applications().
[{sasl,"SASL  CXC 138 11","2.1.5.1"},
 {stdlib,"ERTS  CXC 138 10","1.14.5"},
 {kernel,"ERTS  CXC 138 10","2.11.5"}]


(foo_rel@REDACTED)3> mnesia:start().
...hangs forever...


======

I cannot seem to figure out:

  1) why mnesia refuses to start

  2) why mnesia:start() hangs forever at the shell (rather than returning an
     error, etc.)

Any applications requiring mnesia tables do a mnesia:wait_for_tables/2 on them.

A special process performs a mnesia:force_load_table/1, if necessary (e.g. when
wait_for_tables/2 times out).
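
For reference, the wait-then-force-load pattern described above looks roughly
like the sketch below. It is a minimal illustration, not the code from the
actual release: the module name table_loader and the functions ensure_loaded/2
and decide/1 are hypothetical. Only mnesia:wait_for_tables/2 and
mnesia:force_load_table/1 are real mnesia calls.

```erlang
-module(table_loader).
-export([ensure_loaded/2, decide/1]).

%% Hypothetical helper (not from the original post).
%% Waits up to Timeout milliseconds for Tables to load. If the wait times
%% out, force-loads every table that is still missing.
%% Returns ok, {error, Reason}, or a list of {Table, ForceLoadResult}
%% pairs, where ForceLoadResult is whatever mnesia:force_load_table/1
%% returned for that table (the atom yes on success).
ensure_loaded(Tables, Timeout) ->
    case decide(mnesia:wait_for_tables(Tables, Timeout)) of
        {force, Missing} ->
            [{Tab, mnesia:force_load_table(Tab)} || Tab <- Missing];
        Other ->
            Other
    end.

%% Pure decision step, kept separate so it can be checked without a
%% running mnesia: map the result of wait_for_tables/2 to an action.
decide(ok)                 -> ok;
decide({timeout, Missing}) -> {force, Missing};
decide({error, Reason})    -> {error, Reason}.
```

Called as table_loader:ensure_loaded([some_table], 30000), where some_table and
the 30 second timeout are placeholders. Note that force loading a table can
discard updates made on nodes that are currently down, so it is a last resort.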

Unfortunately, this code doesn't get a chance to run if mnesia itself refuses to
start. In many previous test runs the releases started: in some cases the
default table load algorithm worked just fine, and in other failure scenarios
the force_load_table call was necessary, but the system always managed to start
until now.

Surely I must just be short on coffee or sleep (or both). Any help would be
greatly appreciated.

-Rick


