[erlang-questions] why might mnesia:start() hang?

Hakan Mattsson hakan@REDACTED
Wed Oct 17 11:32:14 CEST 2007


There are several possible causes:

- Another application may have deadlocked during its
  own startup. This can happen, for example, if that
  application calls functions in the 'application' API
  from its startup code. It can also happen if a process
  dies while the application is starting: its supervisor
  will not restart the process until it has finished
  starting all its children.

- Mnesia itself may be refusing to start. This can
  happen if the system crashed during the critical phase
  of a transaction commit and one of the other nodes
  involved has not come up again. By default, Mnesia then
  waits indefinitely for that node to become available
  before it finishes its own startup. See the
  documentation of the Mnesia parameter
  max_wait_for_decision for more information. If you set
  the Mnesia debug level to at least 'verbose' before you
  start Mnesia, you will get a printout when this
  happens.

- ...

/Håkan
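The second point can be illustrated with a minimal sketch, assuming a current OTP release. The module mnesia_start_sketch and its functions are hypothetical names, and the timeout values are placeholders. The sketch loads the mnesia application, sets the documented debug and max_wait_for_decision parameters with application:set_env/3 before Mnesia starts, and runs mnesia:start() in a monitored process so the caller gets {error, timeout} back instead of blocking forever.

```erlang
%% Hypothetical module: names are illustrative, not part of Mnesia.
-module(mnesia_start_sketch).
-export([start_mnesia/2, run_with_timeout/2]).

%% Configure Mnesia before it starts, then start it with a bound on how
%% long the caller is willing to wait.
start_mnesia(DecisionTimeoutMs, StartTimeoutMs) ->
    %% Load the application first so the values set below are not
    %% replaced by the .app file environment when it is loaded.
    case application:load(mnesia) of
        ok -> ok;
        {error, {already_loaded, mnesia}} -> ok
    end,
    %% Ask Mnesia to report (among other things) when it is blocked
    %% waiting for the outcome of an unclear transaction.
    ok = application:set_env(mnesia, debug, verbose),
    %% Stop waiting for the other nodes after DecisionTimeoutMs
    %% milliseconds instead of the default 'infinity'.
    ok = application:set_env(mnesia, max_wait_for_decision, DecisionTimeoutMs),
    run_with_timeout(fun mnesia:start/0, StartTimeoutMs).

%% Run Fun in a separate, monitored process. Returns {ok, Result},
%% {error, timeout} or {error, {crashed, Reason}}. On timeout the worker
%% is NOT killed: the start attempt keeps running in the background.
run_with_timeout(Fun, TimeoutMs) ->
    {Pid, Ref} = spawn_monitor(fun() -> exit({done, Fun()}) end),
    receive
        {'DOWN', Ref, process, Pid, {done, Result}} ->
            {ok, Result};
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, {crashed, Reason}}
    after TimeoutMs ->
        %% Drop the monitor and any 'DOWN' message already queued.
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.
```

The wrapper only keeps the caller from hanging; it does not make Mnesia start. A finite max_wait_for_decision, on the other hand, lets Mnesia finish the unclear transaction without knowing its outcome on the other nodes, so the value deserves some thought.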


On Tue, 16 Oct 2007, Rick Pettit wrote:

RP> Date: Tue, 16 Oct 2007 23:05:36 -0500
RP> From: Rick Pettit <rpettit@REDACTED>
RP> To: erlang-questions@REDACTED
RP> Subject: [erlang-questions] why might mnesia:start() hang?
RP> 
RP> I seem to have encountered a situation in which I am
RP> unable to start mnesia.  Attempts to start mnesia (via
RP> mnesia:start/0) hang the erlang shell.
RP> 
RP> In the scenario below there are 2 physical servers,
RP> each running an instance of foo_rel and bar_rel. The
RP> second physical server, someother.somedomain, was
RP> halted before the nodes on somebox.somedomain were
RP> started.
RP> 
RP> The foo_rel instances contain disc_copies tables; the
RP> bar_rel instances contain ram_copies only.
RP> 
RP> (foo_rel@REDACTED)1> application:which_applications().
RP> [{sasl,"SASL  CXC 138 11","2.1.5.1"},
RP>  {stdlib,"ERTS  CXC 138 10","1.14.5"},
RP>  {kernel,"ERTS  CXC 138 10","2.11.5"}]
RP> 
RP> 
RP>    NOTE: there are other applications in this release which *should* be running
RP>          but are not, almost certainly because mnesia is refusing
RP>          to start
RP> 
RP> 
RP> (foo_rel@REDACTED)2> mnesia:info().
RP> ===> System info in version "4.3.5", debug level = none <===
RP> opt_disc. Directory "/u1/otp/db/foo_rel" is used.
RP> use fallback at restart = false
RP> running db nodes   = ['foo_rel@REDACTED']
RP> stopped db nodes   = ['foo_rel@REDACTED','bar_rel@REDACTED','bar_rel@REDACTED'] 
RP> ok
RP> 
RP> 
RP> (foo_rel@REDACTED)3> mnesia:stop().
RP> stopped
RP> 
RP> 
RP> (foo_rel@REDACTED)4> mnesia:start().
RP> ...shell hangs forever...
RP> 
RP> 
RP> Shell back into the node, try again:
RP> 
RP> 
RP> (foo_rel@REDACTED)1> mnesia:info().
RP> ===> System info in version "4.3.5", debug level = none <===
RP> opt_disc. Directory "/u1/otp/db/foo_rel" is used.
RP> use fallback at restart = false
RP> running db nodes   = ['foo_rel@REDACTED']
RP> stopped db nodes   = ['foo_rel@REDACTED','bar_rel@REDACTED','bar_rel@REDACTED'] 
RP> ok
RP> 
RP> 
RP> (foo_rel@REDACTED)2> application:which_applications().
RP> [{sasl,"SASL  CXC 138 11","2.1.5.1"},
RP>  {stdlib,"ERTS  CXC 138 10","1.14.5"},
RP>  {kernel,"ERTS  CXC 138 10","2.11.5"}]
RP> 
RP> 
RP> (foo_rel@REDACTED)3> mnesia:start().
RP> ...hangs forever...
RP> 
RP> 
RP> ======
RP> 
RP> I cannot seem to figure out:
RP> 
RP>   1) why mnesia refuses to start
RP> 
RP>   2) why mnesia:start() hangs forever at the shell
RP>     (instead of returning an error, etc.)
RP> 
RP> Any application that requires mnesia tables calls
RP> mnesia:wait_for_tables/2 on them.
RP> 
RP> A special process calls mnesia:force_load_table/1 if
RP> necessary (e.g. when wait_for_tables/2 times out).
RP> 
RP> Unfortunately, this code never gets a chance to run if
RP> mnesia itself refuses to start. In many previous test
RP> runs the releases did start: in some failure scenarios
RP> the default table load algorithm worked just fine, and
RP> in others the force_load_table/1 call was necessary,
RP> but until now the system always managed to start.
RP> 
RP> Surely I must just be short on coffee (or sleep) or
RP> both. Any help would be greatly appreciated.
RP> 
RP> -Rick
RP> _______________________________________________
RP> erlang-questions mailing list
RP> erlang-questions@REDACTED
RP> http://www.erlang.org/mailman/listinfo/erlang-questions
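Rick's approach (mnesia:wait_for_tables/2, falling back to mnesia:force_load_table/1 when the wait times out) might look roughly like the following sketch, assuming a current OTP release. The module table_loader_sketch and the function wait_or_force/2 are hypothetical names. Note that this code can only run after mnesia:start() has returned, which is exactly the step that hangs in this thread.

```erlang
%% Hypothetical module: a sketch of "wait, then force-load".
-module(table_loader_sketch).
-export([wait_or_force/2]).

%% Wait up to TimeoutMs for Tabs to be loaded; force-load whichever
%% tables are still unavailable when the wait times out.
wait_or_force(Tabs, TimeoutMs) ->
    case mnesia:wait_for_tables(Tabs, TimeoutMs) of
        ok ->
            ok;
        {timeout, BadTabs} ->
            %% force_load_table/1 returns 'yes' on success,
            %% otherwise an error description.
            Results = [{Tab, mnesia:force_load_table(Tab)} || Tab <- BadTabs],
            case [{Tab, Res} || {Tab, Res} <- Results, Res =/= yes] of
                [] -> ok;
                Failed -> {error, {force_load_failed, Failed}}
            end;
        {error, Reason} ->
            {error, Reason}
    end.
```

Force-loading overrides Mnesia's table load algorithm, so it may load a local copy that is older than a copy on a node that is still down; some transaction effects can be lost. That is the availability-over-consistency trade-off the Mnesia documentation describes for force_load_table/1.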

