[erlang-questions] why might mnesia:start() hang?
Rick Pettit
rpettit@REDACTED
Wed Oct 17 16:49:01 CEST 2007
On Wed, Oct 17, 2007 at 11:32:14AM +0200, Hakan Mattsson wrote:
>
> There may be several causes for this to happen:
>
> - It may be the case that some other application has
> encountered a deadlock in its startup. This may for
> example occur if that application is invoking functions
> in the 'application' API during its startup. It may also
> occur if a process dies during the application
> startup. Then its supervisor will not restart the
> process until it has started all its children.
I will check, but I don't believe this is my case.
> - It could also be that it is Mnesia that refuses to
> start. This may happen if the system first crashes
> during the critical phase in transaction commit and one
> of the other nodes does not come up again. Then Mnesia
> will by default wait indefinitely for the other node to
> be available before it finishes its own startup.
This is what I suspect happened, just don't know the details (yet :-).
> See
> the documentation about the Mnesia parameter
> max_wait_for_decision for more info.
I stumbled on this last night but have not yet gotten a chance to
try it--thanks.
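For reference, a minimal sketch of how that parameter might be set in a
release's sys.config. The 60000 (milliseconds) value is purely an assumed
example, not a recommendation. The default is infinity, and a finite timeout
means Mnesia will decide the outcome of an unclear transaction on its own
after the timeout, which can leave the nodes inconsistent with each other.

```erlang
%% sys.config fragment (illustrative values only).
[
 {mnesia,
  [
   %% Default is infinity: wait forever for the other node's decision.
   %% 60000 ms is an assumed example value, not a recommendation.
   {max_wait_for_decision, 60000}
  ]}
].
```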
> If you set the
> Mnesia debug level to at least 'verbose' (before you
> start Mnesia) you will get a printout when this happens.
I'll try that, too.
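A minimal sketch of one way to do that, assuming the node is started with a
sys.config file (passing "-mnesia debug verbose" on the erl command line
should have the same effect):

```erlang
%% sys.config fragment: raise the Mnesia debug level before Mnesia starts.
[
 {mnesia,
  [
   %% Levels are none | verbose | debug | trace; the default is none.
   {debug, verbose}
  ]}
].
```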
Thanks again for all your help,
-Rick
> On Tue, 16 Oct 2007, Rick Pettit wrote:
>
> RP> Date: Tue, 16 Oct 2007 23:05:36 -0500
> RP> From: Rick Pettit <rpettit@REDACTED>
> RP> To: erlang-questions@REDACTED
> RP> Subject: [erlang-questions] why might mnesia:start() hang?
> RP>
> RP> I seem to have encountered a situation in which I am
> RP> unable to start mnesia. Attempts to start mnesia (via
> RP> mnesia:start/0) hang the erlang shell.
> RP>
> RP> In the scenario below there are 2 physical servers,
> RP> each running an instance of the foo_rel and
> RP> bar_rel. The second physical server,
> RP> someother.somedomain, has been halted prior to starting
> RP> the nodes on somebox.somedomain.
> RP>
> RP> The foo_rel instances contain disc_copies tables--bar_rel
> RP> instances contain ram_copies only.
> RP>
> RP> (foo_rel@REDACTED)1> application:which_applications().
> RP> [{sasl,"SASL CXC 138 11","2.1.5.1"},
> RP> {stdlib,"ERTS CXC 138 10","1.14.5"},
> RP> {kernel,"ERTS CXC 138 10","2.11.5"}]
> RP>
> RP>
> RP> NOTE: there are other applications in this release which *should* be running
> RP> but are not, almost certainly due to the fact that mnesia is refusing
> RP> to start
> RP>
> RP>
> RP> (foo_rel@REDACTED)2> mnesia:info().
> RP> ===> System info in version "4.3.5", debug level = none <===
> RP> opt_disc. Directory "/u1/otp/db/foo_rel" is used.
> RP> use fallback at restart = false
> RP> running db nodes = ['foo_rel@REDACTED']
> RP> stopped db nodes = ['foo_rel@REDACTED','bar_rel@REDACTED','bar_rel@REDACTED']
> RP> ok
> RP>
> RP>
> RP> (foo_rel@REDACTED)3> mnesia:stop().
> RP> stopped
> RP>
> RP>
> RP> (foo_rel@REDACTED)4> mnesia:start().
> RP> ...shell hangs forever...
> RP>
> RP>
> RP> Shell back into the node, try again:
> RP>
> RP>
> RP> (foo_rel@REDACTED)1> mnesia:info().
> RP> ===> System info in version "4.3.5", debug level = none <===
> RP> opt_disc. Directory "/u1/otp/db/foo_rel" is used.
> RP> use fallback at restart = false
> RP> running db nodes = ['foo_rel@REDACTED']
> RP> stopped db nodes = ['foo_rel@REDACTED','bar_rel@REDACTED','bar_rel@REDACTED']
> RP> ok
> RP>
> RP>
> RP> (foo_rel@REDACTED)2> application:which_applications().
> RP> [{sasl,"SASL CXC 138 11","2.1.5.1"},
> RP> {stdlib,"ERTS CXC 138 10","1.14.5"},
> RP> {kernel,"ERTS CXC 138 10","2.11.5"}]
> RP>
> RP>
> RP> (foo_rel@REDACTED)3> mnesia:start().
> RP> ...hangs forever...
> RP>
> RP>
> RP> ======
> RP>
> RP> I cannot seem to figure out:
> RP>
> RP> 1) why mnesia refuses to start
> RP>
> RP> 2) why mnesia:start() hangs forever at the shell
> RP> (vs. returning an error, etc.)
> RP>
> RP> Any applications requiring mnesia tables do a
> RP> mnesia:wait_for_tables/2 on them.
> RP>
> RP> A special process performs a mnesia:force_load_table/1,
> RP> if necessary (e.g. when wait_for_tables/2 times out).
> RP>
> RP> Unfortunately, this code doesn't get a chance to run if
> RP> mnesia itself refuses to start (in many previous test
> RP> runs the releases started--in some cases the default
> RP> table load algorithm worked just fine, and in other
> RP> failure scenarios the force_load_table was
> RP> necessary--but the system always managed to start until
> RP> now).
> RP>
> RP> Surely I must just be short on coffee (or sleep) or
> RP> both. Any help would be greatly appreciated.
> RP>
> RP> -Rick
> RP> _______________________________________________
> RP> erlang-questions mailing list
> RP> erlang-questions@REDACTED
> RP> http://www.erlang.org/mailman/listinfo/erlang-questions