[erlang-questions] Mnesia and schema locks

Tue Feb 20 15:16:00 CET 2018

On Tue, Feb 20, 2018 at 2:53 PM Loïc Hoguin <essen@REDACTED> wrote:

> Thanks, that helped a lot.
>
> What we ended up doing was call mnesia:set_debug_level(debug) and
> subscribe to system events and schema table events using
> mnesia:subscribe/1 and this gave us both the transaction/lock that keeps
> getting restarted and the transaction/lock that is the cause for this
> restart. We then inspected things in Observer and could get a very clear
> view of what is going on.
>
>
Great
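
The approach quoted above (raise the debug level, then subscribe to system
events and schema table events) could be sketched roughly as below. This is
only an illustration, not the code that was actually run: the module name
mnesia_lock_watch and the collect/1 helper are hypothetical, and it assumes
Mnesia is already running on the node where it is called.

```erlang
-module(mnesia_lock_watch).
-export([start/0, collect/1]).

%% Raise Mnesia's debug level and subscribe the calling process to
%% system events and to detailed events on the schema table.
%% Returns the previous debug level so that it can be restored later.
%% Assumption: Mnesia is already running on this node.
start() ->
    OldLevel = mnesia:set_debug_level(debug),
    {ok, _} = mnesia:subscribe(system),
    {ok, _} = mnesia:subscribe({table, schema, detailed}),
    OldLevel.

%% Collect Mnesia events from the caller's mailbox until no event
%% arrives for TimeoutMs milliseconds, oldest event first.
collect(TimeoutMs) ->
    collect(TimeoutMs, []).

collect(TimeoutMs, Acc) ->
    receive
        {mnesia_system_event, _} = Ev -> collect(TimeoutMs, [Ev | Acc]);
        {mnesia_table_event, _} = Ev -> collect(TimeoutMs, [Ev | Acc])
    after TimeoutMs ->
        lists:reverse(Acc)
    end.
```

Calling mnesia:set_debug_level/1 again with the value returned by start/0
puts the debug level back once the investigation is done.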

> By the way, is there a search function for finding a process in Observer?
> That would be useful to find the ones we are looking for. :-)
>
>
Not yet. It sounds useful. You can sort the columns to ease the scrolling,
but no, I have not received a PR for that yet :-)

> Cheers,
>
> On 02/14/2018 07:32 PM, Dan Gudmundsson wrote:
> > Well, you will need to figure out what <6502.2299.18> and <6502.2302.18>
> > are doing, but they are probably waiting for other locks which are held
> > by the busy processes you wrote about. You will have to look at that;
> > debugging mnesia is just following the breadcrumbs around the system.
> >
> > mnesia_locker:get_held_locks() and mnesia_locker:get_lock_queue() may
> > also help.
> >
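A minimal sketch of how the two mnesia_locker calls mentioned above might be
used from a shell or a small helper. The module name mnesia_lock_report and
both helper functions are hypothetical. mnesia_locker is an internal Mnesia
module, so its return formats may change between OTP releases; the pattern
in owners/1 assumes held locks have the {Oid, Kind, {tid, Counter, Pid}}
shape shown in the original report.

```erlang
-module(mnesia_lock_report).
-export([report/0, owners/1]).

%% Snapshot of the local lock manager: the locks currently held and
%% the lock requests still waiting in the queue.
%% Assumption: Mnesia is running on this node.
report() ->
    #{held => mnesia_locker:get_held_locks(),
      queued => mnesia_locker:get_lock_queue()}.

%% Extract the distinct owner pids from a list of held locks.
%% Entries that do not match the assumed shape are skipped.
owners(Held) ->
    lists:usort([Pid || {_Oid, _Kind, {tid, _Counter, Pid}} <- Held]).
```

The pids returned by owners/1 are the processes to look at next, for example
in Observer.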
> > Using observer to attach to the different nodes is probably easiest;
> > then you can get a stacktrace of each process.
> > Normally when I do it I don't have a live system. If I want to debug
> > post mortem I use mnesia_lib:dist_coredump()
> > to collect each mnesia node's state and analyse them. Though with many
> > nodes it will take some time to debug or
> > figure out why it appears to be hanging.
> >
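When Observer is not at hand, the stack trace of a process that holds a lock
can also be fetched programmatically. The sketch below is an alternative to
the Observer workflow described above, not part of it; the module name
proc_trace is hypothetical. It relies on erlang:process_info/2 with the
current_stacktrace item, and goes through rpc:call/4 because process_info
only accepts pids that are local to the node where it runs.

```erlang
-module(proc_trace).
-export([stacktrace/1]).

%% Fetch the current stack trace of Pid, which may live on another
%% connected node. The call is executed on the node that owns Pid.
stacktrace(Pid) when is_pid(Pid) ->
    case rpc:call(node(Pid), erlang, process_info,
                  [Pid, current_stacktrace]) of
        {current_stacktrace, Stack} -> {ok, Stack};
        undefined -> {error, not_alive};
        {badrpc, Reason} -> {error, Reason}
    end.
```

For example, proc_trace:stacktrace(Pid) could be called with the pid found
inside the tid of a held lock.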
> >
> > On Wed, Feb 14, 2018 at 6:39 PM Loïc Hoguin <essen@REDACTED> wrote:
> >
> >     Hello,
> >
> >     We are trying to debug an issue where we observe a lot of contention
> >     when a RabbitMQ node goes down. It has a number of symptoms and we are
> >     in the middle of figuring things out.
> >
> >     One particular symptom occurs on the node that restarts: it gets
> >     stuck and there are two Mnesia locks:
> >
> >     [{{schema,rabbit_durable_route},read,{tid,879886,<6502.2299.18>}},
> >        {{schema,rabbit_exchange},read,{tid,879887,<6502.2302.18>}}]
> >
> >     The locks are only cleared when the other node in the cluster stops
> >     being so busy deleting data from a number of tables (another symptom)
> >     and things go back to normal.
> >
> >     Part of the problem is that while this is going on, the restarting
> >     node cannot be used, so I would like to understand what conditions can
> >     result in these locks staying up for so long. Any tips appreciated!
> >
> >     Thanks in advance,
> >
> >     --
> >     Loïc Hoguin
> >     https://ninenines.eu
> >
>
> --
> Loïc Hoguin
> https://ninenines.eu
>