[erlang-questions] Mnesia and schema locks
Loïc Hoguin
essen@REDACTED
Tue Feb 20 14:53:21 CET 2018
Thanks, that helped a lot.
What we ended up doing was to call mnesia:set_debug_level(debug) and
subscribe to system events and schema table events using
mnesia:subscribe/1. This gave us both the transaction/lock that keeps
getting restarted and the transaction/lock that causes the restart. We
then inspected things in Observer and got a very clear view of what is
going on.
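For reference, here is a minimal sketch of that setup (the calls are the
ones named above; the receive loop is only illustrative):

    %% Increase Mnesia's verbosity, then subscribe to system events
    %% and to simple events on the schema table.
    mnesia:set_debug_level(debug),
    {ok, _} = mnesia:subscribe(system),
    {ok, _} = mnesia:subscribe({table, schema, simple}),
    Print = fun Loop() ->
        receive
            {mnesia_system_event, Event} ->
                io:format("system event: ~p~n", [Event]),
                Loop();
            {mnesia_table_event, Event} ->
                io:format("schema table event: ~p~n", [Event]),
                Loop()
        end
    end,
    Print().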
By the way, is there a search function for finding a process in Observer?
That would be useful to find the ones we are looking for. :-)
Cheers,
On 02/14/2018 07:32 PM, Dan Gudmundsson wrote:
> Well, you will need to figure out what <6502.2299.18> and
> <6502.2302.18> are doing, but they are probably waiting for other locks
> which are held by the busy processes you wrote about. You will have to
> look at that yourself; debugging Mnesia is just following the
> breadcrumbs around the system.
>
> mnesia_locker:get_held_locks() and mnesia_locker:get_lock_queue() may
> also help.
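>
> As a sketch, both can be polled on every running db node over rpc
> (these are internal, undocumented functions, so the exact return
> format may vary between OTP releases):
>
>     [{Node,
>       rpc:call(Node, mnesia_locker, get_held_locks, []),
>       rpc:call(Node, mnesia_locker, get_lock_queue, [])}
>      || Node <- mnesia:system_info(running_db_nodes)].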
>
> Using Observer to attach to the different nodes is probably easiest;
> then you can get a stacktrace of each process. Normally when I do this
> I don't have a live system. If I want to debug post mortem I use
> mnesia_lib:dist_coredump() to collect each Mnesia node's state and
> analyse it. Though with many nodes it will take some time to debug or
> figure out why it appears to be hanging.
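>
> For example (dist_coredump/0 is likewise internal and undocumented;
> as far as I know it produces a MnesiaCore.* dump per node that can
> then be inspected offline):
>
>     mnesia_lib:dist_coredump().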
>
>
> On Wed, Feb 14, 2018 at 6:39 PM Loïc Hoguin <essen@REDACTED> wrote:
>
> Hello,
>
> We are trying to debug an issue where we observe a lot of contention
> when a RabbitMQ node goes down. It has a number of symptoms and we are
> in the middle of figuring things out.
>
> One particular symptom occurs on the node that restarts: it gets stuck
> and there are two Mnesia locks:
>
> [{{schema,rabbit_durable_route},read,{tid,879886,<6502.2299.18>}},
> {{schema,rabbit_exchange},read,{tid,879887,<6502.2302.18>}}]
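> (For what it's worth, a list in this shape can be obtained with the
> documented call mnesia:system_info(held_locks). Each entry reads as
> {LockItem, LockKind, TransactionId}, so these are two read locks on
> the schema rows of rabbit_durable_route and rabbit_exchange, held by
> transactions on a remote node.)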
>
> The locks are only cleared when the other node in the cluster stops
> being so busy deleting data from a number of tables (another symptom)
> and things go back to normal.
>
> Part of the problem is that while this is going on, the restarting node
> cannot be used, so I would like to understand what conditions can
> result in these locks being held for so long. Any tips appreciated!
>
> Thanks in advance,
>
> --
> Loïc Hoguin
> https://ninenines.eu
--
Loïc Hoguin
https://ninenines.eu