[erlang-questions] Mnesia and schema locks
Loïc Hoguin
essen@REDACTED
Tue Feb 20 14:53:21 CET 2018
Thanks, that helped a lot.
What we ended up doing was to call mnesia:set_debug_level(debug) and
subscribe to system events and schema table events using
mnesia:subscribe/1. This gave us both the transaction/lock that keeps
getting restarted and the transaction/lock that causes the restart. We
then inspected things in Observer and got a very clear view of what is
going on.
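For reference, here is a minimal sketch of that setup (the calls are the
ones named above; the receive loop is only illustrative):

    %% Increase Mnesia's verbosity, then subscribe to system events
    %% and to simple events on the schema table.
    mnesia:set_debug_level(debug),
    {ok, _} = mnesia:subscribe(system),
    {ok, _} = mnesia:subscribe({table, schema, simple}),
    Print = fun Loop() ->
        receive
            {mnesia_system_event, Event} ->
                io:format("system event: ~p~n", [Event]),
                Loop();
            {mnesia_table_event, Event} ->
                io:format("schema table event: ~p~n", [Event]),
                Loop()
        end
    end,
    Print().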
By the way, is there a search function for finding a process in Observer?
That would be useful to find the ones we are looking for. :-)
Cheers,
On 02/14/2018 07:32 PM, Dan Gudmundsson wrote:
> Well, you will need to figure out what <6502.2299.18> and
> <6502.2302.18> are doing, but they are probably waiting for other locks
> which are held by the busy processes you wrote about. You will have to
> look at that yourself; debugging Mnesia is just following the
> breadcrumbs around the system.
>
> mnesia_locker:get_held_locks() and mnesia_locker:get_lock_queue() may
> also help.
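>
> As a sketch, both can be polled on every running db node over rpc
> (these are internal, undocumented functions, so the exact return
> format may vary between OTP releases):
>
>     [{Node,
>       rpc:call(Node, mnesia_locker, get_held_locks, []),
>       rpc:call(Node, mnesia_locker, get_lock_queue, [])}
>      || Node <- mnesia:system_info(running_db_nodes)].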
>
> Using Observer to attach to the different nodes is probably easiest;
> then you can get a stacktrace of each process. Normally when I do this
> I don't have a live system. If I want to debug post mortem I use
> mnesia_lib:dist_coredump() to collect each Mnesia node's state and
> analyse it. Though with many nodes it will take some time to debug or
> figure out why it appears to be hanging.
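>
> For example (dist_coredump/0 is likewise internal and undocumented;
> as far as I know it produces a MnesiaCore.* dump per node that can
> then be inspected offline):
>
>     mnesia_lib:dist_coredump().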
>
>
> On Wed, Feb 14, 2018 at 6:39 PM Loïc Hoguin <essen@REDACTED> wrote:
>
> Hello,
>
> We are trying to debug an issue where we observe a lot of contention
> when a RabbitMQ node goes down. It has a number of symptoms and we are
> in the middle of figuring things out.
>
> One particular symptom occurs on the node that restarts: it gets stuck
> and there are two Mnesia locks:
>
> [{{schema,rabbit_durable_route},read,{tid,879886,<6502.2299.18>}},
> {{schema,rabbit_exchange},read,{tid,879887,<6502.2302.18>}}]
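> (For what it's worth, a list in this shape can be obtained with the
> documented call mnesia:system_info(held_locks). Each entry reads as
> {LockItem, LockKind, TransactionId}, so these are two read locks on
> the schema rows of rabbit_durable_route and rabbit_exchange, held by
> transactions on a remote node.)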
>
> The locks are only cleared when the other node in the cluster stops
> being so busy deleting data from a number of tables (another symptom)
> and things go back to normal.
>
> Part of the problem is that while this is going on, the restarting node
> cannot be used, so I would like to understand what conditions can
> result in these locks being held for so long. Any tips appreciated!
>
> Thanks in advance,
>
> --
> Loïc Hoguin
> https://ninenines.eu
--
Loïc Hoguin
https://ninenines.eu