<div dir="ltr">Well, you will need to figure out what <span style="color:rgb(33,33,33)"><6502.2299.18> and </span><font color="#212121"><6502.2302.18> are doing, but they are probably waiting</font><div><span style="color:rgb(33,33,33)">for other locks that are held by the busy processes you wrote about.</span></div><div><font color="#212121">But you will have to look at that; debugging Mnesia is just a matter of following the breadcrumbs around the system.</font></div><div><font color="#212121"><br></font></div><div><span style="color:rgb(33,33,33)">mnesia_locker:get_held_locks() and mnesia_locker:get_lock_queue() may also help.</span>  <font color="#212121"><br></font></div><div><font color="#212121"><br></font></div><div><font color="#212121">Using observer to attach to the different nodes is probably easiest; then you can get a stack trace of each process.</font></div><div><font color="#212121">Normally when I do this I don't have a live system. If I want to debug post mortem, I use mnesia_lib:dist_coredump() </font></div><div><font color="#212121">to collect each Mnesia node's state and analyse </font><span style="color:rgb(33,33,33)">them, though with many nodes it will take some time to debug or</span></div><div><span style="color:rgb(33,33,33)">figure out why it appears to be hanging.</span></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, Feb 14, 2018 at 6:39 PM Loïc Hoguin <<a href="mailto:essen@ninenines.eu">essen@ninenines.eu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello,<br>

<br>

We are trying to debug an issue where we observe a lot of contention<br>

when a RabbitMQ node goes down. It has a number of symptoms and we are in<br>

the middle of figuring things out.<br>

<br>

One particular symptom occurs on the node that restarts: it gets stuck<br>

and there are two Mnesia locks:<br>

<br>

[{{schema,rabbit_durable_route},read,{tid,879886,<6502.2299.18>}},<br>

  {{schema,rabbit_exchange},read,{tid,879887,<6502.2302.18>}}]<br>

<br>

The locks are only cleared when the other node in the cluster stops<br>

being so busy deleting data from a number of tables (another symptom)<br>

and things go back to normal.<br>

<br>

Part of the problem is that while this is going on, the restarting node<br>

cannot be used, so I would like to understand what conditions can result<br>

in these locks staying up for so long. Any tips appreciated!<br>

<br>

Thanks in advance,<br>

<br>

--<br>

Loïc Hoguin<br>

<a href="https://ninenines.eu" rel="noreferrer" target="_blank">https://ninenines.eu</a><br>

</blockquote></div>
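<div><br></div>
<div>A minimal sketch of the inspection steps suggested in the reply above, meant for an Erlang shell attached to the stuck node. It assumes each held-lock entry has the {Oid, Op, {tid, Counter, Pid}} shape shown in the quoted output. The variable names Held, Queue, Pids and Inspect are illustrative only, and mnesia:system_info/1 is used here as the documented way to read what should be the same lock information.</div>
<pre>
%% Locks currently held on this node, and requests still waiting for a lock.
Held = mnesia:system_info(held_locks).
Queue = mnesia:system_info(lock_queue).

%% Assumption: each held-lock entry looks like {Oid, Op, {tid, Counter, Pid}}.
Pids = lists:usort(lists:map(fun({_Oid, _Op, {tid, _N, P}}) -> P end, Held)).

%% Ask the node that owns each pid what that process is doing right now.
%% Returns undefined for a dead process, or {badrpc, Reason} if the node is unreachable.
Inspect = fun(P) ->
    rpc:call(node(P), erlang, process_info,
             [P, [current_stacktrace, status, message_queue_len]])
end.
lists:map(fun(P) -> {P, Inspect(P)} end, Pids).
</pre>
<div>If the stack traces show those processes waiting on another node, repeating the same steps on that node is one way to follow the trail further.</div>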