[erlang-questions] Mnesia deadlock with large volume of dirty operations?

Dan Gudmundsson <>
Fri Apr 2 09:19:10 CEST 2010


When you use dirty operations, every operation is sent separately to all
nodes, i.e. 192593*6 = 1155558 messages. A transaction could actually have
been faster in this case, with one (large) message containing all the ops
sent to each node.

What you get is an overloaded mnesia_tm (very long message queues),
which does the actual writing of the data on the other (participating)
mnesia nodes.

So transactions will be blocked waiting on mnesia_tm to process those 200000
messages on the other nodes.
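
As a rough sketch of the transactional alternative (not tested against a
loaded cluster), the records could be deleted in batches, one transaction
per batch, so that each node receives one message per batch instead of one
per record. The module name batch_delete, the functions chunk/2 and
delete_in_batches/2, and the batch size are assumptions made up for
illustration; only mnesia:transaction/1 and mnesia:delete_object/1 are
real Mnesia calls.

```erlang
-module(batch_delete).
-export([delete_in_batches/2, chunk/2]).

%% Hypothetical helper: split a list into sublists of at most N elements.
chunk(List, N) when is_integer(N), N > 0 ->
    chunk(List, N, []).

chunk([], _N, Acc) ->
    lists:reverse(Acc);
chunk(List, N, Acc) ->
    Batch = lists:sublist(List, N),
    Rest = lists:nthtail(length(Batch), List),
    chunk(Rest, N, [Batch | Acc]).

%% Delete Records in batches of BatchSize, one transaction per batch.
%% Returns the number of records passed to mnesia:delete_object/1.
%% Crashes with badmatch if any transaction aborts.
delete_in_batches(Records, BatchSize) ->
    lists:foldl(
      fun(Batch, Count) ->
              {atomic, ok} =
                  mnesia:transaction(
                    fun() ->
                            lists:foreach(fun mnesia:delete_object/1, Batch)
                    end),
              Count + length(Batch)
      end,
      0,
      chunk(Records, BatchSize)).
```

With the Select fun from the original post, a call might look like
batch_delete:delete_in_batches(Select(Days), 1000). The batch size of 1000
is a guess; it would need tuning so that a single transaction does not hold
locks on a hot table for too long.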

/Dan

On Fri, Apr 2, 2010 at 1:11 AM, Brian Acton <> wrote:
> Hi guys,
>
> I am running R13B04 SMP on FreeBSD 7.3. I have a cluster of 7 nodes running
> mnesia.
>
> I have a table of 1196143 records using about 1.504GB of storage. It's a
> reasonably hot table doing a fair number of insert operations at any given
> time.
>
> I decided that since there was a 2GB limit in mnesia that I should do some
> cleanup on the system and specifically this table.
>
> Trying to avoid major problems with Mnesia, transaction load, and deadlock,
> I decided to do dirty_select and dirty_delete_object individually on the
> records.
>
> I started slow, deleting first 10, then 100, then 1000, then 10000, then
> 100,000 records. My goal was to delete 192593 records total.
>
> The first five deletions went through nicely and caused minimal to no
> impact.
>
> Unfortunately, the very last delete blew up the system. My delete command
> completed successfully but on the other nodes, it caused mnesia to get stuck
> on pending transactions, caused my message queues to fill up and basically
> brought down the whole system. We saw some "Mnesia is overloaded" messages in
> our logs on these nodes but did not see a ton of them.
>
> Does anyone have any clues on what went wrong? I am attaching my code below
> for your review.
>
> --b
>
> Mnesia configuration tunables:
>
>      -mnesia no_table_loaders 20
>      -mnesia dc_dump_limit 40
>      -mnesia dump_log_write_threshold 10000
>
> Example error message:
>
> ** WARNING ** Mnesia is overloaded: {mnesia_tm, message_queue_len,
> [387,842]}
>
> Sample code:
>
> Select = fun(Days) ->
>         {MegaSecs, Secs, _MicroSecs} = now(),
>         T = MegaSecs * 1000000 + Secs - 86400 * Days,
>         TimeStamp = {T div 1000000, T rem 1000000, 0},
>         mnesia:dirty_select(offline_msg,
>                     [{'$1',
>                       [{'<', {element, 3, '$1'},
>                     {TimeStamp} }],
>                       ['$1']}])
>     end.
>
> Count = fun(Days) -> length(Select(Days)) end.
>
> Delete = fun(Days, Total) ->
>         C = Select(Days),
>         D = lists:sublist(C, Total),
>         lists:foreach(fun(Rec) ->
>                       ok = mnesia:dirty_delete_object(Rec)
>                   end,
>                   D),
>         length(D)
>     end.
>

