[erlang-questions] Mnesia deadlock with large volume of dirty operations?

Dan Gudmundsson dgud@REDACTED
Fri Apr 2 22:05:21 CEST 2010


mnesia:clear_table/1 is the fastest way to delete the contents of a table,
but it will still take a while when there is a lot of data.
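
If only part of a table has to go, so that clear_table is not an option,
one possible approach is to batch the deletes into small transactions,
since (as noted earlier in this thread) a transaction ships its operations
to each node in one message instead of one message per dirty operation.
A minimal, untested sketch; the module name offline_cleanup, the function
names, the batch size and the pause are assumptions for illustration only:

```erlang
-module(offline_cleanup).
-export([delete_in_batches/3, chunk/2]).

%% Hypothetical helper, not part of mnesia.
%% Delete Records in transactions of at most BatchSize records each,
%% sleeping PauseMs milliseconds between transactions so the other
%% nodes get a chance to drain their message queues.
%% Returns the number of records handed to mnesia:delete_object/1.
delete_in_batches(Records, BatchSize, PauseMs) ->
    lists:foldl(
      fun(Batch, Deleted) ->
              DeleteBatch =
                  fun() ->
                          lists:foreach(fun mnesia:delete_object/1, Batch)
                  end,
              %% Crash on an aborted transaction instead of carrying on.
              {atomic, ok} = mnesia:transaction(DeleteBatch),
              timer:sleep(PauseMs),
              Deleted + length(Batch)
      end,
      0,
      chunk(Records, BatchSize)).

%% Split List into sublists of at most N elements, preserving order.
chunk(List, N) when is_integer(N), N > 0 ->
    chunk(List, N, []).

chunk([], _N, Acc) ->
    lists:reverse(Acc);
chunk(List, N, Acc) ->
    {Batch, Rest} = split_at_most(N, List, []),
    chunk(Rest, N, [Batch | Acc]).

%% Like lists:split/2, but tolerates lists shorter than N.
split_at_most(0, Rest, Acc) ->
    {lists:reverse(Acc), Rest};
split_at_most(_N, [], Acc) ->
    {lists:reverse(Acc), []};
split_at_most(N, [H | T], Acc) ->
    split_at_most(N - 1, T, [H | Acc]).
```

For example, offline_cleanup:delete_in_batches(Select(30), 1000, 100) would
delete the selected records 1000 at a time with a 100 ms pause between
transactions. Suitable values would have to be found by watching the
mnesia_tm message queue length on the other nodes.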

/Dan

On Fri, Apr 2, 2010 at 8:22 PM, Brian Acton <acton@REDACTED> wrote:
> I'm sorry. I neglected to tell you what I had done on the previous day.
>
> On the previous day, I had attempted to delete some old records using this
> methodology:
>
>                mnesia:write_lock_table(offline_msg),
>                mnesia:foldl(
>                  fun(Rec, _Acc) ->
>                          case Rec#offline_msg.expire of
>                              never ->
>                                  ok;
>                              TS ->
>                                  if
>                                      TS < TimeStamp ->
>                                          mnesia:delete_object(Rec);
>                                      true ->
>                                          ok
>                                  end
>                          end
>                  end, ok, offline_msg)
>
>
> This delete finished on the 1st node but subsequently locked up all the
> other nodes on a table lock. The cluster blew up and my 24/7 service went
> into 1 hr of downtime for recovery.
>
> So to recap,
>
> on day 1 - transaction start, table lock, delete objects - finished in about
> 2 minutes
> on day 2 - dirty select, dirty delete objects - finished in about 2 minutes
>
> In both cases, the cluster blew up and became unusable for at least 20-30
> minutes. After 20-30 minutes, we initiated recovery protocols.
>
> Should I try
>
> day 3 - transaction start, no table lock, delete objects?
>
> Is the table lock too coarse-grained? Considering that the cluster has
> blown up twice, I'm obviously a little scared to try another variation.
>
> --b
>
>
> On Fri, Apr 2, 2010 at 5:47 AM, Ovidiu Deac <ovidiudeac@REDACTED> wrote:
>
>> To me it sounds like another example of premature optimization which
>> went wrong? :)
>>
>> On Fri, Apr 2, 2010 at 10:19 AM, Dan Gudmundsson <dgud@REDACTED> wrote:
>> > When you are using dirty, every operation is sent separately to all
>> > nodes, i.e. 192593*6 messages. A transaction could actually have been
>> > faster in this case, with one (large) message containing all ops sent
>> > to each node.
>> >
>> > What you get is an overloaded mnesia_tm (very long msg queues), which
>> > does the actual writing of the data on the other (participating)
>> > mnesia nodes.
>> >
>> > So transactions will be blocked waiting on mnesia_tm to process those
>> > 200000 messages on the other nodes.
>> >
>> > /Dan
>> >
>> > On Fri, Apr 2, 2010 at 1:11 AM, Brian Acton <acton@REDACTED> wrote:
>> >> Hi guys,
>> >>
>> >> I am running R13B04 SMP on FreeBSD 7.3. I have a cluster of 7 nodes
>> >> running mnesia.
>> >>
>> >> I have a table of 1196143 records using about 1.504GB of storage. It's a
>> >> reasonably hot table doing a fair number of insert operations at any
>> >> given time.
>> >>
>> >> I decided that since there was a 2GB limit in mnesia that I should do
>> >> some cleanup on the system and specifically this table.
>> >>
>> >> Trying to avoid major problems with Mnesia, transaction load, and
>> >> deadlock, I decided to do dirty_select and dirty_delete_object
>> >> individually on the records.
>> >>
>> >> I started slow, deleting first 10, then 100, then 1000, then 10000, then
>> >> 100,000 records. My goal was to delete 192593 records total.
>> >>
>> >> The first five deletions went through nicely and caused minimal to no
>> >> impact.
>> >>
>> >> Unfortunately, the very last delete blew up the system. My delete
>> >> command completed successfully, but on the other nodes it caused mnesia
>> >> to get stuck on pending transactions, caused my message queues to fill
>> >> up, and basically brought down the whole system. We saw some "Mnesia is
>> >> overloaded" messages in our logs on these nodes but did not see a ton of
>> >> them.
>> >>
>> >> Does anyone have any clues on what went wrong? I am attaching my code
>> >> below for your review.
>> >>
>> >> --b
>> >>
>> >> Mnesia configuration tunables:
>> >>
>> >>      -mnesia no_table_loaders 20
>> >>      -mnesia dc_dump_limit 40
>> >>      -mnesia dump_log_write_threshold 10000
>> >>
>> >> Example error message:
>> >>
>> >> ** WARNING ** Mnesia is overloaded: {mnesia_tm, message_queue_len, [387,842]}
>> >>
>> >> Sample code:
>> >>
>> >> Select = fun(Days) ->
>> >>         {MegaSecs, Secs, _MicroSecs} = now(),
>> >>         T = MegaSecs * 1000000 + Secs - 86400 * Days,
>> >>         TimeStamp = {T div 1000000, T rem 1000000, 0},
>> >>         mnesia:dirty_select(offline_msg,
>> >>                     [{'$1',
>> >>                       [{'<', {element, 3, '$1'}, {TimeStamp}}],
>> >>                       ['$1']}])
>> >>     end.
>> >>
>> >> Count = fun(Days) -> length(Select(Days)) end.
>> >>
>> >> Delete = fun(Days, Total) ->
>> >>         C = Select(Days),
>> >>         D = lists:sublist(C, Total),
>> >>         lists:foreach(fun(Rec) ->
>> >>                       ok = mnesia:dirty_delete_object(Rec)
>> >>                   end,
>> >>                   D),
>> >>         length(D)
>> >>     end.
>> >>

