[erlang-questions] Mnesia deadlock with large volume of dirty operations?

Bob Ippolito bob@REDACTED
Fri Apr 2 23:14:11 CEST 2010


You might want to measure the message_queue_len of the mnesia_tm
processes (on each node) to see if it's getting behind, and tune your
waits based on whether the message queues are small or empty.
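A minimal sketch of that measurement (the helper names and the threshold are illustrative, not from the thread; it assumes the nodes are connected and mnesia is running on each):

```erlang
%% Sketch: ask each node for the length of its mnesia_tm message queue.
tm_queue_len(Node) ->
    case rpc:call(Node, erlang, whereis, [mnesia_tm]) of
        Pid when is_pid(Pid) ->
            %% process_info/2 must run on the node that owns the process.
            case rpc:call(Node, erlang, process_info,
                          [Pid, message_queue_len]) of
                {message_queue_len, Len} -> Len;
                _ -> undefined
            end;
        _ ->
            undefined
    end.

%% Poll once a second until every node's queue is short enough to start
%% the next deletion batch. Note: an unreachable node yields undefined,
%% which compares greater than any integer, so we keep waiting.
wait_until_drained(Nodes, MaxLen) ->
    case lists:all(fun(N) -> tm_queue_len(N) =< MaxLen end, Nodes) of
        true  -> ok;
        false -> timer:sleep(1000), wait_until_drained(Nodes, MaxLen)
    end.
```

With something like this one could call wait_until_drained(Nodes, 100) between batches instead of guessing a fixed sleep interval.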

On Fri, Apr 2, 2010 at 2:08 PM, Brian Acton <acton@REDACTED> wrote:
> Yes. I am using dets (disc_only) tables in mnesia.
>
> Since I was able to delete 10k records previously, I think I am going to
> start with a baseline of 10k records and a 60-second sleep interval.
> Hopefully this will work. I wish I knew what a more appropriate sleep
> period would be, as the maintenance is now going to take a very long
> time.
>
> Thanks for your help,
>
> --b
>
> On Fri, Apr 2, 2010 at 1:55 PM, Dan Gudmundsson <dgud@REDACTED> wrote:
>
>> Well, I can't offer much advice, but I would definitely test this on a
>> non-live system first.
>>
>> mnesia:foldl is not the best tool when you are changing a lot of
>> records: it has to keep every change in memory until you have traversed
>> the whole table. It is also slow when there are a lot of changes, since
>> it has to compensate for the things you have done earlier in the
>> transaction.
>>
>> I assume you are using dets (disc_only) because you are afraid of the
>> 2GB limit, or is it a memory limit on Windows?
>>
>> dets is slow; mnesia is primarily a RAM database.
>>
>> The only way I see is to chunk through the tables, a couple of hundred
>> to a thousand records per transaction, and to have code that can deal
>> with both the new and the old format while the database is being
>> changed.
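Dan's chunking suggestion could be sketched like this, assuming a match spec Ms that selects the whole records to drop (the function name, batch size, and the sleep between batches are illustrative choices, not from the thread):

```erlang
%% Sketch: delete matching records in small transactions rather than one
%% big fold. BatchSize would be a few hundred to a thousand.
delete_in_chunks(Tab, Ms, BatchSize) ->
    F = fun() ->
                case mnesia:select(Tab, Ms, BatchSize, write) of
                    '$end_of_table' ->
                        done;
                    {Recs, _Cont} ->
                        %% The continuation is not reused across
                        %% transactions; each batch restarts the select.
                        [mnesia:delete_object(Tab, R, write) || R <- Recs],
                        length(Recs)
                end
        end,
    case mnesia:transaction(F) of
        {atomic, done} ->
            ok;
        {atomic, _N} ->
            timer:sleep(1000),   %% give the replicas time to drain
            delete_in_chunks(Tab, Ms, BatchSize);
        {aborted, Reason} ->
            {error, Reason}
    end.
```

Each batch takes only the locks it needs and commits before the next one starts, so readers and writers on the other nodes get a chance to run in between.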
>>
>> Good luck
>> /Dan
>>
>> On Fri, Apr 2, 2010 at 10:19 PM, Brian Acton <acton@REDACTED> wrote:
>> > On this particular table, I do not want to delete all entries, which
>> > is why I made a separate post to the mailing list. Bringing the two
>> > threads back together, I want:
>> >
>> > One table: delete entries older than n days.
>> > Another table: delete all entries.
>> >
>> > Both tables are reasonably hot (~1-2 ops per second) and reasonably
>> > large (> 1.5GB). I'm hitting the 2GB limit and I need to clean up
>> > these tables.
>> >
>> > So far, any attempt at maintenance (as outlined in previous emails)
>> > has resulted in Mnesia seizing up and bringing down the cluster.
>> >
>> > It sounds like I have to do this in very small increments with wait
>> > time between increments. However, I do not have a method or mechanism
>> > for determining the size of an increment or the wait time between
>> > increments. I'm fine doing ten deletes per second if that's what it
>> > takes. However, I'd like to be able to figure out the maximum number
>> > of deletes that I can do in the minimum amount of time.
>> >
>> > I'm definitely open to suggestions on this.
>> >
>> > --b
>> >
>> > On Fri, Apr 2, 2010 at 1:05 PM, Dan Gudmundsson <dgud@REDACTED> wrote:
>> >
>> >> clear_table is the fastest way to delete everything, but it will
>> >> still take a while when there is a lot of data.
>> >>
>> >> /Dan
>> >>
>> >> On Fri, Apr 2, 2010 at 8:22 PM, Brian Acton <acton@REDACTED> wrote:
>> >> > I'm sorry. I neglected to tell you what I had done on the
>> >> > previous day.
>> >> >
>> >> > On the previous day, I had attempted to delete some old records using
>> >> this
>> >> > methodology:
>> >> >
>> >> >                mnesia:write_lock_table(offline_msg),
>> >> >                mnesia:foldl(
>> >> >                  fun(Rec, _Acc) ->
>> >> >                          case Rec#offline_msg.expire of
>> >> >                              never ->
>> >> >                                  ok;
>> >> >                              TS ->
>> >> >                                  if
>> >> >                                      TS < TimeStamp ->
>> >> >                                          mnesia:delete_object(Rec);
>> >> >                                      true ->
>> >> >                                          ok
>> >> >                                  end
>> >> >                          end
>> >> >                  end, ok, offline_msg)
>> >> >
>> >> >
>> >> > This delete finished on the 1st node but subsequently locked up
>> >> > all the other nodes on a table lock. The cluster blew up and my
>> >> > 24/7 service went into an hour of recovery downtime.
>> >> >
>> >> > So to recap:
>> >> >
>> >> > day 1 - transaction start, table lock, delete objects - finished
>> >> > in about 2 minutes
>> >> > day 2 - dirty select, dirty delete objects - finished in about 2
>> >> > minutes
>> >> >
>> >> > In both cases, the cluster blew up and became unusable for at
>> >> > least 20-30 minutes. After 20-30 minutes, we initiated recovery
>> >> > protocols.
>> >> >
>> >> > Should I try
>> >> >
>> >> > day 3 - transaction start, no table lock, delete objects
>> >> >
>> >> > ? Is the table lock too coarse-grained? Considering that the
>> >> > cluster has blown up twice, I'm obviously a little scared to try
>> >> > another variation...
>> >> >
>> >> > --b
>> >> >
>> >> >
>> >> > On Fri, Apr 2, 2010 at 5:47 AM, Ovidiu Deac <ovidiudeac@REDACTED>
>> >> wrote:
>> >> >
>> >> >> To me it sounds like another example of premature optimization
>> >> >> that went wrong? :)
>> >> >>
>> >> >> On Fri, Apr 2, 2010 at 10:19 AM, Dan Gudmundsson <dgud@REDACTED>
>> >> wrote:
>> >> >> > When you are using dirty operations, every operation is sent
>> >> >> > separately to all nodes, i.e. 192593*6 messages; a transaction
>> >> >> > could actually have been faster in this case, with one (large)
>> >> >> > message containing all the ops sent to each node.
>> >> >> >
>> >> >> > What you get is an overloaded mnesia_tm (very long message
>> >> >> > queues), which does the actual writing of the data on the other
>> >> >> > participating mnesia nodes.
>> >> >> >
>> >> >> > So transactions will be blocked waiting on mnesia_tm to process
>> >> >> > those 200000 messages on the other nodes.
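The difference Dan describes could be sketched as follows (the function name is illustrative; Recs stands for the list that the dirty_select returned):

```erlang
%% Sketch: the same deletes wrapped in a single transaction, so each
%% replica receives one commit message instead of one message per record.
bulk_delete(Tab, Recs) ->
    mnesia:transaction(
      fun() ->
              lists:foreach(
                fun(R) -> mnesia:delete_object(Tab, R, write) end,
                Recs)
      end).
```

For ~190k records one such transaction would itself be heavy, which is why Dan's chunking advice elsewhere in the thread still applies.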
>> >> >> >
>> >> >> > /Dan
>> >> >> >
>> >> >> > On Fri, Apr 2, 2010 at 1:11 AM, Brian Acton <acton@REDACTED>
>> >> wrote:
>> >> >> >> Hi guys,
>> >> >> >>
>> >> >> >> I am running R13B04 SMP on FreeBSD 7.3. I have a cluster of
>> >> >> >> 7 nodes running mnesia.
>> >> >> >>
>> >> >> >> I have a table of 1196143 records using about 1.504GB of
>> >> >> >> storage. It's a reasonably hot table doing a fair number of
>> >> >> >> insert operations at any given time.
>> >> >> >>
>> >> >> >> I decided that since there is a 2GB limit in mnesia, I should
>> >> >> >> do some cleanup on the system, and specifically on this table.
>> >> >> >>
>> >> >> >> Trying to avoid major problems with Mnesia, transaction load,
>> >> >> >> and deadlock, I decided to do dirty_select and
>> >> >> >> dirty_delete_object individually on the records.
>> >> >> >>
>> >> >> >> I started slow, deleting first 10, then 100, then 1000, then
>> >> >> >> 10000, then 100,000 records. My goal was to delete 192593
>> >> >> >> records in total.
>> >> >> >>
>> >> >> >> The first five deletions went through nicely and caused
>> >> >> >> minimal to no impact.
>> >> >> >>
>> >> >> >> Unfortunately, the very last delete blew up the system. My
>> >> >> >> delete command completed successfully, but on the other nodes
>> >> >> >> it caused mnesia to get stuck on pending transactions, caused
>> >> >> >> my message queues to fill up, and basically brought down the
>> >> >> >> whole system. We saw some "Mnesia is overloaded" messages in
>> >> >> >> our logs on these nodes, but did not see a ton of them.
>> >> >> >>
>> >> >> >> Does anyone have any clues on what went wrong? I am attaching
>> >> >> >> my code below for your review.
>> >> >> >>
>> >> >> >> --b
>> >> >> >>
>> >> >> >> Mnesia configuration tunables:
>> >> >> >>
>> >> >> >>      -mnesia no_table_loaders 20
>> >> >> >>      -mnesia dc_dump_limit 40
>> >> >> >>      -mnesia dump_log_write_threshold 10000
>> >> >> >>
>> >> >> >> Example error message:
>> >> >> >>
>> >> >> >> ** WARNING ** Mnesia is overloaded: {mnesia_tm, message_queue_len,
>> >> >> >> [387,842]}
>> >> >> >>
>> >> >> >> Sample code:
>> >> >> >>
>> >> >> >> Select = fun(Days) ->
>> >> >> >>         {MegaSecs, Secs, _MicroSecs} = now(),
>> >> >> >>         T = MegaSecs * 1000000 + Secs - 86400 * Days,
>> >> >> >>         TimeStamp = {T div 1000000, T rem 1000000, 0},
>> >> >> >>         mnesia:dirty_select(offline_msg,
>> >> >> >>                     [{'$1',
>> >> >> >>                       [{'<', {element, 3, '$1'},
>> >> >> >>                     {TimeStamp} }],
>> >> >> >>                       ['$1']}])
>> >> >> >>     end.
>> >> >> >>
>> >> >> >> Count = fun(Days) -> length(Select(Days)) end.
>> >> >> >>
>> >> >> >> Delete = fun(Days, Total) ->
>> >> >> >>         C = Select(Days),
>> >> >> >>         D = lists:sublist(C, Total),
>> >> >> >>         lists:foreach(fun(Rec) ->
>> >> >> >>                       ok = mnesia:dirty_delete_object(Rec)
>> >> >> >>                   end,
>> >> >> >>                   D),
>> >> >> >>         length(D)
>> >> >> >>     end.
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >
>>
>


More information about the erlang-questions mailing list