[erlang-questions] Mnesia deadlock with large volume of dirty operations?
Brian Acton
acton@REDACTED
Fri Apr 2 20:22:52 CEST 2010
I'm sorry, I neglected to mention what I had done on the previous day.
I had attempted to delete some old records using this approach:
    mnesia:write_lock_table(offline_msg),
    mnesia:foldl(
      fun(Rec, _Acc) ->
              case Rec#offline_msg.expire of
                  never ->
                      ok;
                  TS ->
                      if
                          TS < TimeStamp ->
                              mnesia:delete_object(Rec);
                          true ->
                              ok
                      end
              end
      end, ok, offline_msg)
This delete finished on the first node but subsequently locked up all the
other nodes on a table lock. The cluster blew up and my 24/7 service went
into 1 hr of recovery downtime.
So to recap,
on day 1 - transaction start, table lock, delete objects - finished in about
2 minutes
on day 2 - dirty select, dirty delete objects - finished in about 2 minutes
In both cases, the cluster blew up and became unusable for at least 20-30
minutes. After 20-30 minutes, we initiated recovery protocols.
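To put rough numbers on Dan's explanation quoted below, here is a minimal sketch of the arithmetic. The module and function names are hypothetical, the batch size of 1000 is an arbitrary assumption, and the batched figure counts only one payload message per replica per transaction, ignoring any extra commit-protocol round trips.

```erlang
-module(msg_estimate).
-export([dirty_messages/2, batched_messages/3]).

%% Dirty operations: one message per operation per remote replica.
dirty_messages(Ops, Replicas) ->
    Ops * Replicas.

%% Batched transactions: assume one (large) message per replica per
%% transaction; the number of transactions is ceil(Ops / BatchSize).
batched_messages(Ops, Replicas, BatchSize) when BatchSize > 0 ->
    Batches = (Ops + BatchSize - 1) div BatchSize,
    Batches * Replicas.
```

With 192593 deletes and 6 remote nodes, dirty_messages/2 gives 1155558, while batched_messages/3 with a batch size of 1000 gives 1158 (193 transactions times 6 nodes).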
Should I try
day 3 - transaction start, no table lock, delete objects?
Is the table lock too coarse-grained? Considering that the cluster has
blown up twice, I'm obviously a little scared to try another variation...
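For reference, one shape the "day 3" variation could take is sketched below: no table lock, and the deletes split into many small transactions with a pause between them so the other nodes get a chance to drain their queues. This is only a sketch and has not been tried on this cluster. The module name, function names, batch size and pause are all hypothetical; the only Mnesia calls used are mnesia:transaction/1 and mnesia:delete_object/1.

```erlang
-module(purge_sketch).
-export([chunk/2, delete_in_batches/3]).

%% Split a list into sublists of at most N elements, preserving order.
chunk(List, N) when is_integer(N), N > 0 ->
    chunk(List, N, []).

chunk([], _N, Acc) ->
    lists:reverse(Acc);
chunk(List, N, Acc) ->
    {Batch, Rest} = take(List, N, []),
    chunk(Rest, N, [Batch | Acc]).

%% Take up to N elements from the front of a list.
take(Rest, 0, Acc) -> {lists:reverse(Acc), Rest};
take([], _N, Acc) -> {lists:reverse(Acc), []};
take([H | T], N, Acc) -> take(T, N - 1, [H | Acc]).

%% Delete Records in separate transactions of at most BatchSize
%% records each, sleeping PauseMs milliseconds after every batch.
%% Returns the number of records submitted for deletion.
%% Assumption: Records were selected earlier and each one is a
%% full record tuple, as mnesia:delete_object/1 expects.
delete_in_batches(Records, BatchSize, PauseMs) ->
    lists:foldl(
      fun(Batch, Deleted) ->
              {atomic, ok} =
                  mnesia:transaction(
                    fun() ->
                            lists:foreach(fun mnesia:delete_object/1, Batch)
                    end),
              timer:sleep(PauseMs),
              Deleted + length(Batch)
      end, 0, chunk(Records, BatchSize)).
```

A call such as purge_sketch:delete_in_batches(Recs, 1000, 500) would then issue one transaction per 1000 records. Whether those particular values are gentle enough for a 7-node cluster is something to find out by starting small, as with the earlier attempts.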
--b
On Fri, Apr 2, 2010 at 5:47 AM, Ovidiu Deac <ovidiudeac@REDACTED> wrote:
> To me it sounds like another example of premature optimization which
> went wrong? :)
>
> On Fri, Apr 2, 2010 at 10:19 AM, Dan Gudmundsson <dgud@REDACTED> wrote:
> > When you use dirty operations, every operation is sent separately to
> > all nodes, i.e. 192593*6 messages. A transaction could actually have
> > been faster in this case, with one (large) message containing all ops
> > sent to each node.
> >
> > What you get is an overloaded mnesia_tm (very long message queues),
> > which does the actual writing of the data on the other (participating)
> > mnesia nodes.
> >
> > So transactions will be blocked waiting on mnesia_tm to process those
> > 200000 messages on the other nodes.
> >
> > /Dan
> >
> > On Fri, Apr 2, 2010 at 1:11 AM, Brian Acton <acton@REDACTED> wrote:
> >> Hi guys,
> >>
> >> I am running R13B04 SMP on FreeBSD 7.3. I have a cluster of 7 nodes
> >> running mnesia.
> >>
> >> I have a table of 1196143 records using about 1.504GB of storage. It's
> >> a reasonably hot table doing a fair number of insert operations at any
> >> given time.
> >>
> >> I decided that since there was a 2GB limit in mnesia that I should do
> >> some cleanup on the system and specifically this table.
> >>
> >> Trying to avoid major problems with Mnesia, transaction load, and
> >> deadlock, I decided to do dirty_select and dirty_delete_object
> >> individually on the records.
> >>
> >> I started slow, deleting first 10, then 100, then 1000, then 10000, then
> >> 100,000 records. My goal was to delete 192593 records total.
> >>
> >> The first five deletions went through nicely and caused minimal to no
> >> impact.
> >>
> >> Unfortunately, the very last delete blew up the system. My delete
> >> command completed successfully but on the other nodes, it caused
> >> mnesia to get stuck on pending transactions, caused my message queues
> >> to fill up and basically brought down the whole system. We saw some
> >> "Mnesia is overloaded" messages in our logs on these nodes but did not
> >> see a ton of them.
> >>
> >> Does anyone have any clues on what went wrong? I am attaching my code
> >> below for your review.
> >>
> >> --b
> >>
> >> Mnesia configuration tunables:
> >>
> >> -mnesia no_table_loaders 20
> >> -mnesia dc_dump_limit 40
> >> -mnesia dump_log_write_threshold 10000
> >>
> >> Example error message:
> >>
> >> ** WARNING ** Mnesia is overloaded: {mnesia_tm, message_queue_len,
> >> [387,842]}
> >>
> >> Sample code:
> >>
> >> Select = fun(Days) ->
> >>     {MegaSecs, Secs, _MicroSecs} = now(),
> >>     T = MegaSecs * 1000000 + Secs - 86400 * Days,
> >>     TimeStamp = {T div 1000000, T rem 1000000, 0},
> >>     mnesia:dirty_select(offline_msg,
> >>                         [{'$1',
> >>                           [{'<', {element, 3, '$1'},
> >>                             {TimeStamp}}],
> >>                           ['$1']}])
> >> end.
> >>
> >> Count = fun(Days) -> length(Select(Days)) end.
> >>
> >> Delete = fun(Days, Total) ->
> >>     C = Select(Days),
> >>     D = lists:sublist(C, Total),
> >>     lists:foreach(fun(Rec) ->
> >>                       ok = mnesia:dirty_delete_object(Rec)
> >>                   end,
> >>                   D),
> >>     length(D)
> >> end.
> >>
> >
>