Mnesia and partitioning big tables: Doc, it hurts when...
Scott Lystig Fritchie
fritchie@REDACTED
Tue Aug 29 18:59:25 CEST 2006
Howdy. I don't know if any of you have tried partitioning a big
Mnesia table. Oh, say, one with 800K entries in it. "Doctor, it
hurts when I do [this]." The runtime appears to be O(N^2).(*) I
interrupted the process when I noticed:
* The output of mnesia:info/0 would arrive on my remote shell at a
rate of approx. 1 *line* every 15-20 seconds.
* It still hadn't finished when I returned from supper.
This seems to be a symptom of a larger problem: any transaction that
involves a large number of write/delete operations also seems to
behave in an O(N^2) manner. Judging from mnesia_frag.erl, fragment
operations use the same mechanism as other write & delete operations.
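One way to check the "any big transaction" claim is to time a single transaction doing N writes for a few values of N: if the cost is O(N^2), doubling N should roughly quadruple the time. This is a minimal sketch, assuming a single node with a RAM-only schema; the module name `tx_scaling` and the table `tx_scaling_tab` are hypothetical.

```erlang
%% tx_scaling: hypothetical module. Times one Mnesia transaction that
%% performs N writes, for each N in Sizes, on a RAM-only table.
-module(tx_scaling).
-export([run/1]).

%% Returns [{N, Microseconds}]. If the cost is O(N^2), doubling N
%% should roughly quadruple the time.
run(Sizes) ->
    ok = mnesia:start(),
    %% Hypothetical table: default type 'set', ram_copies on this node.
    {atomic, ok} = mnesia:create_table(tx_scaling_tab,
                                       [{attributes, [key, val]}]),
    [{N, time_tx(N)} || N <- Sizes].

time_tx(N) ->
    Writes = fun() ->
                     lists:foreach(
                       fun(I) -> ok = mnesia:write({tx_scaling_tab, I, I}) end,
                       lists:seq(1, N))
             end,
    %% Time the whole transaction, including commit.
    {Micros, {atomic, ok}} = timer:tc(mnesia, transaction, [Writes]),
    Micros.
```

Run it once in a fresh node, e.g. `tx_scaling:run([10000, 20000, 40000]).`, and compare the ratios between successive timings.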
The culprit appears to be the temporary ETS table
'mnesia_trans_store', used to store current transaction state. It's a
'bag' table, and all (?) write & delete operations use the same key,
'op', for storage in that ETS table. Sequential insertion of items
with the same key into a 'bag' table looks like the cause of the
pain.(***)
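The same-key effect can be isolated from Mnesia entirely. A 'bag' table does not allow identical objects under one key, so presumably each insert compares the new object against those already stored under that key, which would make N same-key inserts cost on the order of N^2/2 comparisons; a 'duplicate_bag' has no such uniqueness rule. This sketch (hypothetical module `same_key_insert`) times both table types with every object keyed on 'op', as in 'mnesia_trans_store'.

```erlang
%% same_key_insert: hypothetical module. Inserts N distinct objects,
%% all sharing the key 'op', into a bag and a duplicate_bag table.
-module(same_key_insert).
-export([run/1]).

%% Returns [{bag, Micros}, {duplicate_bag, Micros}].
run(N) ->
    [{Type, time_inserts(Type, N)} || Type <- [bag, duplicate_bag]].

time_inserts(Type, N) ->
    Tab = ets:new(same_key_insert, [Type, public]),
    %% Each object {op, I} is distinct, but all of them share the key 'op'.
    Insert = fun(I) -> true = ets:insert(Tab, {op, I}) end,
    %% lists:seq/2 is evaluated before timing starts.
    {Micros, ok} = timer:tc(lists, foreach, [Insert, lists:seq(1, N)]),
    %% Sanity check: all N objects are stored under the single key.
    N = length(ets:lookup(Tab, op)),
    true = ets:delete(Tab),
    Micros.
```

Try doubling N (e.g. 25000, then 50000) and watch how the 'bag' time grows relative to the 'duplicate_bag' time. Note that switching to 'duplicate_bag' is not a drop-in change: it keeps exact duplicate objects that a 'bag' would silently collapse.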
Is there any reason why 'mnesia_trans_store' is a 'bag', aside from
its being a convenient (but slow) way to store the write, delete, and
other ops?
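For comparison, here is one hypothetical alternative layout, not what Mnesia actually does: give each op its own key, {op, Seq}, in an 'ordered_set', so an insert never has to scan earlier ops and the ops can still be read back in insertion order. The module name `op_store` and its functions are made up for illustration, and any Mnesia code that currently does `ets:lookup(Store, op)` would have to change to use something like `ops/1`.

```erlang
%% op_store: hypothetical sketch of an op log with unique keys.
-module(op_store).
-export([new/0, add_op/2, ops/1]).

new() ->
    Tab = ets:new(op_store, [ordered_set, public]),
    %% Sequence counter; the atom key sorts before all {op, Seq} tuple keys.
    true = ets:insert(Tab, {seq, 0}),
    Tab.

%% Store Op under a fresh key {op, Seq}; no comparison with earlier ops.
add_op(Tab, Op) ->
    Seq = ets:update_counter(Tab, seq, 1),
    true = ets:insert(Tab, {{op, Seq}, Op}),
    ok.

%% An ordered_set is traversed in key order, so ops come back in the
%% order they were added.
ops(Tab) ->
    ets:select(Tab, [{{{op, '_'}, '$1'}, [], ['$1']}]).
```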
-Scott
(*) Erlang R10B-9 on a Linux/Opteron platform; HiPE not used for my
app or for Mnesia(**). The table has not yet been fragmented.
mnesia:change_table_frag(Tab, {activate, []}) is fast.
mnesia:change_table_frag(Tab, {add_frag, [node()]}) is not.
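The two calls above can be timed side by side with a small reproduction. This is a sketch under assumptions: a single node with a RAM-only schema, and a hypothetical table `frag_repro_tab` loaded with dirty writes so that loading time is not part of the measurement. Per the Mnesia docs, the new fragment gets the same number of replicas as the first fragment, here one RAM copy on `node()`.

```erlang
%% frag_repro: hypothetical module. Loads NumRecords records, then times
%% activating fragmentation and adding one fragment.
-module(frag_repro).
-export([run/1]).

run(NumRecords) ->
    ok = mnesia:start(),
    Tab = frag_repro_tab,
    {atomic, ok} = mnesia:create_table(Tab, [{attributes, [key, val]}]),
    %% Bulk load outside any transaction; this part is not timed.
    lists:foreach(fun(I) -> ok = mnesia:dirty_write({Tab, I, I}) end,
                  lists:seq(1, NumRecords)),
    {TActivate, {atomic, ok}} =
        timer:tc(mnesia, change_table_frag, [Tab, {activate, []}]),
    {TAddFrag, {atomic, ok}} =
        timer:tc(mnesia, change_table_frag, [Tab, {add_frag, [node()]}]),
    [{activate, TActivate}, {add_frag, TAddFrag}].
```

Running it for a few table sizes (e.g. 50K, 100K, 200K records in fresh nodes) shows how the add_frag time grows with table size.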
(**) Is anyone running a HiPE-compiled Mnesia application? I haven't
tried it yet.
(***) Below is an excerpt of 'fprof' output from adding a fragment to an
800K-entry table that already had 25 fragments, each with approx. 25K
entries. The distribution isn't even, so a few fragments have about
50K entries.
%% Analysis results:
{ analysis_options,
[{callers, true},
{sort, acc},
{totals, true},
{details, true}]}.
% CNT ACC OWN
[{ totals, 3686037,367339.496,343509.022}].
%%%
{[{{mnesia_schema,schema_transaction,1}, 1,349798.387, 0.000},
{{shell,eval_loop,3}, 2,17538.776, 0.000},
{{disk_log,monitor_request,2}, 2, 2406.145, 0.000},
{{mnesia_schema,do_insert_schema_ops,2}, 326, 1139.758, 0.000},
{{dets,req,2}, 2, 849.061, 0.000},
{{mnesia_dumper,insert,8}, 16542, 212.873, 0.000},
{{mnesia_index,del_ixes,4}, 1110, 105.762, 0.000},
{{mnesia_index,add_index2,6}, 913, 96.456, 0.000},
{{mnesia_lib,db_get,3}, 609, 82.711, 0.000},
{{mnesia_tm,commit_write,6}, 195, 81.309, 0.000},
{{ets,lookup_element,3}, 690, 80.703, 0.000},
{{ets,lookup,2}, 354, 75.630, 0.000},
{{ets,insert,2}, 359, 74.180, 0.000},
{{mnesia_tm,prepare_snmp,3}, 522, 72.567, 0.000},
{{mnesia_lib,db_get,2}, 815, 64.079, 0.000},
{{mnesia_schema,prepare_op,3}, 439, 60.331, 0.000},
{{mnesia_schema,prepare_ops,6}, 434, 59.457, 0.000},
{{mnesia_lib,val,1}, 539, 57.751, 0.000},
{{mnesia_index,delete_index2,3}, 643, 51.354, 0.000},
{{mnesia_tm,do_update_op,3}, 361, 50.525, 0.000},
{{mnesia_frag_hash,key_to_frag_number,2}, 484, 49.866, 0.000},
{{mnesia_index,db_put,2}, 255, 44.327, 0.000},
{{ets,match_delete,2}, 273, 44.318, 0.000},
{{disk_log_server,get_log_pids,1}, 212, 44.299, 0.000},
{{mnesia_frag,do_split,5}, 575, 38.952, 0.000},
{{disk_log,notify,2}, 274, 37.963, 0.000},
{{mnesia_tm,val,1}, 339, 37.882, 0.000},
{{mnesia_dumper,disc_insert,8}, 291, 32.388, 0.000},
{{mnesia_frag,key_to_n,2}, 286, 27.702, 0.000},
{{erlang,phash,2}, 221, 27.694, 0.000},
{{mnesia_tm,do_snmp,2}, 74, 25.488, 0.000},
{{erlang,'++',2}, 98, 24.193, 0.000},
{{ets,select_delete,2}, 136, 22.095, 0.000},
{{mnesia_index,add_index,5}, 84, 21.879, 0.000},
{{mnesia_frag,key_pos,1}, 101, 20.113, 0.000},
{{mnesia_dumper,insert_ops,6}, 251, 19.165, 0.000},
{{mnesia_lib,db_put,3}, 97, 19.094, 0.000},
{{mnesia_tm,commit_delete,6}, 220, 19.020, 0.000},
{{math,pow,2}, 220, 16.810, 0.000},
{{mnesia_index,db_match_erase,2}, 381, 12.915, 0.000},
{{disk_log,alog,2}, 241, 12.784, 0.000},
{{erlang,term_to_binary,1}, 101, 12.571, 0.000},
{{mnesia_schema,val,1}, 184, 11.679, 0.000},
{{gen,wait_resp_mon,3}, 17, 9.381, 0.000},
{{mnesia_dumper,insert_op,5}, 156, 6.492, 0.000},
{{mnesia_dumper,open_files,4}, 124, 6.461, 0.000},
{{mnesia_log,append,2}, 156, 6.404, 0.000},
{{ets,delete,2}, 42, 6.379, 0.000},
{{mnesia_lib,db_erase,3}, 68, 6.350, 0.000},
{{mnesia_index,delete_index,3}, 50, 6.304, 0.000},
{{ets,match_object,2}, 49, 0.049, 0.000},
{{mnesia_locker,receive_wlocks,4}, 4, 0.005, 0.000},
{{prim_file,drv_get_response,1}, 4, 0.004, 0.000},
{{mnesia_tm,rec,2}, 1, 0.004, 0.000},
{{mnesia_schema,schema_coordinator,3}, 1, 0.004, 0.000},
{{filename,join1,4}, 3, 0.003, 0.000},
{{shell,used_records,3}, 1, 0.001, 0.000},
{{shell,prep_check,1}, 1, 0.001, 0.000},
{{mnesia_schema,rec2list,3}, 1, 0.001, 0.000},
{{mnesia_schema,list2cs,1}, 1, 0.001, 0.000},
{{mnesia_schema,do_set_schema,2}, 1, 0.001, 0.000},
{{mnesia_schema,'-change_table_frag/2-fun-0-',2}, 1, 0.001, 0.000},
{{lists,reverse,1}, 1, 0.001, 0.000},
{{lists,foldl,3}, 1, 0.001, 0.000},
{{ets,match,2}, 1, 0.001, 0.000},
{{erl_eval,'-merge_bindings/2-fun-0-',2}, 1, 0.001, 0.000},
{{dict,get_slot,2}, 1, 0.001, 0.000},
{{fprof,just_call,2}, 1, 0.000, 0.000}],
{ suspend, 30943,373628.863, 0.000},
%
[ ]}.
....
{[{{mnesia_schema,do_insert_schema_ops,2}, 50004,324510.758,324480.804},
{{mnesia_index,db_put,2}, 75003, 264.027, 226.191},
{{mnesia_lib,db_put,3}, 25001, 131.409, 125.020},
{{mnesia_lib,set,2}, 109, 9.958, 9.957},
{{mnesia_locker,get_wlocks_on_nodes,5}, 4, 0.004, 0.004},
{{mnesia_schema,insert_cstruct,3}, 3, 0.003, 0.003},
{{mnesia_locker,wlock,3}, 3, 0.003, 0.003},
{{mnesia_tm,multi_commit,4}, 1, 0.001, 0.001},
{{mnesia_recover,note_outcome,1}, 1, 0.001, 0.001},
{{mnesia_recover,note_decision,2}, 1, 0.001, 0.001}],
{ {ets,insert,2}, 150130,324916.165,324841.985},
%
[{suspend, 359, 74.180, 0.000}]}.
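For anyone who wants to produce a comparable analysis, a small helper can wrap the fprof steps and reuse the analysis options listed at the top of the output. This is a sketch; the module name `profile_it` is hypothetical, and it relies on fprof's default trace file ("fprof.trace" in the current directory).

```erlang
%% profile_it: hypothetical helper. Traces one call with fprof and writes
%% an analysis using the options shown in the output above.
-module(profile_it).
-export([run/2]).

%% Fun is any zero-arity fun, e.g. one that calls
%% mnesia:change_table_frag(Tab, {add_frag, [node()]}).
%% Dest is the file that receives the analysis text.
run(Fun, Dest) ->
    Result = fprof:apply(Fun, []),   % trace the call to the default trace file
    ok = fprof:profile(),            % read and process that trace file
    ok = fprof:analyse([{dest, Dest},
                        {callers, true},
                        {sort, acc},
                        {totals, true},
                        {details, true}]),
    Result.
```

Tracing a call as large as adding a fragment to an 800K-entry table produces a big trace file and slows the call down considerably, so the absolute ACC times are inflated; the relative weights are what matter.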