[erlang-questions] Combining ets and mnesia operations for extreme performance
Paulo Sérgio Almeida
psa@REDACTED
Thu Sep 11 20:06:02 CEST 2008
Hi all,
This one is for mnesia hackers (Ulf, Joel, ...)
I have a scenario where I am building aggregations over some data, most
of the time counting things. For the performance I want, I cannot use
mnesia transactions naively; even dirty_writes would be too slow, and
they would not give me the atomicity I need.
A typical scenario would be to atomically apply e.g. 1 million
increments spread over some hundreds/thousands of keys in some dozen
tables. This atomicity is from the durability point of view, i.e. either
all increments are processed and made persistent or, if e.g. the
machine crashes before that, no update must have been made persistent. I
don't need to worry about atomicity from the lookup point of view (i.e.
there would be no problem doing a lookup and reading some intermediate
value).
This leads to my strategy:
- using mnesia disc_copies;
- operating directly on the underlying ets tables, e.g. with
  ets:update_counter;
- marking which keys become "dirty";
- then, in a single mnesia:transaction:
    - ets:lookup the dirty keys;
    - mnesia:write these records.
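The steps above can be sketched roughly as follows. This is only an illustration under assumptions that are not in the post: the module name agg_sketch, the helper table agg_sketch_dirty, and the record shape {Tab, Key, Value} are all hypothetical, and the sketch relies on the unsupported detail that mnesia keeps each table in a public named ets table with the same name as the mnesia table.

```erlang
-module(agg_sketch).
-export([init/0, incr/3, flush/0]).

%% Hypothetical helper table remembering which {Tab, Key} pairs were touched.
-define(DIRTY, agg_sketch_dirty).

%% Create the dirty-key set. The calling process owns it.
init() ->
    ?DIRTY = ets:new(?DIRTY, [named_table, public, set]),
    ok.

%% Bump a counter directly in the ets table behind the mnesia table Tab.
%% Assumes records of the form {Tab, Key, Value}, so the counter sits at
%% tuple position 3. This bypasses mnesia and is not a supported use.
incr(Tab, Key, N) ->
    _ = ets:update_counter(Tab, Key, {3, N}, {Tab, Key, 0}),
    true = ets:insert(?DIRTY, {{Tab, Key}}),
    ok.

%% Make all dirty records durable in one mnesia transaction.
%% Assumes a single writer, so nothing is marked dirty between reading
%% the dirty set and clearing it.
flush() ->
    Dirty = [TabKey || {TabKey} <- ets:tab2list(?DIRTY)],
    Write = fun({Tab, Key}) ->
                    [Rec] = ets:lookup(Tab, Key),
                    ok = mnesia:write(Tab, Rec, write)
            end,
    {atomic, ok} = mnesia:transaction(fun() -> lists:foreach(Write, Dirty) end),
    true = ets:delete_all_objects(?DIRTY),
    ok.
```

To try it out, something like agg_sketch:init(), a few agg_sketch:incr(hits, some_key, 1) calls and then agg_sketch:flush() should do, given a mnesia table hits with attributes [key, value]. The post uses disc_copies; a ram_copies table is enough to exercise the code path.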
Basically, it has been working for me. But as I am doing something I
shouldn't (updating the ets tables directly), I ask what could go wrong.
I thought a bit and could only see one potential problem: that mnesia
dumps the ets table to the DCD file after I have already started a
subsequent aggregation and have already done some ets operations myself.
Therefore I ask: when exactly does mnesia try to see whether to dump a
table or not (according to dc_dump_limit)? Can it be after a
mnesia:transaction finished? How long after?
The "solution" I came up with is:
- prevent mnesia from dumping ets tables to DCD by putting an
  extremely low value in dc_dump_limit, e.g. 0.000001 (btw, can floats
  be used here?);
- take control of the dumping, deciding myself whether to do it for
  each table and, if so, doing it after the mnesia:transaction and
  before starting to mess with the ets tables again in the subsequent
  aggregation;
- I found out I can do this with mnesia_log:ets2dcd. Is this the
  right way to do it?
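A minimal sketch of this workaround might look like the code below. The module name dump_sketch and both function names are hypothetical. mnesia_log:ets2dcd/1 is the internal function named above, not an official API, so its behaviour and return value may differ between releases. Whether dc_dump_limit accepts a float is exactly the open question, so the value is only the one proposed here.

```erlang
-module(dump_sketch).
-export([configure/0, dump/1]).

%% Set dc_dump_limit before mnesia:start/0 is called; this sketch assumes
%% mnesia reads its configuration parameters at startup.
configure() ->
    case application:load(mnesia) of
        ok -> ok;
        {error, {already_loaded, mnesia}} -> ok
    end,
    %% Value taken from the post; float support is unconfirmed.
    application:set_env(mnesia, dc_dump_limit, 0.000001).

%% Dump the given tables to their DCD files by hand. Intended to run after
%% the mnesia transaction has committed and before the next round of
%% direct ets updates. Uses an internal, unsupported mnesia function.
dump(Tabs) ->
    lists:foreach(fun(Tab) -> _ = mnesia_log:ets2dcd(Tab) end, Tabs).
```

The intended order per aggregation round would be: direct ets updates, one mnesia:transaction writing the dirty records, then dump_sketch:dump(Tabs) for whichever tables should be dumped.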
The above uses functions which are internal to mnesia and not part of
the official API, which is not a good thing. I would suggest that it
would be nice if mnesia officially exported functionality such as the
above, so that one can take some control, use only some parts of it, or
combine them in novel ways. In my case, I don't need distribution or
concurrency control (only 1 process writes to such tables; well, in fact
I have several processes, but no two of them write to the same table),
and I am using mnesia just for persistence of ets tables with some
atomicity involved.
Or an alternative point of view: forgetting mnesia, does someone know
of other solutions for persistence of ets tables? I used mnesia as a
first approach, but I am open to alternatives.
Regards,
Paulo Almeida