[erlang-questions] Combining ets and mnesia operations for extreme performance
Ulf Wiger (TN/EAB)
Fri Sep 12 15:40:51 CEST 2008
Paulo Sérgio Almeida wrote:
> Hi all,
> This one is for mnesia hackers (Ulf, Joel, ...)
> I have a scenario where I am building aggregations over some data,
> most of the time counting things. For the performance I want I cannot
> use mnesia transactions naively, but even dirty_writes would be too
> slow and would not give me some atomicity I need.
> A typical scenario would be to atomically apply e.g. 1 million
> increments spread over some hundreds/thousands of keys in some dozen
> tables. This atomicity is from the durability point of view. i.e.
> either all increments are processed and made persistent, or, e.g. if
> the machine crashes before, no update must have been made persistent.
> I don't need to worry about atomicity from the lookup point of view
> (i.e. there would be no problem doing a lookup and reading some
> intermediate value).
> This leads to my strategy:
> - using mnesia disc_copies;
> - operating directly on the underlying ets tables; e.g. ets:update_counter
> - marking which keys become "dirty";
> - then, in a single mnesia:transaction:
>   - ets:lookup the dirty keys
>   - mnesia:write these
Perhaps you'd want to consider using ram_copies and dumping
to disk using mnesia:dump_tables(Tabs)?
That wouldn't be such a blatant violation of mnesia rules. ;-)
I assume from your description that the table isn't replicated?
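A minimal sketch of the ram_copies approach, for illustration only. The module name, table name (hits) and attribute names are hypothetical. It assumes a disc schema already exists on the node (mnesia:create_schema/1), that the table is not replicated, and that the ets table backing a mnesia ram_copies table is public and stores records as {Tab, Key, Count}. The counter update still bypasses mnesia, so this remains outside what mnesia officially supports.

```erlang
-module(counter_dump).
-export([setup/0, bump/2, checkpoint/0]).

%% Hypothetical table name, used for illustration only.
-define(TAB, hits).

%% Start mnesia and create a local ram_copies table if it is not there yet.
%% Assumes mnesia:create_schema([node()]) has been run beforehand, so that
%% mnesia has a directory to dump into.
setup() ->
    ok = mnesia:start(),
    case mnesia:create_table(?TAB, [{ram_copies, [node()]},
                                    {attributes, [key, count]}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, ?TAB}} -> ok
    end,
    ok = mnesia:wait_for_tables([?TAB], 5000).

%% Increment a counter directly in the underlying ets table.
%% This bypasses mnesia entirely (no locks, no logging, no replication).
%% The count is element 3 of the {hits, Key, Count} record; a missing key
%% is created with a count of 0 before the increment is applied.
bump(Key, Incr) ->
    ets:update_counter(?TAB, Key, {3, Incr}, {?TAB, Key, 0}).

%% Dump the whole ram_copies table to disk in one step.
checkpoint() ->
    {atomic, ok} = mnesia:dump_tables([?TAB]),
    ok.
```

Calling counter_dump:checkpoint() after a batch of bump/2 calls is what would make that batch persistent; nothing reaches disk between checkpoints.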
> Basically, it has been working for me. But as I am doing something I
> shouldn't (updating the ets tables directly), I ask what could go
> wrong. I thought a bit and could only see one potential problem: that
> mnesia dumps the ets table to the DCD file when I already started a
> subsequent aggregation and have already done some ets operations.
> Therefore I ask: when exactly does mnesia try to see whether to dump
> a table or not (according to dc_dump_limit)? Can it be after a
> mnesia:transaction finished? How long after?
You could mess with the dump limit, but if you were to use
mnesia for some other tasks in your application, this might
come back and bite you.
Log dump is a background job, and it's scheduled as soon
as the time or write threshold is exceeded. If it's a write
threshold, the number of committed writes is what triggers it.
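For reference, the time and write thresholds are mnesia application parameters (dump_log_time_threshold and dump_log_write_threshold). A hedged sketch of raising them before mnesia starts follows; the module name and the values are purely illustrative, not recommendations, and the settings apply to every table on the node, which is the reason for the caution above.

```erlang
-module(dump_tuning).
-export([configure/0]).

%% Must run before mnesia:start/0. The values are illustrative assumptions.
configure() ->
    %% Load the application so its environment can be set explicitly.
    %% The result is ignored: it may already be loaded.
    _ = application:load(mnesia),
    %% Schedule a log dump after this many committed writes.
    ok = application:set_env(mnesia, dump_log_write_threshold, 50000),
    %% Schedule a log dump after this many milliseconds.
    ok = application:set_env(mnesia, dump_log_time_threshold, 600000),
    ok.
```

The same parameters can be given on the command line, e.g. erl -mnesia dump_log_write_threshold 50000.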
> Or an alternative point of view: forgetting mnesia, does someone know
> other solutions for persistence of ets tables. I used mnesia as a
> first approach, but I am open to alternatives.
If you have no particular need for some mnesia functions, it
would seem that just using an ets table and calling ets:tab2file/2
would be sufficient for what you've described.
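A minimal sketch of the plain ets route, assuming a single local table. The module name and the temp-file naming are hypothetical. Writing to a temporary file and renaming it is an addition here, not something the suggestion above requires: it is one way to approximate the all-or-nothing durability asked for, since a crash in the middle of a dump leaves the previous snapshot untouched.

```erlang
-module(ets_snapshot).
-export([save/2, load/1]).

%% Write the table to a temporary file, then rename it over the target.
%% {sync, true} asks ets to sync the file to disk before returning.
%% The rename is atomic on POSIX file systems; that is an assumption
%% about the platform, not something ets guarantees.
save(Tab, File) ->
    Tmp = File ++ ".tmp",
    ok = ets:tab2file(Tab, Tmp, [{sync, true}]),
    ok = file:rename(Tmp, File).

%% Recreate the table from the last complete snapshot.
%% Returns {ok, Tab} or {error, Reason}. For a named table, the call
%% fails if a table with that name already exists.
load(File) ->
    ets:file2tab(File).
```

A usage example from the shell, with a hypothetical file name:

```erlang
T = ets:new(counts, [set, public]),
true = ets:insert(T, {a, 0}),
ets:update_counter(T, a, 3),
ok = ets_snapshot:save(T, "counts.snapshot"),
{ok, T2} = ets_snapshot:load("counts.snapshot").
```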