Performance of mnesia:select/2
Jacob
jacob01@REDACTED
Fri Feb 26 23:48:09 CET 2021
Hi,
that's a good hint, thanks. I did indeed use adjacent keys in my
"benchmarks", and the distance between keys does have a big influence on
the times.
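
To take key order out of the picture, as the hint suggests, the keys could be shuffled before each run. A minimal sketch, assuming the keys fit in a list; the module name key_order_sketch and the function shuffle/1 are made up for illustration:

```erlang
-module(key_order_sketch).
-export([shuffle/1]).

%% Return Keys in a random order: tag every key with a random float,
%% sort by that tag, then drop the tag again.
shuffle(Keys) ->
    Tagged = [{rand:uniform(), K} || K <- Keys],
    [K || {_, K} <- lists:sort(Tagged)].
```

Calling key_order_sketch:shuffle(lists:seq(1, Skip * Nkeys, Skip)) would give the same key set as in the measurements, just without the term-order access pattern.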
In addition I did the lookup measurements in the shell, which was unfair
since the select call was just a function call while the lookup loop was
executed by the shell.
So the numbers using a compiled module (500 adjacent keys) are
* using lookup, [set, {keypos, 2}]: 295 us
* using lookup, [ordered_set, {keypos, 2}]: 221 us
* using pattern, [ordered_set, {keypos, 2}]: 1206 us
* using pattern, [set, {keypos, 2}]: 4570 us
Measurements (all times in us) with a varying number of keys and a
varying distance between keys:

nkeys  skip  pat/ordset  pat/set  lu/ordset  lu/set
  100     1         261      390        128      63
  100    10        2482      382        154      66
  100   100       10348      488        100      76
  200     1         222      948        128     146
  200    10        3758      618        188     141
  200   100       37582      666        163     149
  500     1        1206     4570        221     295
  500    10       22419     3967        265     232
  500   100      224580     4062        445     205
 1000     1        5489    15148        251     227
 1000    10       89676    15077        654     370
 1000   100      889965    15941        763     302
 2000     1       19062    57493        526     439
 2000    10      341417    57467       1063     457
 2000   100     3542462    59253       1850     547
 5000     1      114807   359162       1589    1477
 5000    10     2136702   362625       2954    1853
 5000   100    22301409   357902       4933    2424

nkeys: number of keys per query
skip: distance between subsequent keys (Keys = lists:seq(1, Skip *
Nkeys, Skip))
pat: select with key in pattern
lu: lookup loop
set: ETS type set
ordset: ETS type ordered_set
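
For reference, a compiled module along these lines could produce one row of such a table. This is only a sketch, not the exact code behind the numbers above: the module name ets_keys_bench, run/3 and the Size parameter are assumptions, and the lookup variant includes a lists:append in the timed part so that both variants return a flat list.

```erlang
-module(ets_keys_bench).
-export([run/3]).

%% run(Nkeys, Skip, Size) -> [{Label, Microseconds, ObjectsFound}]
%% Size is the number of table entries (the thread used 1000000).
run(Nkeys, Skip, Size) ->
    Keys = lists:seq(1, Skip * Nkeys, Skip),
    [measure(Label, Type, Mode, Keys, Size)
     || {Label, Type, Mode} <- [{'pat/ordset', ordered_set, pattern},
                                {'pat/set', set, pattern},
                                {'lu/ordset', ordered_set, lookup},
                                {'lu/set', set, lookup}]].

measure(Label, Type, Mode, Keys, Size) ->
    Tab = ets:new(t, [Type, {keypos, 2}]),
    %% Same object layout as in the thread: {N, N, integer_to_list(N)}.
    true = ets:insert(Tab, [{N, N, integer_to_list(N)}
                            || N <- lists:seq(1, Size)]),
    {Time, Objects} = timer:tc(query_fun(Mode, Tab, Keys)),
    true = ets:delete(Tab),
    {Label, Time, length(Objects)}.

%% pattern: one select with one match function per key.
query_fun(pattern, Tab, Keys) ->
    MatchSpec = [{{'_', K, '_'}, [], ['$_']} || K <- Keys],
    fun() -> ets:select(Tab, MatchSpec) end;
%% lookup: one ets:lookup/2 call per key.
query_fun(lookup, Tab, Keys) ->
    fun() -> lists:append([ets:lookup(Tab, K) || K <- Keys]) end.
```

ets_keys_bench:run(500, 1, 1000000) would correspond to the "500 1" row. A single run is noisy, so averaging several runs is advisable.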
Long story short: follow Dan Gudmundsson's suggestion (plain lookups) for
ETS-based operations. It is much faster, scales better and suffers less
from non-adjacent keys.
On the other hand, the results might be different with distributed
Mnesia and disk-only tables.
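
To try both variants on Mnesia itself, something like the following could serve as a starting point. It is an untimed sketch under assumptions: the module name mnesia_keys_sketch, the table kv and its contents are made up, and it uses a local RAM-only table, so it says nothing about distributed or disk-only setups.

```erlang
-module(mnesia_keys_sketch).
-export([demo/0]).

%% Fetch the same keys once with a key-pattern select and once with a
%% read loop, and check that both return the same records.
demo() ->
    ok = mnesia:start(),
    %% Defaults: type set, ram_copies on the local node.
    case mnesia:create_table(kv, [{attributes, [key, value]}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, kv}} -> ok
    end,
    ok = mnesia:wait_for_tables([kv], 5000),
    [ok = mnesia:dirty_write({kv, N, integer_to_list(N)})
     || N <- lists:seq(1, 1000)],
    Keys = [10, 20, 30],
    %% One match function per key, with the key bound in the match head.
    MatchSpec = [{{kv, K, '_'}, [], ['$_']} || K <- Keys],
    {atomic, BySelect} =
        mnesia:transaction(fun() -> mnesia:select(kv, MatchSpec) end),
    {atomic, ByRead} =
        mnesia:transaction(
          fun() -> lists:append([mnesia:read(kv, K) || K <- Keys]) end),
    lists:sort(BySelect) =:= lists:sort(ByRead).
```

Wrapping the two transactions in timer:tc/1 would give numbers comparable to the ETS ones above.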
/Jacob
On 2/26/21 6:49 PM, Sverker Eriksson wrote:
>
> One thing to be aware of when doing benchmarks on ETS set vs
> ordered_set is that ordered_set may get boosted cache performance if
> the keys are accessed in term order. The same internal tree nodes are
> visited over and over again as term adjacent key are most often close
> to each other in the tree.
>
> So if that is not your typical traffic case, the benchmark should
> access the keys in some random order.
>
>
>
> /Sverker
>
>
>
>
>
> *From:*erlang-questions <erlang-questions-bounces@REDACTED> *On
> Behalf Of *Jacob
> *Sent:* den 26 februari 2021 16:35
> *To:* Dan Gudmundsson <dangud@REDACTED>
> *Cc:* Questions erlang-questions <erlang-questions@REDACTED>
> *Subject:* Re: Performance of mnesia:select/2
>
>
>
>
>
> On 2/26/21 3:53 PM, Dan Gudmundsson wrote:
>
> Interesting, and what times do you get for 500 ets lookups on that data?
>
> Oh, I really should have tested that as well:
>
> timer:tc(fun () -> lists:map(fun(K) -> ets:lookup(t, K) end, Keys) end).
>
> * using lookups: [set, {keypos, 2}]: 3148 us
> * using lookup: [ordered_set, {keypos, 2}]: 3423 us
>
> I have repeated the select/pattern runs to have comparable results:
>
> * using pattern, [ordered_set, {keypos, 2}]: 3173 us
> * using pattern, [set, {keypos, 2}]: 8091 us
>
> Note that there was quite a high jitter on the measurements, so I have
> computed the mean value of 3 measurements each.
>
> So lookups and pattern/ordered_set were too close to really come to a
> winner here. I had the impression that the variance was slightly
> higher with lookup, but digging into this would require a better
> measurement setup than what I have been using.
>
>
>
>
>
> On Fri, Feb 26, 2021 at 3:12 PM Jacob <jacob01@REDACTED
> <mailto:jacob01@REDACTED>> wrote:
>
> Hi,
>
> assuming that the match spec compiler does clever things with
> patterns,
> I'd use select with the following
>
> MatchExpression = [ {{'_', K, '_'}, [], ['$_']} || K <- Keys ]
>
> I did some quick measurements with plain ETS, timer:tc and
> 500 keys out
> of 1000000 table entries and got:
>
> * using pattern, [ordered_set]: 4532605 us
> * using pattern, [set] : 4645525 us
> * using pattern, [ordered_set, {keypos, 2}]: 3826 us (!!!)
> * using pattern, [set, {keypos, 2}]: 5714 us (!!!)
>
> * using guards, [ordered_set]: 12542928 us
> * using guards, [set]: 12310452 us
> * using guards, [ordered_set, {keypos, 2}]: 12365477 us
> * using guards, [set, {keypos, 2}]: 12277839 us
>
> I have initialised the DB with [ ets:insert(t, {N, N,
> integer_to_list(N)}) || N <- lists:seq(1, 1000000) ].
>
> I don't know though, how this will translate to Mnesia, but
> I'd give
> select with pattern on the primary key a try.
>
> /Jacob
>
>
> On 2/26/21 11:03 AM, Vance Shipley wrote:
> > If I need to lookup a list of keys which is the better
> approach? Why?
> >
> > Fselect = fun(Keys) ->
> > MatchHead = {'_', '$1', '$2'},
> > F = fun(Key) ->
> > {'=:=', '$1', Key}
> > end,
> > MatchConditions = [list_to_tuple(['or' | lists:map(F, Keys)])],
> > MatchBody = ['$_'],
> > MatchFunction = {MatchHead, MatchConditions, MatchBody},
> > MatchExpression = [MatchFunction],
> > mnesia:select(Table, MatchExpression)
> > end,
> > mnesia:transaction(Fselect, [Keys]).
> >
> > Fread = fun F([Key | T], Acc) ->
> > [R] = mnesia:read(Table, Key),
> > F(T, [R | Acc]);
> > F([], Acc) ->
> > lists:reverse(Acc)
> > end,
> > mnesia:transaction(Fread, [Keys, []]).
> >
> >
> > --
> > -Vance
>