Performance of mnesia:select/2

Fri Feb 26 23:48:09 CET 2021

Hi,

that's a good hint, thanks. I did indeed use adjacent keys in my
"benchmarks", and the distance between keys does have a big influence on
the times.
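
To take key order out of the picture, as suggested in the quoted message
below, the key list could be shuffled before timing. A minimal sketch; the
module and function names are made up for illustration and were not part of
my measurements:

```erlang
-module(key_shuffle).
-export([shuffle/1]).

%% Hypothetical helper: return the elements of List in random order.
%% Each element is tagged with a random float, the tagged list is sorted
%% on that tag, and the tags are dropped again.
shuffle(List) ->
    Tagged = [{rand:uniform(), X} || X <- List],
    [X || {_, X} <- lists:sort(Tagged)].
```

Usage would be something like Keys = key_shuffle:shuffle(lists:seq(1, Skip *
Nkeys, Skip)) before running either the select or the lookup loop.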

In addition, I did the lookup measurements in the shell, which was unfair
since the select call was just a single function call while the lookup loop
was interpreted by the shell.

So the numbers for 500 keys, using a compiled module, are

* using lookup,  [set, {keypos, 2}]:          295 us
* using lookup,  [ordered_set, {keypos, 2}]:  221 us

* using pattern, [ordered_set, {keypos, 2}]: 1206 us
* using pattern, [set, {keypos, 2}]:         4570 us

Measurements (times in us) with a varying number of keys and a varying
distance between keys:

nkeys      skip          pat/ordset     pat/set    lu/ordset    lu/set

 100          1             261          390        128         63
 100         10            2482          382        154         66
 100        100           10348          488        100         76
 200          1             222          948        128        146
 200         10            3758          618        188        141
 200        100           37582          666        163        149
 500          1            1206         4570        221        295
 500         10           22419         3967        265        232
 500        100          224580         4062        445        205
1000          1            5489        15148        251        227
1000         10           89676        15077        654        370
1000        100          889965        15941        763        302
2000          1           19062        57493        526        439
2000         10          341417        57467       1063        457
2000        100         3542462        59253       1850        547
5000          1          114807       359162       1589       1477
5000         10         2136702       362625       2954       1853
5000        100        22301409       357902       4933       2424

nkeys:   number of keys per query
skip:    distance between subsequent keys
         (Keys = lists:seq(1, Skip * Nkeys, Skip))
pat:     select with key in pattern
lu:      lookup loop
set:     ETS type set
ordset:  ETS type ordered_set
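
For reference, a compiled harness for one row of the table could look
roughly like the sketch below. This is not the code I actually ran: the
module name, function name and the choice to fill the table only up to the
largest key are assumptions made to keep the example short.

```erlang
-module(sel_bench).
-export([run/3]).

%% Hypothetical benchmark harness modelled on the thread.
%% run(Type, Nkeys, Skip) -> {PatternMicros, LookupMicros}
%% Type is set | ordered_set.
run(Type, Nkeys, Skip) ->
    Tab = ets:new(bench_tab, [Type, {keypos, 2}]),
    Max = Skip * Nkeys,
    %% Same object shape as in the thread: {N, N, integer_to_list(N)}.
    %% Assumption: the table is only filled up to the largest key used.
    true = ets:insert(Tab, [{N, N, integer_to_list(N)}
                            || N <- lists:seq(1, Max)]),
    Keys = lists:seq(1, Max, Skip),
    %% "pat": one match-spec clause per key, key bound in the match head.
    MatchSpec = [{{'_', K, '_'}, [], ['$_']} || K <- Keys],
    {PatUs, PatRes} = timer:tc(fun() -> ets:select(Tab, MatchSpec) end),
    %% "lu": plain lookup loop over the same keys.
    {LuUs, LuRes} = timer:tc(fun() ->
        lists:append([ets:lookup(Tab, K) || K <- Keys])
    end),
    %% Both approaches must return the same objects; the order may differ
    %% for type set, so compare sorted results.
    true = lists:sort(PatRes) =:= lists:sort(LuRes),
    true = ets:delete(Tab),
    {PatUs, LuUs}.
```

A single call such as sel_bench:run(ordered_set, 500, 10) gives one noisy
sample, so several runs should be averaged.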

Long story short: Follow Dan Gudmundsson's suggestion for ETS-based
operations: the lookup loop is much faster, scales better, and suffers less
from non-adjacent keys.

On the other hand, the results might be different with distributed
Mnesia and disk-only tables.
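
To repeat the comparison on Mnesia itself, something along these lines could
serve as a starting point. It is an untimed sketch for a single-node,
RAM-only table; the table name kv and its attributes are invented for the
example, and it only checks that both approaches return the same records.

```erlang
-module(mnesia_keysel).
-export([demo/0]).

%% Hypothetical demo: table name and layout are assumptions.
%% Returns {SelectedRecords, ReadRecords}, both sorted.
demo() ->
    ok = mnesia:start(),
    %% Default options: type set, RAM copy on the local node.
    {atomic, ok} = mnesia:create_table(kv, [{attributes, [key, value]}]),
    {atomic, ok} = mnesia:transaction(fun() ->
        lists:foreach(
          fun(N) -> mnesia:write({kv, N, integer_to_list(N)}) end,
          lists:seq(1, 10))
    end),
    Keys = [1, 3, 5],
    %% Mnesia records are {Table, Key, Value}; bind the key in the head.
    MatchSpec = [{{kv, K, '_'}, [], ['$_']} || K <- Keys],
    {atomic, Selected} =
        mnesia:transaction(fun() -> mnesia:select(kv, MatchSpec) end),
    %% Read loop over the same keys for comparison.
    {atomic, Read} = mnesia:transaction(fun() ->
        lists:append([mnesia:read(kv, K) || K <- Keys])
    end),
    {lists:sort(Selected), lists:sort(Read)}.
```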

/Jacob

On 2/26/21 6:49 PM, Sverker Eriksson wrote:
>
> One thing to be aware of when doing benchmarks on ETS set vs
> ordered_set is that ordered_set may get boosted cache performance if
> the keys are accessed in term order. The same internal tree nodes are
> visited over and over again, as keys adjacent in term order are most
> often close to each other in the tree.
>
> So if that is not your typical traffic case, the benchmark should
> access the keys in some random order.
>
> /Sverker
>
> From: erlang-questions <erlang-questions-bounces@REDACTED> On Behalf Of Jacob
> Sent: 26 February 2021 16:35
> To: Dan Gudmundsson <dangud@REDACTED>
> Cc: Questions erlang-questions <erlang-questions@REDACTED>
> Subject: Re: Performance of mnesia:select/2
>
> On 2/26/21 3:53 PM, Dan Gudmundsson wrote:
>
>     Interesting, and what times do you get for 500 ets lookups on that
>     data?
>
> Oh, I really should have tested that as well:
>
> timer:tc(fun () -> lists:map(fun(K) -> ets:lookup(t, K) end, Keys) end).
>
> * using lookup,  [set, {keypos, 2}]:         3148 us
> * using lookup,  [ordered_set, {keypos, 2}]: 3423 us
>
> I have repeated the select/pattern runs to have comparable results:
>
> * using pattern, [ordered_set, {keypos, 2}]: 3173 us
> * using pattern, [set, {keypos, 2}]:         8091 us
>
> Note that there was quite a high jitter on the measurements, so I have
> computed the mean value of 3 measurements each.
>
> So lookups and pattern/ordered_set were too close to really come to a
> winner here. I had the impression that the variance was slightly
> higher with lookup, but digging into this would require a better
> measurement setup than what I have been using.
>
>     On Fri, Feb 26, 2021 at 3:12 PM Jacob <jacob01@REDACTED> wrote:
>
>         Hi,
>
>         assuming that the match spec compiler does clever things with
>         patterns, I'd use select with the following
>
>            MatchExpression = [ {{'_', K, '_'}, [], ['$_']} || K <- Keys ]
>
>         I did some quick measurements with plain ETS, timer:tc and 500
>         keys out of 1000000 table entries and got:
>
>            * using pattern, [ordered_set]:              4532605 us
>            * using pattern, [set]:                      4645525 us
>            * using pattern, [ordered_set, {keypos, 2}]:    3826 us (!!!)
>            * using pattern, [set, {keypos, 2}]:            5714 us (!!!)
>
>            * using guards, [ordered_set]:              12542928 us
>            * using guards, [set]:                      12310452 us
>            * using guards, [ordered_set, {keypos, 2}]: 12365477 us
>            * using guards, [set, {keypos, 2}]:         12277839 us
>
>         I have initialised the DB with
>
>            [ ets:insert(t, {N, N, integer_to_list(N)}) || N <- lists:seq(1, 1000000) ].
>
>         I don't know, though, how this will translate to Mnesia, but I'd
>         give select with pattern on the primary key a try.
>
>         /Jacob
>
>
>         On 2/26/21 11:03 AM, Vance Shipley wrote:
>         > If I need to look up a list of keys, which is the better
>         > approach? Why?
>         >
>         > Fselect = fun(Keys) ->
>         >         MatchHead = {'_', '$1', '$2'},
>         >         F = fun(Key) ->
>         >                 {'=:=', '$1', Key}
>         >         end,
>         >         MatchConditions = [list_to_tuple(['or' | lists:map(F, Keys)])],
>         >         MatchBody = ['$_'],
>         >         MatchFunction = {MatchHead, MatchConditions, MatchBody},
>         >         MatchExpression = [MatchFunction],
>         >         mnesia:select(Table, MatchExpression)
>         > end,
>         > mnesia:transaction(Fselect, [Keys]).
>         >
>         > Fread = fun F([Key | T], Acc) ->
>         >                 [R] = mnesia:read(Table, Key),
>         >                 F(T, [R | Acc]);
>         >         F([], Acc) ->
>         >                 lists:reverse(Acc)
>         > end,
>         > mnesia:transaction(Fread, [Keys, []]).
>         >
>         >
>         > --
>         >      -Vance
>