[erlang-questions] gen_server with a dict vs mnesia table vs ets

Garrett Smith g@REDACTED
Fri Jan 29 17:24:10 CET 2010


I suspect this is a pretty common set of requirements -- I have
something similar for an app. I currently use a dict in gen_server
state.

I was curious about the actual performance of the non-mnesia scenarios
(dict, gb_trees, and private ets), so I ran some tests. The module is
below. The timings are from timer:tc.

The standout result for me is that dict insert performance degrades
non-linearly and gets obviously "costly" somewhere between 100K and 1M
values: 10x the inserts took roughly 84x as long (2.1 s vs. 176 s). Of
course this is totally non-scientific and I'll attempt no explanation
for this :)

My personal preference is to use gen_server state for this type of
requirement if I can get away with it. It looks like for smallish
record sets (<100K) dict is perfectly fine, assuming your latency
requirements aren't measured in microseconds.
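For the use case quoted further down (several writers, one reader that takes everything and clears it once a minute), keeping a dict in gen_server state could be as small as the sketch below. The module name `kv_dict_server` and its `put/3` / `take_all/1` functions are hypothetical, not from the thread. Since a gen_server handles one message at a time, `take_all/1` returns the contents and resets the dict with no write able to land in between.

```erlang
-module(kv_dict_server).
-behaviour(gen_server).

%% Hypothetical API for illustration: put/3 from any process,
%% take_all/1 to read and clear the whole store in one call.
-export([start_link/0, put/3, take_all/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

put(Pid, Key, Val) ->
    gen_server:call(Pid, {put, Key, Val}).

take_all(Pid) ->
    gen_server:call(Pid, take_all).

%% The whole store is a dict held in the server's state.
init([]) ->
    {ok, dict:new()}.

handle_call({put, Key, Val}, _From, D) ->
    %% dict:store/3 inserts the key or overwrites its existing value.
    {reply, ok, dict:store(Key, Val, D)};
handle_call(take_all, _From, D) ->
    %% Reply with the sorted contents and continue with an empty dict;
    %% no put/3 can be handled between these two steps.
    {reply, lists:sort(dict:to_list(D)), dict:new()}.

handle_cast(_Msg, D) ->
    {noreply, D}.
```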

gb_trees appears to scale better than dict (25 s vs. 176 s for 1M
inserts) and lets you "keep things simple" by using gen_server state.
So perhaps it's suitable for record sets in the millions, though I'm
not sure whether gb_trees hits a performance tipping point like the
one dict appears to have.
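One detail if you swap a dict for gb_trees in gen_server state: `gb_trees:insert/3`, used in the benchmark below, assumes the key is absent and raises an error otherwise (the benchmark's keys are all distinct, so it never hits this). `gb_trees:enter/3` inserts or updates, which is the behaviour `dict:store/3` gives you. The module `gbt_upsert` is a hypothetical illustration.

```erlang
-module(gbt_upsert).
-export([demo/0]).

%% Returns {Size, Lookup, InsertResult} after writing the same key
%% twice with enter/3 and then trying insert/3 on it once more.
demo() ->
    T0 = gb_trees:empty(),
    %% enter/3 adds a new key or replaces the value of an existing one.
    T1 = gb_trees:enter("key-1", old, T0),
    T2 = gb_trees:enter("key-1", new, T1),
    Size = gb_trees:size(T2),
    Lookup = gb_trees:lookup("key-1", T2),
    %% insert/3 raises an error when the key is already present.
    InsertResult =
        try gb_trees:insert("key-1", again, T2) of
            _Tree -> inserted
        catch
            error:_Reason -> crashed
        end,
    {Size, Lookup, InsertResult}.
```

Here `demo()` returns `{1, {value, new}, crashed}`: one key, holding the second value, and a failed `insert/3`.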

ETS looks like it would be the structure of choice for larger record
sets, in the millions and beyond (6 s for 1M inserts).

But to heartily reiterate what others have said in this thread, the
best way to settle this is to measure for each application. There are
too many variables involved to apply generalized performance
characteristics to any given problem; the numbers below are merely
curiosities, IMO.

Garrett

{{{
-module(test_dict).

-export([go_dict/1, go_ets/1, go_gbtree/1]).

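%%
%% Each go_* function inserts N entries keyed "key-N" down to "key-1",
%% all with the same tuple value, and returns the resulting structure.
%%
%% Timings (microseconds), taken with timer:tc:
%%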
%% go_dict(100) : 450
%% go_dict(1000) : 5000
%% go_dict(10000) : 120000
%% go_dict(100000) : 2100000
%% go_dict(1000000) : 176000000

go_dict(N) ->
    go_dict(dict:new(), N).

go_dict(D, 0) -> D;
go_dict(D, N) ->
    Key = lists:concat(["key-", N]),
    Val = {state, 12345, "this is a string", [1,2,3,4], 45.6789, atom1, atom2},
    go_dict(dict:store(Key, Val, D), N-1).

%%
%% Timings (microseconds):
%%
%% go_ets(100) : 240
%% go_ets(1000) : 2400
%% go_ets(10000) : 35000
%% go_ets(100000) : 460000
%% go_ets(1000000) : 6000000

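%% Each call creates a new private table owned by the calling process.
%% It is not deleted here; it lives until that process exits or
%% ets:delete/1 is called on it.
go_ets(N) ->
    go_ets(ets:new(ets_test, [private]), N).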

go_ets(Ets, 0) ->
    Ets;
go_ets(Ets, N) ->
    Key = lists:concat(["key-", N]),
    Val = {state, 12345, "this is a string", [1,2,3,4], 45.6789, atom1, atom2},
    ets:insert(Ets, {Key, Val}),
    go_ets(Ets, N - 1).

%%
%% Timings (microseconds):
%%
%% go_gbtree(100) : 450
%% go_gbtree(1000) : 7000
%% go_gbtree(10000) : 140000
%% go_gbtree(100000) : 2000000
%% go_gbtree(1000000) : 25000000

go_gbtree(N) ->
    go_gbtree(gb_trees:empty(), N).

go_gbtree(Tree, 0) ->
    Tree;
go_gbtree(Tree, N) ->
    Key = lists:concat(["key-", N]),
    Val = {state, 12345, "this is a string", [1,2,3,4], 45.6789, atom1, atom2},
    go_gbtree(gb_trees:insert(Key, Val, Tree), N - 1).
}}}
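Since the figures above come from timer:tc and the advice is to measure per application, a small harness for rerunning such measurements might look like this. The module `bench` and `run/3` are hypothetical; it relies only on `timer:tc/2`, which returns `{Microseconds, Result}`, and keeps the fastest of `Reps` runs per size to reduce noise.

```erlang
-module(bench).
-export([run/3]).

%% For each input size N in Ns, calls Fun(N) Reps times and keeps the
%% fastest elapsed time in microseconds. Returns [{N, Micros}].
run(Fun, Ns, Reps) when is_function(Fun, 1), is_list(Ns),
                        is_integer(Reps), Reps >= 1 ->
    [{N, best_of(Fun, N, Reps)} || N <- Ns].

best_of(Fun, N, Reps) ->
    Times = [time_once(Fun, N) || _ <- lists:seq(1, Reps)],
    lists:min(Times).

time_once(Fun, N) ->
    %% Collect this process's garbage first so leftovers from the
    %% previous run are less likely to be collected during this one.
    erlang:garbage_collect(),
    {Micros, _Result} = timer:tc(Fun, [N]),
    Micros.
```

From the shell, something like `bench:run(fun test_dict:go_dict/1, [1000, 10000, 100000], 3)` times one structure across sizes.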

On Thu, Jan 28, 2010 at 4:36 PM, Jayson Vantuyl <kagato@REDACTED> wrote:
> Use ETS, managed by a gen_server.  The gen_server will serialize all operations, so there will be no concurrency against the ETS table.
>
> If you need to scale this further, you might have one table and gen_server per message type, or split things up even more by a hash of the user_id.  When splitting, if your need for atomic updates spans all of those processes, you can either collect the updates as batches in another process or use a gen_fsm to temporarily lock them all, flush them, then unlock them.
>
> While ETS does cause some extra copying, for what it is good at it can be blazingly fast.  Just test it, as variations between systems make it nearly impossible to say which is better without actual testing.  eprof and fprof are your friends.
>
> Or you could just use Mnesia, as you can use its transactions to get your atomicity.
>
> On Jan 28, 2010, at 1:22 PM, Pablo Platt wrote:
>
>> @Robert
>>
>> My use case is simple:
>> - a list of key/value records ({user_id, msg_type}, msg_body)
>> - several processes need to create/update records.
>> - one process needs to get all the records and clear the list in an 'atomic' operation once per minute.
>> - the number of records per minute is expected to be <1K at first.
>> - No need for replication/distribution. The list will be only in memory.
>>
>>
>>
>> ________________________________
>> From: Robert Virding <rvirding@REDACTED>
>> To: Pablo Platt <pablo.platt@REDACTED>
>> Cc: Max Lapshin <max.lapshin@REDACTED>; erlang-questions@REDACTED
>> Sent: Thu, January 28, 2010 5:44:13 PM
>> Subject: Re: [erlang-questions] gen_server with a dict vs mnesia table vs ets
>>
>> It really depends very much on your app which is better:
>>
>> - An ETS table will generally allow you to hold more data.
>> - An ETS table is external to processes so there is no cost in process GC.
>> - BUT there is still an ETS data GC cost every time you add or remove data.
>> - Since ETS data is not in the process, there are copying costs every
>> time you access the table. This can make some operations very expensive,
>> but match_object and select can help a lot.
>> - A dict allows easy rollback to a previous state if you keep the old reference.
>> - ETS and dicts provide slightly different interfaces.
>>
>> You could use a public ETS table, but this would not allow for more
>> complex atomic transactions and is not accessible over distribution.
>>
>> It really does depend on what you are doing. The best is to test it
>> with realistic data amounts and operations. As an alternative to dicts
>> there are gb_trees which are also in the process memory but have
>> different properties compared to dicts.
>>
>> Robert
>>
>> 2010/1/28 Pablo Platt <pablo.platt@REDACTED>:
>>> So I'll use a gen_server that controls the ETS table with private access.
>>> Thanks
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Max Lapshin <max.lapshin@REDACTED>
>>> To: Pablo Platt <pablo.platt@REDACTED>
>>> Cc: erlang-questions@REDACTED
>>> Sent: Thu, January 28, 2010 3:29:48 PM
>>> Subject: Re: [erlang-questions] gen_server with a dict vs mnesia table vs ets
>>>
>>> On Thu, Jan 28, 2010 at 4:28 PM, Pablo Platt <pablo.platt@REDACTED> wrote:
>>>> Is the fact that ETS doesn't take part in garbage collection a good or
>>>> bad feature in my case?
>>>
>>> Good, of course: you control yourself when objects are cleaned up,
>>> so there is no GC penalty on each loop.
>>>
>>> ________________________________________________________________
>>> erlang-questions mailing list. See http://www.erlang.org/faq.html
>>> erlang-questions (at) erlang.org
>>>
>>>
>>>
>>
>>
>>
>
> --
> Jayson Vantuyl
> kagato@REDACTED
>
>
>
>
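
The design Pablo settles on in the quoted thread (a gen_server that owns a private ETS table, as Jayson suggests) could look roughly like the sketch below. The module name `kv_ets_server` and the `put/3` / `take_all/1` API are hypothetical. Because the table is private, every write and the once-a-minute read-and-clear pass through the server one message at a time, which is what makes `take_all/1` atomic with respect to writers; no Mnesia transaction is involved.

```erlang
-module(kv_ets_server).
-behaviour(gen_server).

%% Hypothetical API for illustration.
-export([start_link/0, put/3, take_all/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

put(Pid, Key, Val) ->
    gen_server:call(Pid, {put, Key, Val}).

take_all(Pid) ->
    gen_server:call(Pid, take_all).

init([]) ->
    %% private: only this server process may read or write the table.
    %% set: one object per key, so a later put/3 overwrites an earlier one.
    Tab = ets:new(?MODULE, [set, private]),
    {ok, Tab}.

handle_call({put, Key, Val}, _From, Tab) ->
    true = ets:insert(Tab, {Key, Val}),
    {reply, ok, Tab};
handle_call(take_all, _From, Tab) ->
    %% Copy everything out, then clear the table, inside one callback,
    %% so no put/3 can be processed between the two steps.
    All = lists:sort(ets:tab2list(Tab)),
    true = ets:delete_all_objects(Tab),
    {reply, All, Tab}.

handle_cast(_Msg, Tab) ->
    {noreply, Tab}.
```

Compared with keeping a dict in the server's state, the data lives outside the process heap, so a large table does not add to the server's garbage collection work; the price is a copy on every read, which `ets:tab2list/1` pays for the whole table.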

