[erlang-questions] keys for mnesia

Steve Davis steven.charles.davis@REDACTED
Tue Jan 12 12:49:52 CET 2010


Hi Ulf, Igor,

My thinking was trending toward binaries as a starting point, and 
it seems I need to revisit the efficiency guide.

I was curious as to whether there would be a definitive answer, and so, 
Ulf, your guidance on the considerations to apply in this general 
situation is much appreciated.

In the case of this particular table, performance is unlikely to be 
an issue, but there is value in using binaries here, since the 
application can parse the keys with binary matching.
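
A minimal sketch of key parsing with binary matching, assuming keys
follow the dotted <<"myapp.myattr">> layout from the original question.
The module name key_parse and the function split_key/1 are hypothetical
names chosen for illustration:

```erlang
-module(key_parse).
-export([split_key/1]).

%% Split a dotted binary key at the first "." into {App, Attr}.
%% The dotted layout is an assumption based on the example key
%% <<"myapp.myattr">> in this thread.
split_key(Key) when is_binary(Key) ->
    split_key(Key, <<>>).

%% Walk the binary one byte at a time using bit-syntax matching.
split_key(<<$., Rest/binary>>, Acc) ->
    {ok, {Acc, Rest}};
split_key(<<C, Rest/binary>>, Acc) ->
    split_key(Rest, <<Acc/binary, C>>);
split_key(<<>>, _Acc) ->
    {error, badkey}.
```

With this sketch, key_parse:split_key(<<"myapp.myattr">>) returns
{ok, {<<"myapp">>, <<"myattr">>}}, and a key without a dot returns
{error, badkey}.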

I will surely take the advice that when I get far enough, I'll measure, 
measure, measure.

Many thanks both for your responses,

Regards,
Steve


Ulf Wiger wrote:
> In this case, you need to differentiate between a ram_copies
> (or disc_copies) table and a disc_only_copies table.
> 
> The term_to_binary conversion is not done when storing in
> ets, so there you need to look at the internal format for
> each term:
> 
> - a 2-tuple with two atoms: 2 words + 2 * (1 word)
> - a string (list of integers): Length * (2 words)
> - a binary: (3..6 words) + data
> 
> (See the efficiency guide in the Erlang/OTP documentation).
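
One way to check the in-memory sizes listed above on a given system is
erts_debug:flat_size/1, which reports the heap size of a term in words.
This is only a rough sketch: erts_debug is an undocumented debugging
module, the figure excludes the one-word reference to the term itself,
and the result for binaries depends on word size and release. The
module name key_words and the function report/0 are hypothetical:

```erlang
-module(key_words).
-export([report/0]).

%% Return [{Key, SizeInWords}] for the three candidate key forms
%% from this thread. erts_debug:flat_size/1 is a debugging aid, so
%% treat the numbers as indicative rather than authoritative.
report() ->
    Keys = [{myapp, myattr}, "myapp.myattr", <<"myapp.myattr">>],
    [{Key, erts_debug:flat_size(Key)} || Key <- Keys].
```

Calling key_words:report() in the shell gives a quick side-by-side
comparison for the node it runs on.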
> 
> But there may be other considerations as well. If you want
> the table to be an ordered set, sorting efficiency becomes an
> issue. I can't say whether it is more efficient to sort
> atoms than it is to sort binaries, as in both cases, you
> must retrieve the actual content and compare.
> 
> You may also want to consider copying cost. Strings are copied
> in full, even when storing in ets, whereas for atoms, only the
> pointer is copied. Binaries are normally not copied (unless
> they are small enough that it doesn't matter anyway).
> 
> Bottom line is that you either measure on your data, with a
> realistic mix of operations and data sizes, or you assume that
> it won't make that much difference anyway. Unless you are
> dealing with massive amounts of data, very large objects and/or
> have tough performance requirements, I'd go with the latter
> until reality intervenes and proves the assumption wrong.
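
A minimal measurement sketch along these lines, assuming a plain ets set
as a stand-in for a ram_copies table and a simple insert-then-lookup
workload (not a realistic operation mix). The module name key_bench, the
function run/1 and the generated key shapes are all hypothetical; the
tuple variant uses {myapp, Integer} rather than two atoms to avoid
creating atoms dynamically:

```erlang
-module(key_bench).
-export([run/1]).

%% Time N inserts followed by N lookups for each key representation.
%% Returns [{Label, Microseconds}]. The workload is an illustrative
%% assumption, not a model of any particular application.
run(N) ->
    Makers = [{tuple,  fun(I) -> {myapp, I} end},
              {string, fun(I) -> "myapp." ++ integer_to_list(I) end},
              {binary, fun(I) ->
                           list_to_binary("myapp." ++ integer_to_list(I))
                       end}],
    [{Label, time_keys(Make, N)} || {Label, Make} <- Makers].

time_keys(Make, N) ->
    %% Build the keys up front so key construction is not timed.
    Keys = [Make(I) || I <- lists:seq(1, N)],
    Tab = ets:new(bench, [set]),
    {Micros, ok} = timer:tc(fun() ->
        lists:foreach(fun(K) -> ets:insert(Tab, {K, value}) end, Keys),
        lists:foreach(fun(K) -> [_] = ets:lookup(Tab, K) end, Keys)
    end),
    ets:delete(Tab),
    Micros.
```

Running key_bench:run(100000) a few times and comparing the three
timings gives a first impression; real data sizes and an ordered_set
table would need their own runs.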
> 
> BR,
> Ulf W
> 
> Igor Ribeiro Sucupira wrote:
>> Hi.
>>
>> I'd also appreciate an answer to that question.  :)
>>
>> My guess is that the most space-efficient key would be the one that
>> is smallest in binary format. In that case, your second option would
>> be the best:
>>
>> 1> BS = fun(Term) -> erlang:byte_size(term_to_binary(Term)) end.
>> #Fun<erl_eval.6.13229925>
>> 2> BS({myapp, myattr}).
>> 20
>> 3> BS("myapp.myattr").
>> 16
>> 4> BS(<<"myapp.myattr">>).
>> 18
>>
>>
>> But I'm not sure.
>>
>> Best regards.
>> Igor.
>>
>> On Mon, Jan 11, 2010 at 11:16 PM, Steve Davis
>> <steven.charles.davis@REDACTED> wrote:
>>> Which would be more efficient/recommended to use as the key of a k/v
>>> pair for an mnesia table:
>>>
>>> {myapp, myattr}
>>>
>>> or
>>>
>>> "myapp.myattr"
>>>
>>> or
>>>
>>> <<"myapp.myattr">>
>>>
>>> ?
>>>
>>> BR
>>> /s


