[erlang-questions] mnesia:transform_table

Thu May 27 04:25:52 CEST 2010

On Wed, May 26, 2010 at 5:11 PM, Robert Virding <rvirding@REDACTED> wrote:
> On 26 May 2010 18:42, Igor Ribeiro Sucupira <igorrs@REDACTED> wrote:
>> Since a dict consumes more space, the advantage of the proplist is the
>> size. Basically, the dict will give you a performance gain that's
>> irrelevant (unless you have many columns) and use more space.
>
> In that case you might as well use an orddict which is generally

Interesting... I hadn't noticed that an orddict is basically a
proplist sorted by "keys".

Since it's a list (and not an imperative array), I suppose you can't
use binary search and should still need linear time (in the worst
case) both to find and to insert/update an element, with constant time
in the best case. And those are also the bounds for "normal"
proplists.
If that is the case, it seems to me that an orddict will only be more
efficient (in practice) than a proplist for search misses, since on a
proplist you always need to search the whole list to decide on a
search miss.
For most applications, I think search misses for fields on a database
will not be very common, so maybe it doesn't make much difference in
the end if you use an orddict or an unsorted proplist...

> faster than an unsorted list, though not as fast as dict or gb_trees.
> You will find that the relative size difference between
> orddict/proplist and gb_trees decreases as the number of elements
> grows.
>
>> For example:
>>
>> 1> List = [{"firstname", "Igor"}, {"lastname", "Sucupira"}, {"gender",
>> "male"}, {"birthdate", "02/15/1981"}].
>> [{"firstname","Igor"},
>>  {"lastname","Sucupira"},
>>  {"gender","male"},
>>  {"birthdate","02/15/1981"}]
>> 2> Dict= dict:from_list(List).
>> {dict,4,16,16,8,80,48,
>>      {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>      {{[["lastname",83,117,99,117,112,105,114,97],
>>         ["gender",109,97,108,101]],
>>        [],
>>        [["birthdate",48,50,47,49,53,47,49,57,56,49]],
>>        [],[],[],[],[],[],[],[],[],
>>        [["firstname",73,103,111,114]],
>>        [],[],[]}}}
>> 3> GB = gb_trees:from_orddict(orddict:from_list(List)).
>> {4,
>>  {"gender","male",
>>  {"firstname","Igor",{"birthdate","02/15/1981",nil,nil},nil},
>>  {"lastname","Sucupira",nil,nil}}}
>>
>> 4> erlang:byte_size(term_to_binary(List)).
>> 97
>> 5> erlang:byte_size(term_to_binary(Dict)).
>> 195
>> 6> erlang:byte_size(term_to_binary(GB)).
>> 125
>>
>> 8> timer:tc(lists, keysearch, ["firstname", 1, List]).
>> {2,{value,{"firstname","Igor"}}}
>> 9> timer:tc(gb_trees, lookup, ["firstname", GB]).
>> {3,{value,"Igor"}}
>> 10> timer:tc(dict, find, ["firstname", Dict]).
>> {4,{ok,"Igor"}}
>> 11> timer:tc(lists, keysearch, ["firstname", 1, List]).
>> {3,{value,{"firstname","Igor"}}}
>> 12> timer:tc(gb_trees, lookup, ["firstname", GB]).
>> {3,{value,"Igor"}}
>> 13> timer:tc(dict, find, ["firstname", Dict]).
>> {4,{ok,"Igor"}}
>
> These times are really too small to hope to show anything when

Yeah... I know. I thought I had already made my point before (that any
difference would be irrelevant if the number of fields is small,
considering you already had to search and retrieve the list of fields
from the database) and didn't make an effort to provide adequate
evidence.  ;-)

It may also be a good idea to compile the tests you run, if they are
more complex than those above, as Robert made me learn in the thread
below.  :-)
http://forum.trapexit.org/viewtopic.php?p=49172&sid=5b490b5c8ebee8caadbfc0c5b808c07d

Best regards.
Igor.

> comparing them. You should run the tests 10k times for each one to get
> some relevant figures.
>
> How many columns do you think you will be using?
>
> Robert