Performance of dict and ets
Bjorn Gustavsson
bjorn@REDACTED
Sat Dec 2 17:31:36 CET 2000
Sean Hinde <Sean.Hinde@REDACTED> writes:
> Hi,
>
> I've been doing some performance comparison between the new R7B dict module
> and ets which I thought I'd share.
>
> My benchmark program creates the table/dict, stores 20,000 elements in it
> with keys 1..20000, then runs through all keys again and reads then deletes
> each entry.
>
> If I store lists of varying lengths I get the following results (microsecs
> per op on a 167MHz SPARC)
>
> list length ets_rd_del ets_write ets_tot dict_rd_del
> dict_write Dict_tot
> 1 13 12 25 54 62 117
> 100 43 30 73 55 62 117
> 200 71 42 113 54 64 119
> 300 102 60 162 55 62 117
> 400 140 73 213 55 63 118
>
> As expected dict is the same speed across the board, but ets shows linear
> growth with the size of the term being stored.
>
> Storing any less than 20,000 rows doesn't seem to make much difference.
>
> In the write case the breakeven is at about 300 elements in the list, in the
> read/delete case at about 130 elements. Overall breakeven is at 230 approx
>
> If I store a binary it becomes a bit more interesting...
>
> The dict module performs identically to it's lists case.
>
> The ets module gives the startling result that almost irrespective of the
> size of the binary each op takes only around 14 microsecs. This seems to
> imply that storing a binary in an ets table involves no copying...
Yes, see below.
>
> My first guess is that the copying involved in storing something in an ets
> table is an instantaneous term_to_binary type operation rather than anything
> inherent in transferring data to ets???
>
> Can anyone explain this ets copying business fully?
Terms in an ets table are in the same term format as on process heap,
not in the external term format. Copying a term to an ets table takes
the same time as copying a term to another process. Copying a term
out of a an ets table should be slighly faster, because we use a
special copy routine that depends on the term being stored in a continous
memory block.
Binaries are not copied. Binaries are referenced-counted memory chunks
outside all process heaps. There is only a small reference to it stored
on the process heap. When you store a binary in an ets table, only the
reference to the binary is copied. We call this type of binary refc
binaries.
In R7, small binaries (less than 64 bytes) are usually stored on the
heap and copied when sent or stored in ets tables.
We call these binaries heap binaries. We expect heap binaries to
be faster than refc binaries, because there is less book-keeping,
both when creating binaries and in garbage collection.
There are also sub binaries (also new in R7), which are created when
you split binaries either using the bit syntax or by the calling the
split_binary BIF. A sub binary contains a reference to either a refc binary
or a heap binary, and an offset and a length to specify part of the
original binary.
>
> Thanks,
>
> - Sean
/Björn
--
Björn Gustavsson Ericsson Utvecklings AB
bjorn@REDACTED ÄT2/UAB/F/P
BOX 1505
+46 8 727 56 87 125 25 Älvsjö
More information about the erlang-questions
mailing list