Performance of dict and ets

Bjorn Gustavsson bjorn@REDACTED
Sat Dec 2 17:31:36 CET 2000


Sean Hinde <Sean.Hinde@REDACTED> writes:

> Hi,
> 
> I've been doing some performance comparison between the new R7B dict module
> and ets which I thought I'd share.
> 
> My benchmark program creates the table/dict, stores 20,000 elements in it
> with keys 1..20000, then runs through all keys again and reads then deletes
> each entry.
> 
> If I store lists of varying lengths I get the following results (microsecs
> per op on a 167MHz SPARC)
> 	
> list length	ets_rd_del	ets_write	ets_tot	dict_rd_del
> dict_write	Dict_tot
> 1	13	12	25	54	62	117
> 100	43	30	73	55	62	117
> 200	71	42	113	54	64	119
> 300	102	60	162	55	62	117
> 400	140	73	213	55	63	118
> 
> As expected dict is the same speed across the board, but ets shows linear
> growth with the size of the term being stored.
> 
> Storing any less than 20,000 rows doesn't seem to make much difference.
> 
> In the write case the breakeven is at about 300 elements in the list, in the
> read/delete case at about 130 elements. Overall breakeven is at 230 approx
> 
> If I store a binary it becomes a bit more interesting...
> 
> The dict module performs identically to it's lists case.
> 
> The ets module gives the startling result that almost irrespective of the
> size of the binary each op takes only around 14 microsecs. This seems to
> imply that storing a binary in an ets table involves no copying...

Yes, see below.

> 
> My first guess is that the copying involved in storing something in an ets
> table is an instantaneous term_to_binary type operation rather than anything
> inherent in transferring data to ets???
> 
> Can anyone explain this ets copying business fully?

Terms in an ets table are in the same term format as on process heap,
not in the external term format. Copying a term to an ets table takes
the same time as copying a term to another process. Copying a term
out of a an ets table should be slighly faster, because we use a
special copy routine that depends on the term being stored in a continous
memory block.

Binaries are not copied. Binaries are referenced-counted memory chunks
outside all process heaps. There is only a small reference to it stored
on the process heap. When you store a binary in an ets table, only the
reference to the binary is copied. We call this type of binary refc
binaries.

In R7, small binaries (less than 64 bytes) are usually stored on the
heap and copied when sent or stored in ets tables.
We call these binaries heap binaries. We expect heap binaries to
be faster than refc binaries, because there is less book-keeping,
both when creating binaries and in garbage collection.

There are also sub binaries (also new in R7), which are created when
you split binaries either using the bit syntax or by the calling the
split_binary BIF. A sub binary contains a reference to either a refc binary
or a heap binary, and an offset and a length to specify part of the
original binary.

> 
> Thanks,
> 
> - Sean

/Björn

-- 
Björn Gustavsson            Ericsson Utvecklings AB
bjorn@REDACTED      ÄT2/UAB/F/P
			    BOX 1505
+46 8 727 56 87 	    125 25 Älvsjö



More information about the erlang-questions mailing list