Performance of dict and ets

Mon Dec 4 16:02:45 CET 2000

Sean Hinde <Sean.Hinde@REDACTED> writes:
>Hi,
>
>I've been doing some performance comparison between the new R7B dict module
>and ets which I thought I'd share.
>...

I did some tests a while back when Richard Carlsson presented his
gb_trees package which gave some very good results compared to dict and
orddict.  As Björn described ets copies so  to do a fair comparison I
make sure I also copy when testing other databases.  These are some
results from my tests.

I run twice through a set of insert, lookup, update if the value is an
integer and finally delete on all the values.  It is interesting to see
what happens the second time when overwriting the old values.

I test ets, dict, orddict, gb_trees, pets (Process ETS, like dict but
with an ets compatible interface) and sysHashTab (early dict/ets like
package from Ulf Wiger, my version couldn't do delete) inserting a tuple
of N atoms.

The times are average microsecs per operation.

             ets       dict      gb_trees  pets
             ---       ----      --------  ----
Size = 13
    Insert   9.50      46.1      125       47.0
    Lookup   3.50      4.24      7.64      4.00
    Insert   5.90      28.0      21.1      31.8
    Lookup   3.81      3.76      5.16      3.85
    Delete   4.80      19.0      5.81      18.8

Size = 26
    Insert   9.30      56.7      153       49.5
    Lookup   6.80      4.64      7.27      3.90
    Insert   7.60      37.9      22.0      43.8
    Lookup   5.96      4.02      5.59      3.82
    Delete   2.75      21.1      6.04      21.7

Size = 52
    Insert   14.5      59.5      121       64.5
    Lookup   7.35      4.05      5.50      3.93
    Insert   5.90      57.1      38.8      65.3
    Lookup   8.22      3.82      5.14      3.87
    Delete   2.82      31.7      10.4      25.5

Gb_trees has different functions for insert/lookup/delete depending on
whether you *KNOW* there is already data with that key.  To be fair we
use a variant which doesn't know for insert/lookup but knows for
delete.  This explains why insertion is much faster the second time,
also why delete is much faster than dict/pets.

We also see that for objects larger than about 20 element tuples
lookup is faster using dict.  However for really big databases you
don't really want to store them in the process heaps because of the
way BEAM heaps are managed.

However, one reason for using dict/pets is the nice meta functions
fold/filter/merge/update which all work without copying.  The ets
versions will copy.

	 Robert

P.S. Pets hasn't been released, but it is internally very similar to
dict.

P.P.S. Orddict has been sped up using some better code I got from
Richard Carlsson, should be in R8.  It is, however, no alternative
with this many elements.