[erlang-questions] ets vs list

Mon Sep 13 20:12:32 CEST 2010

On 09/13/2010 07:11 PM, Morten Krogh wrote:
> 
> 
> Thanks. I just tried in the shell and saw that there were no new
> processes after creating an ETS table. fine.
> 
> Surely, you could model ets with a process and message passing. Why
> should it be slower? The only overhead should be the message passing.

No, there are several other overheads. For instance, you must somehow
implement a hash table from immutable dynamically typed data structures,
whereas in ETS (in C), you can use mutable arrays. The dict module,
for example, uses a tree structure of tuples. Once you've located
the leaf tuple where the object is to reside, you have to create
a new copy of that tuple, which means you must also update (copy)
the parent tuple, etc.

Also, when traversing the tree, you cannot avoid the type checks
ensuring that it's actually a tuple you're stepping into. In the
case of C, you will not be performing runtime type checks on the
hash space.

OTOH, with SMP Erlang, you have to protect the ETS tables with
mutexes, whereas on the process heap, you don't have to worry
about contention.

> So, I guess your benchmark might depend on the size of the key/value
> pairs. For large values, the performance should be the same, for small
> values the process might be a bit slower.

In the case I described, the task was to model the ETS API
faithfully, which meant a separate process per table. This also
included copying, as data sent in messages is copied from
process to process. Otherwise, if you use dict, gb_trees, etc.
or the process dictionary, you can avoid the copying and get
quite good performance if the stored data structures are large.

> So, if we compare ets with dict or the process dictionary, then ets
> should perform really badly on large values, it seems.

Yes. OTOH, the process dictionary is difficult to share...

BR,
Ulf W