[erlang-questions] gb_trees and dict performance

Jesper Louis Andersen jesper.louis.andersen@REDACTED
Wed Dec 9 15:31:46 CET 2015


On Wed, Dec 9, 2015 at 2:00 PM, Vladimir Ralev <vladimir.ralev@REDACTED>
wrote:

> I am curious about the O-complexity of the operations in these data
> structures in Erlang.
>

maps(), gb_trees and dict are all O(lg n) in complexity for
insertion/lookup. They all implement trie/tree-like structures, and this
allows them to exploit a nice observation: in an immutable data structure,
we don't need to copy the whole structure. We can simply re-use the parts
of the tree which has *not* changed.

Thus, insertion will only replace the "path" from the inserted node to the
top-level with the remaining parts of the tree unchanged. In a simple
binary tree with {node, Left, X, Right}, we could insert on the Right and
once the recursion returns, we would store {node, Left, X, NewRight}. Note
that Left never changed, so we can reuse that pointer in the newly formed
tree. Another way of seeing it is like Git refers to older unchanged files
in newer commits when possible, thus avoiding copying the files.

The dict module uses a far wider branching factor than gb_trees. It trades
faster lookup times for more expensive insertions and more memory pressure.
The maps() use a Hash-Array-Mapped-Trie structure (Phil Bagwell, Ideal Hash
trees) which is among the most advanced structures we know. The structure
is a blend of a hash-table and a mapped trie, getting good properties from
both. Access is often very few memory reads (on the order of 5 for 4
billion elements), and insertion is fast - while at the same time being
memory efficient and persistent. It is the same data structure Clojure uses
for its map construction.

In general, the persistence property can be exploited in Erlang programs.
Some times it is *more* efficient than manipulation of data through
ephemeral (destructive) updates. Due to immutability, you can pass large
data structures as pointers without worrying about the caller changing
them. It also removes the notion of pass-by-value/pass-by-reference: every
data structure is passed by value, since there is no way to change it. But
the runtime will often pass by reference automatically if beneficial :)


-- 
J.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20151209/a4874291/attachment.htm>


More information about the erlang-questions mailing list