<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Dec 9, 2015 at 2:00 PM, Vladimir Ralev <span dir="ltr"><<a href="mailto:vladimir.ralev@gmail.com" target="_blank">vladimir.ralev@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":kn" class="a3s" style="overflow:hidden">I am curious about the O-complexity of the operations in these data<br>
structures in Erlang.</div></blockquote></div><br></div><div class="gmail_extra">maps(), gb_trees and dict are all O(lg n) in complexity for insertion/lookup. They all implement trie/tree-like structures, and this allows them to exploit a nice observation: in an immutable data structure, we don't need to copy the whole structure. We can simply re-use the parts of the tree which has *not* changed. <br><br>Thus, insertion will only replace the "path" from the inserted node to the top-level with the remaining parts of the tree unchanged. In a simple binary tree with {node, Left, X, Right}, we could insert on the Right and once the recursion returns, we would store {node, Left, X, NewRight}. Note that Left never changed, so we can reuse that pointer in the newly formed tree. Another way of seeing it is like Git refers to older unchanged files in newer commits when possible, thus avoiding copying the files.<br><br></div><div class="gmail_extra">The dict module uses a far wider branching factor than gb_trees. It trades faster lookup times for more expensive insertions and more memory pressure. The maps() use a Hash-Array-Mapped-Trie structure (Phil Bagwell, Ideal Hash trees) which is among the most advanced structures we know. The structure is a blend of a hash-table and a mapped trie, getting good properties from both. Access is often very few memory reads (on the order of 5 for 4 billion elements), and insertion is fast - while at the same time being memory efficient and persistent. It is the same data structure Clojure uses for its map construction.<br></div><div class="gmail_extra"><br></div><div class="gmail_extra">In general, the persistence property can be exploited in Erlang programs. Some times it is *more* efficient than manipulation of data through ephemeral (destructive) updates. Due to immutability, you can pass large data structures as pointers without worrying about the caller changing them. It also removes the notion of pass-by-value/pass-by-reference: every data structure is passed by value, since there is no way to change it. But the runtime will often pass by reference automatically if beneficial :)<br><br clear="all"></div><div class="gmail_extra"><br>-- <br><div class="gmail_signature">J.</div>
</div></div>