Data sharing is outside the semantics of Erlang, but it sure is useful

Mon Sep 14 18:22:04 CEST 2009

I've run into several cases where enforcing the sharing of data
resulted in significant memory savings. I'm talking about heap size
dropping from 60MB to under half that. By "enforcing the sharing of
data" I mean making sure that identical elements in a data structure
actually reference the same locations in memory.
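To make the difference visible, here is a small sketch using erts_debug:size/1 (counts shared subterms once) and erts_debug:flat_size/1 (counts the term as if fully copied). The Erlang Efficiency Guide uses these same two functions to measure term sizes in words. The module and function names are made up for illustration. A tuple holding two separately built but equal lists should report the same number for both measurements, while a tuple holding the same list twice should report roughly half under erts_debug:size/1.

```erlang
-module(sharing_size).
-export([compare/0]).

%% Build a tuple from two equal lists in two ways: once from
%% separate copies, once from two references to a single copy.
compare() ->
    A = lists:seq(1, 100),
    B = lists:seq(1, 100),      % equal to A, but a separate allocation
    Separate = {A, B},
    Shared = {A, A},            % both slots point at the same list
    %% erts_debug:size/1 counts shared subterms once;
    %% erts_debug:flat_size/1 counts the term as if fully copied.
    [{separate, erts_debug:size(Separate), erts_debug:flat_size(Separate)},
     {shared, erts_debug:size(Shared), erts_debug:flat_size(Shared)}].
```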

This is easy to do in Erlang, because the compiler is very literal:

   %% Rebuild from the single bound H so both slots share one copy.
   fix_tuple({H, H}) -> {H, H};
   ...

That ensures that identical-looking elements in the tuple share memory
locations. But there is absolutely no reason the compiler has to do
this. Sharing isn't visible in Erlang's semantics (the result is =:=
to the input either way), so it would be perfectly valid to optimize
away the entire function and just return the original value.
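A self-contained version of the trick, as a sketch: the catch-all clause is added here only so the module compiles and handles every input, and the module name is hypothetical. The demo shows the two properties described above: the result is =:= to the input, yet it should take fewer words. The saving relies on how the current BEAM compiler happens to build {H, H}, not on anything the language guarantees.

```erlang
-module(fix_tuple_demo).
-export([fix_tuple/1, demo/0]).

%% Rebuild a pair of equal elements from the single bound variable H,
%% so both slots of the result reference the same term in memory.
fix_tuple({H, H}) -> {H, H};
fix_tuple(Other) -> Other.      % catch-all added for this sketch

demo() ->
    Before = {lists:seq(1, 100), lists:seq(1, 100)},  % two separate lists
    After = fix_tuple(Before),
    true = (Before =:= After),  % semantically identical...
    %% ...but After should occupy fewer words, because its two
    %% elements now share a single list.
    {erts_debug:size(Before), erts_debug:size(After)}.
```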

Would any existing standard library functions make this nicer? What I
really want is a version of gb_trees:insert that returns
{NewTree, InsertedValue}, where InsertedValue references the data
already in the tree (unless the value wasn't in the tree yet, in which
case InsertedValue is exactly what I passed in). Then I can happily use
InsertedValue, knowing data is being shared.
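One way to approximate that wish with the existing gb_trees API is sketched below. Both insert/2 and intern_all/1 are hypothetical helpers, not part of gb_trees: each value is stored as both key and value, so a successful gb_trees:lookup/2 hands back the copy that already lives in the tree.

```erlang
-module(intern).
-export([insert/2, intern_all/1]).

%% Hypothetical helper: insert Value into a gb_tree used as an intern
%% table and return {NewTree, SharedValue}. On a hit, SharedValue is
%% the copy already stored in the tree; on a miss, it is Value itself.
insert(Value, Tree) ->
    case gb_trees:lookup(Value, Tree) of
        {value, Existing} -> {Tree, Existing};
        none -> {gb_trees:insert(Value, Value, Tree), Value}
    end.

%% Intern every element of a list, so equal elements end up
%% referencing the same term in memory.
intern_all(List) ->
    {Result, _Tree} =
        lists:foldl(fun(V, {Acc, T0}) ->
                            {T1, Shared} = insert(V, T0),
                            {[Shared | Acc], T1}
                    end, {[], gb_trees:empty()}, List),
    lists:reverse(Result).
```

Two caveats with this sketch: the caller has to thread the tree through every call, and gb_trees treats keys as the same when they compare equal with ==, so interning 1.0 after 1 would hand back the integer 1.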

James