[erlang-questions] Re: data sharing is outside the semantics of Erlang, but it sure is useful

James Hague james.hague@REDACTED
Mon Sep 14 22:36:18 CEST 2009


> I am missing something here. gb_sets (nor sets, ordsets, rbsets) does not
> make a copy of the data which is put into the set. All that is copied is
> enough of the *tree* to insert the new element. There is no need to copy the
> new data as it is kept within the same process. Only ets makes a copy of the
> data.

Let's say you've got a long list of strings, many of them duplicates.
You don't just want to remove the duplicates, because that would change
the length of the list. The goal is to ensure that identical strings
are shared, so there's only one copy of each in memory.  What's a
practical way of doing that?

This is irrelevant most of the time, but there are some situations
where it's a huge win.

(My solution was to build a new list by looking up each element in a
binary tree.  If a string is already in the tree, the version that's
already there is returned (which is not something that gb_sets does);
otherwise the string is added.  In the resulting list, elements are
shared as much as possible. I'm clearly taking advantage of how the
runtime works, but it shrank the heap size by tens of megabytes.)
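
A minimal sketch of that idea, not the code actually used: the module
name share_strings and the function share/1 are made up for
illustration, and gb_trees stands in for the binary tree, with each
string stored as both key and value so a lookup hands back the copy
that was inserted first.

```erlang
-module(share_strings).
-export([share/1]).

%% share(List) returns a list equal to List (same length, same order)
%% in which equal elements are one and the same term on the heap.
share(List) ->
    share(List, gb_trees:empty(), []).

share([], _Seen, Acc) ->
    lists:reverse(Acc);
share([Str | Rest], Seen, Acc) ->
    case gb_trees:lookup(Str, Seen) of
        {value, Canonical} ->
            %% Seen before: reuse the stored copy; the duplicate
            %% becomes garbage.
            share(Rest, Seen, [Canonical | Acc]);
        none ->
            %% First occurrence: this term becomes the shared copy.
            share(Rest, gb_trees:insert(Str, Str, Seen), [Str | Acc])
    end.
```

The result compares equal to the input, so callers see no difference
apart from memory use. The sharing only holds inside one process: as
noted in the quoted text, ets copies what it stores, and sending the
list to another process copies it as well, so the duplicates come back
on the receiving side.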

