[erlang-questions] data sharing is outside the semantics of Erlang, but it sure is useful

Wed Sep 16 08:20:39 CEST 2009

I'm a bit curious about the subject line of this thread, which seems to
me too easily generalized from the very specific problem James brings
up: saving space when you have a very long list of strings, some of
which are repeated.

If you're representing items of data of any kind in a very long list, I
can only assume it's because the cost of linear access is a non-issue
for your application.  If you want to save space by storing the
repeated elements of a long, mostly serial-access list only once, well,
that reminds me very much of the very general idea of "data compression",
which is also usually applied when linear access is not much of an
issue, when memory is, and when the data features significant
repetition.

So why not just use compression, if saving space is an important goal and
reducing random-access time is not?
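To make the suggestion concrete, here is a minimal sketch in Python, chosen only for illustration (in Erlang, `term_to_binary(List, [compressed])` with `binary_to_term/1` would be the nearest analogue). The helper names `pack_strings` and `unpack_strings` are hypothetical. It assumes the whole list is compressed as one blob, so reading any element means decompressing everything, which is exactly the linear-access trade-off described above.

```python
import json
import zlib


def pack_strings(strings):
    """Serialize a list of strings and compress it into one bytes blob.

    Hypothetical helper: the whole list becomes a single compressed
    object, so any element access needs a full decompress (linear cost).
    """
    raw = json.dumps(strings).encode("utf-8")
    return zlib.compress(raw)


def unpack_strings(blob):
    """Inverse of pack_strings: decompress and decode the whole list."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    words = ["alpha", "beta", "gamma"] * 10_000  # heavy repetition
    blob = pack_strings(words)
    assert unpack_strings(blob) == words
    print(len(json.dumps(words)), "bytes raw ->", len(blob), "bytes compressed")
```

JSON is used as the serialization only because it round-trips any list of strings exactly; joining on a separator would also work if some character is known never to occur in the data.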

I'm not sure I'm totally sold on the Erlang Way of doing things, being
pretty new to the language.  But in this particular case (or even in
generalizations of it), I don't see why the Erlang Way (insofar as I
understand it) is necessarily inferior to anything requiring that the
language break with its general shared-nothing approach.  Did I miss
something?

-michael turner

On 9/15/2009, "James Hague" <james.hague@REDACTED> wrote:

>> Sounds like you want "hash consing".
>
>Hash consing is a heavyweight solution.  It's got a fixed cost for
>something that's usually irrelevant.  What I really want is a function
>that takes a data structure and returns a new version with maximal
>sharing.  I can write special case versions of that in Erlang, but
>it's messy and feels like something that should be a general library
>service.
>
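For comparison, the one-shot "maximal sharing" pass James describes can be sketched as a walk that rebuilds the structure and replaces each repeated immutable subterm with the first equal instance it saw. This is a Python sketch under stated assumptions, not an Erlang library function: the name `share` is hypothetical, only exact `str`, `bytes`, `int`, `bool` and `tuple` values are merged, lists are rebuilt but never merged (Python lists are mutable), and the lookup table lives only for the duration of one call, which is what distinguishes it from the permanent table of hash consing.

```python
def share(term, table=None):
    """Return a copy of `term` that is equal to it but maximally shared.

    Hypothetical helper. Equal str/bytes/int/bool leaves, and tuples whose
    elements are the same (already shared) objects, are replaced by the
    first instance seen. Lists are rebuilt with shared elements but never
    merged, since they are mutable. Other types are returned unchanged.
    """
    if table is None:
        table = {}  # scratch table for this call only, unlike hash consing
    kind = type(term)
    if kind is list:
        return [share(x, table) for x in term]
    if kind is tuple:
        children = tuple(share(x, table) for x in term)
        # Children are already canonical where possible, so keying on their
        # ids merges tuples built from the same objects while keeping e.g.
        # (1,) and (1.0,) apart.
        key = (tuple, tuple(id(c) for c in children))
        return table.setdefault(key, children)
    if kind in (str, bytes, int, bool):
        # Keying on the exact type keeps True and 1 (which compare equal) apart.
        return table.setdefault((kind, term), term)
    return term
```

Equality is unchanged (`share(x) == x`); only object identity differs, so the duplicate copies in the original can be freed once it is dropped.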
>________________________________________________________________
>erlang-questions mailing list. See http://www.erlang.org/faq.html
>erlang-questions (at) erlang.org