[erlang-questions] data sharing is outside the semantics of Erlang, but it sure is useful

Thu Sep 17 01:48:20 CEST 2009

On Sep 17, 2009, at 4:43 AM, Jayson Vantuyl wrote:
> I don't really think that it's useful to classify data sharing as  
> data compression.

It's very much in the spirit of dictionary-based compression.

>  Bottom line, there's an optimization (and a clearly important one)  
> that Erlang isn't doing.

And can't reasonably be *expected* to do.  It is reasonable
to expect Erlang to *preserve* sharing, as when sending a term
to another process, because failing to do so can make space use
blow up in a rather sickening way that is hard for a
programmer to detect.
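
To make the blow-up concrete, here is a small sketch.  The module
and function names are made up for illustration, and it leans on
erts_debug:size/1 and erts_debug:flat_size/1, which are debugging
aids rather than a supported API.  The first counts heap words with
sharing taken into account; the second counts the words the term
would need if every shared subterm were copied out separately, which
is what a copy that does not preserve sharing has to allocate.

```erlang
-module(sharing_demo).
-export([nested/1, sizes/1]).

%% Build a term in which every cons cell uses the previous level as
%% both its head and its tail.  Each level adds one cons cell, so the
%% shared representation grows linearly with N.
nested(0) -> [];
nested(N) when N > 0 ->
    T = nested(N - 1),
    [T | T].

%% Return {SharedWords, FlatWords} for nested(N).
%% SharedWords counts each shared subterm once; FlatWords counts it
%% every time it is reachable, so it roughly doubles per level.
sizes(N) ->
    T = nested(N),
    {erts_debug:size(T), erts_debug:flat_size(T)}.
```

Comparing sharing_demo:sizes(10) with sharing_demo:sizes(20) shows
the first number growing in step with N while the second grows
exponentially.  Nothing in the source code of nested/1 hints at the
second number, which is why this is so hard to spot.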

I sometimes think that for every use case there is an equal
and opposite use case.  In the case of memory, for example,
we've got *space* issues and *cache* issues.  Looking for
existing copies of stuff can save you space, but it can
do terrible things to your cache (bringing in stuff that it
turns out you don't want).  The tradeoffs depend on how much
space you may save, how likely the saving is, and how well you
can avoid looking at irrelevant stuff while looking for an
existing copy.  The programmer is in a better position to know
these things than the Erlang compiler or runtime system.
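
As an illustration of the programmer making that call, here is a
minimal sketch of explicit interning, under the assumption that the
duplicated values are whole terms worth looking up in a table.  The
module and function names are hypothetical, and it uses the maps
module, so it needs a release that has maps.

```erlang
-module(intern).
-export([intern/2, intern_list/1]).

%% Look Term up in Table.  If an equal term has been seen before,
%% return that earlier copy so the caller can drop its own; otherwise
%% record Term as the canonical copy.
intern(Term, Table) ->
    case maps:find(Term, Table) of
        {ok, Canonical} -> {Canonical, Table};
        error -> {Term, maps:put(Term, Term, Table)}
    end.

%% Replace every element of Terms by its canonical copy, so that
%% equal elements end up as references to one term on the heap.
intern_list(Terms) ->
    {Rev, _Table} =
        lists:foldl(
          fun(T, {Acc, Tab0}) ->
                  {C, Tab1} = intern(T, Tab0),
                  {[C | Acc], Tab1}
          end,
          {[], maps:new()},
          Terms),
    lists:reverse(Rev).
```

Every lookup has to hash or compare the whole term, which is exactly
the kind of extra memory traffic described above.  Whether that cost
is worth paying depends on how often duplicates actually turn up,
and that is something the programmer knows and the runtime does not.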

One thing I didn't quite understand was why the original data
source is emitting stuff with lots of duplication in the first
place.  Fixing the duplication problem at the source has the
added benefit of reducing the cost of getting the data into an
Erlang process to start with.