[erlang-questions] data sharing is outside the semantics of Erlang, but it sure is useful
Richard O'Keefe
ok@REDACTED
Thu Sep 17 07:52:48 CEST 2009
On Sep 17, 2009, at 1:02 PM, Jayson Vantuyl wrote:
>I've run into this when working with a simple graph algorithm.
>Representing edges as {source,dest} was great for atoms and
>horrible for strings. All of my tests used atoms, but at runtime,
>the strings were being duplicated (because I was messaging them
>around). It was noticeable.
This sounds to me like a perfect example where duplication
should be avoided at the source. Graphs should be sent as
{graph,{NodeNames},[{F1,T1},...,{Fn,Tn}]}
where the Fi and Ti are indices into the {NodeNames} tuple, so that
each name string appears in the message only once.
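A minimal sketch of that encoding, for illustration only. The module name graph_codec and the functions encode/1 and decode/1 are hypothetical, and the use of maps assumes a much newer OTP release than this thread; any lookup structure would do.

```erlang
-module(graph_codec).
-export([encode/1, decode/1]).

%% encode([{From, To}]) -> {graph, NamesTuple, [{FromIdx, ToIdx}]}
%% Each distinct node name is stored once in NamesTuple; the edges
%% carry 1-based indices into that tuple instead of the names.
encode(Edges) ->
    %% Collect every endpoint, then sort and drop duplicates.
    Names = lists:usort(lists:append([[F, T] || {F, T} <- Edges])),
    %% Map each name to its position in the tuple.
    Index = maps:from_list(lists:zip(Names, lists:seq(1, length(Names)))),
    {graph,
     list_to_tuple(Names),
     [{maps:get(F, Index), maps:get(T, Index)} || {F, T} <- Edges]}.

%% decode/1 turns the indexed form back into a list of {From, To}.
decode({graph, Names, IndexedEdges}) ->
    [{element(F, Names), element(T, Names)} || {F, T} <- IndexedEdges].
```

With string node names, the encoded term holds each string once no matter how many edges mention it, so a flattening copy on send has nothing to duplicate.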
>
> Another problem I had was with a backend for the Linux Network Block
> Device. I was tossing around disk blocks (4k binaries) and had
> pathological memory usage really quickly.
>
> Real development has real problems with unnecessary data
> duplication. This is not a matter of optimization. Someone needs
> to finish one of the alternate heap implementations. Really.
There seem to be two issues confused here.
One of them is the fact that when you send a message,
all sharing within the message is removed (except that
large binaries are not supposed to be copied).
We *agree* that this is a bad thing. My message was explicit
that Erlang should preserve sharing.
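The loss of sharing can be observed from Erlang itself. The sketch below is an illustration, not part of the original message: the module name sharing_demo is hypothetical, and it relies on erts_debug:size/1 and erts_debug:flat_size/1, which are debugging aids rather than a stable API. On a runtime built to preserve sharing in messages, the received size may equal the shared size instead of the flat one.

```erlang
-module(sharing_demo).
-export([sizes/0, received_size/0]).

%% A tuple whose 1000 elements all point at the same 100-character
%% string, built at run time so that the sharing lives on the heap.
shared_term() ->
    S = lists:duplicate(100, $x),
    list_to_tuple(lists:duplicate(1000, S)).

%% {WordsCountingSharedPartsOnce, WordsIfEverythingWereCopied}
sizes() ->
    T = shared_term(),
    {erts_debug:size(T), erts_debug:flat_size(T)}.

%% Send the term to another process and report how many words the
%% received copy occupies there.
received_size() ->
    Parent = self(),
    Pid = spawn(fun() ->
                    receive
                        {term, T} ->
                            Parent ! {size, self(), erts_debug:size(T)}
                    end
                end),
    Pid ! {term, shared_term()},
    receive
        {size, Pid, N} -> N
    end.
```

Comparing sharing_demo:received_size() with the pair returned by sharing_demo:sizes() shows whether the copy made on send kept the sharing or expanded it.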
But it didn't sound as though that's what the original poster
was talking about. I may well have misunderstood; it would
not be the first time.
It's claimed that preserving sharing would raise the cost of
message sending too high. There's an answer to that. Set a
modest threshold, say 100 cells or so, and try the existing
way of sending. But if that threshold is crossed, give up,
and start over with a method that preserves within-message
sharing.
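The threshold idea can be modelled in a few lines. This is a toy written in Erlang for readability, not how the runtime copies messages: the module name threshold_send and the function strategy/2 are hypothetical, only cons cells and tuples are counted as "cells", and a real implementation would do the copying during the walk rather than just choosing a strategy.

```erlang
-module(threshold_send).
-export([strategy/2]).

%% strategy(Term, Threshold) -> flat | preserve_sharing
%% Walk the term the way a flattening copy would, spending one unit
%% of budget per cons cell or tuple. If the budget runs out, give up
%% and report that the sharing-preserving method should be used.
strategy(Term, Threshold) ->
    try walk(Term, Threshold) of
        _Remaining -> flat
    catch
        throw:over_threshold -> preserve_sharing
    end.

%% walk(Term, Budget) -> remaining budget, or throws over_threshold.
walk([H | T], Budget) ->
    walk(T, walk(H, spend(Budget)));
walk(Tuple, Budget) when is_tuple(Tuple) ->
    lists:foldl(fun walk/2, spend(Budget), tuple_to_list(Tuple));
walk(_Other, Budget) ->
    Budget.

%% Spend one cell of budget; abandon the walk once it is exhausted.
spend(Budget) when Budget > 0 -> Budget - 1;
spend(_) -> throw(over_threshold).
```

Because the walk stops as soon as the budget is gone, the work wasted before switching methods is bounded by the threshold, so small messages pay nothing extra and large ones pay only a small fixed overhead.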