[erlang-questions] Cost of Copy

Thu May 27 22:09:21 CEST 2010

Thanks a lot James!

am I right then to conclude that mutating something in a list will be 
the faster the more the data is spread out deeper over several levels?

A = { B, C, D, E, F, G, H }

will be slower to change than, say

A = { { B, C }, { D, E },  {F, G, H } }

because if I change e.g. B, only { B', C } will be created as new  
internal list of pointers, as opposed to the whole of { B', C, D, E, F, 
G, H }. Which should for larger lists and tuples, and frequent changes, 
start to make a difference?

So this would be the fastes option?

A = { { B }, { C }, { D }, { E },  { F }, { G }, { H } }

And if the tuple may be expected to change in size (elements not only 
changeing but being added/deleted), this:

A = { { { { B }, { C } }, { { D }, { E } } }, { { { F }, { G } }, { { H 
} } } }

Maybe I am wrong that there may be a difference between tuples and 
lists, but is it  the same for

A = [ B, C, D, E, F, G, H ]

 slower to change than

A = [ { B, C }, { D, E },  { F, G, H } ]

? Or are list-elements changed faster, in general, virtually 'in-place' 
as only one pointer is replaced?

Hardly I guess, because the list itself must not be changed? So the 
whole new list must be copied anyway, just like a tuple.

Thanks for your take,
Henning

James Hague wrote:
> If you ignore message passing, then the copying/sharing issues in Erlang are
> just about the same as in Lisp and Scheme, so you may find some information
> that way.
>
> The short version is that some values are represented directly in a memory
> "cell," like atoms and (most) integers. Others are pointers to data stored
> elsewhere (list elements, or conses, which are two consecutive cells in
> memory) or tuples (N consecutive cells plus one for the size). Just by
> looking at code you can see where conses and tuples are created. If you use
> an existing "pointer" to data, then that data is not copied.
>
> Examples:
>
> B = {A, A, A}
> This creates a three element tuple (four memory cells). It doesn't matter if
> A is a list of a million elements or a tuple or the number 2. B just
> contains three cell-sized values (plus the size of the tuple).
>
> X = [57|List]
> This creates a single cons (2 cells). The first element is 57. The second is
> a pointer to List. So even though you've got two lists (X and List), all of
> the data is shared except for the first cons of X.
>
>