[erlang-questions] Cost of Copy

Fri May 28 10:15:38 CEST 2010

On Thu, 27 May 2010 22:09:21 +0200, Henning Diedrich <hd2010@REDACTED>  
wrote:

> A = { { B, C }, { D, E },  {F, G, H } }
> because if I change e.g. B, only { B', C } will be created as new
> internal list of pointers, as opposed to the whole of { B', C, D, E, F,
> G, H }. Which should for larger lists and tuples, and frequent changes,
> start to make a difference?

Not exactly.  Lets say we have A as above and want to construct a new tuple

X = {{B', C}, {D, E},  {F, G, H }}

What we then have to do is construct a new tuple with three elements

X = {  _ , _ , _}

We then have to construct a new tuple for {B', C}

X = {{_, _}, _, _}

and then we of course have to construct B'

X = {{B',_}, _,_}

so we have constructed two tuples, one of size tree and one of size two  
(plus B'), all other structurs C, {D,E} and {F,G,H} are shared.

If we instead have

A = {B, C, D, E, F, G, H }

and want to construct

X = {B', C, D, E, F, G, H}

we construct one new tuple

X = {_, _, _ , _, _, _, _}

and then B'

X = {B', _, _ , _, _, _, _}

so now we have only constructed one tuple of size seven. All other  
structures C,D,E,F,G,H are shared.

The difference is in this case marginal (and I don't know which one  
executes faster). If we look at larger structures the difference become  
apparent. Assume you have a N elements in a set and want to update these  
element one by one. One way is to represent this a tuple of size N. Each  
update operation would then construct a new tuple of size N. All elements  
in the tuple would be shared but the new tuple has one element that  
differs.

If we instead represent this as a list of length N we would have to  
construct a new list to the point where we have located the element that  
we want to change. Assume we want to change E:

A = [B | [C | [D | [E | [F | [G | [H| []]]]]]]]

X = [_ | [_ | [_ | [E' | _]]]]

Depending in how far down E is found we have to construct new list cells  
for all element before E. In the example above this is four list cells  
[_|_].

In average thins means N/2 new list cells for each update. This is for  
large N more efficient than having to construct a new tuple of size N.

A even better way, if N is large, is to construct a tree.

A = {tree,
       {tree,
          {tree, {leaf, B}, {leaf, C}},
          {tree, {leaf, D}, {leaf, E}}}
       {tree,
          {tree, {leaf, F}, {leaf, G}},
          {leaf, H}}}

Now if we want to change E then we have

A = {tree,
       {tree,
          _,
          {tree, _, {leaf, E'}}}
       _}

So we construct new tree tuples all the way down to the location of E. All  
other structures are shared. In general this means that we will construct  
lg(N) new tree tuples.

So a tuple of size N, N/2 new list cells or lg(N) new tree tuples.

The data structures of course have other implications when it comes to  
reading and adding elements.

  Johan

-- 
Associate Professor Johan Montelius
Royal Institute of Technology - KTH
School of Information and Communication Technology - ICT