[erlang-questions] Cost of Copy
Henning Diedrich
hd2010@REDACTED
Sun May 30 02:40:06 CEST 2010
Thanks Robert V, Robert R, James, Richard, thanks Dmitry:
I did a naive benchmark on this meanwhile:
http://www.eonblast.com/blog/cost-of-mutation/
From what I understood now, there is a clear and simple answer to my
basic question: prefer deeper nesting over more flat structures.
The benchmark compares structures like
tupelMutateFlat_10(FlatTupel, NewD) ->
{ A, B, C, _, E, F, G, H, I, J } = FlatTupel,
{ A, B, C, NewD, E, F, G, H, I, J }.
tupelMutateNested_10(NestedTupel, NewD) ->
{ { AB, {C, _}}, EFGH, IJ } = NestedTupel,
{ { AB, {C, NewD}}, EFGH, IJ }.
Any comment on erroneous benchmark construction highly welcome.
Specifically in answer to Robert Virding:
> If you modify one of the nodes then
> the only other nodes you will have to rewrite are those nodes between
> the root of the tree and the modified node.
Plus the sibling nodes, I think, which were my focus.
> What sized nodes depends entirely upon what you are doing, the only
> general hint I can give is use that which fits best into your
> application. I mean it, really. Don't go complicating matters by
> picking weirdo data structures.
>
I was asking this general question to have guidance on the basic
decisions in application and data design.
> The only time I would seriously start thinking about what size thing
> to use is if you are writing more general library type packages.
>
Yes. Plus, I think, what structure the data should have that you write
away to disk would equally affect an app performance so profoundly that
it may warrant this kind of reflection.
> Generally having smaller nodes but a deeper resulting tree will lead
> to less rewriting, but as the tree is deeper it will cost more to
> traverse it and though the rewrites will be smaller there will be more
> of them.
How strong that effect is, that is what I test in the above benchmark.
In the extreme, it gets pretty high:
100 element flat list const Bin (discarding result, 1M ): run
1450 wall 1512 ms.
100 element flat list integer N (discarding result, 1M ): run
1470 wall 1550 ms.
100 element nested list const Bin (discarding result, 1M ): run
400 wall 462 ms. (72% faster)
100 element nested list integer N (discarding result, 1M ): run
380 wall 414 ms. (74% faster)
Let me know please if you find where the assumptions are wrong.
On the traversal statement, Rob, I can't immediately follow --- maybe
you mean with a lot larger structures? I am talking pretty much about a
usual data base record size, say, of a customer with a lot of associated
data attached.
Thanks,
Henning
More information about the erlang-questions
mailing list