Cost of Copy
Henning Diedrich
hd2010@REDACTED
Sat May 8 09:43:00 CEST 2010
1
I am trying to get a better picture of what the performance costs are
when mutating (collections of) complex structures.
Most generally, if I have a list of structures and want to make one
change to one of the structures; and if I want to avoid ETS for the
implicit copying, what else should I NOT do to not run into the same
problem with a non-ETS approach, and how CAN I have as little
unnecessary internal copying of all those things that stay the same.
I just ran into this post by J Louis while looking for information
http://jlouisramblings.blogspot.com/2009/01/common-erlang-misconceptions.html
It was giving some implicit answers about what results in internal
copying, what doesn't. And I am not talking about strings here.
I wondered if there is a dedicated write up out there that explains a
little more to the point what the use of lists, sets, trees, ets and
mnesia implies in terms of copying, to better understand the performance
implications. With immutable variables, of course there is always the
hope that underneath, not much copying is actually necessary, as I
understand e.g. lists are benefiting hugely from, to become O(1) when
adding heads to them. So assumptions based on ALGOL heritage experiences
should fail. Which puts one on a quest like Max Lapshin's
http://erlang.2086793.n4.nabble.com/Help-with-storing-data-in-memory-td2119208.html#a2119208
who is also really looking for a way to avoid implicit copying.
I have been looking for clues in this direction all the time but found
them rather scarce in the docs. May be me. May be an idea to put more
focus on in erldocs, erlang.org etc? I am sure there are links. Maybe
they could be featured more prominently. I am sure there are many people
coming to Erlang with a deeply ingrained eye on performance and would
like to understand these implications earlier in the learning experience.
I mean things exactly like
"changing status from x to y would copy the record, BUT the actual
record structure is only a set of pointers." by Ulf Wiger in his post
http://www.erlang.org/cgi-bin/ezmlm-cgi/4/1342 (var names changed)
2
From that post I understand that the best performance should come from
a data structure that is as nested as possible, because that will reduce
the amount of pointers that are copied with every mutation?!
As the mutation copies not-changed data only on one level. That's a
very interesting implication that so far I haven't found spelled out
anywhere (but would have liked to learn up front, if true). I am still
not sure if it would hold across the board. Or is it wrong for other
reasons I am not aware of? (Or could this post maybe for any strange
reason be ... outdated? It's a decade old in June. Wow!)
3
Surveying the choices, does this imply I am to use a copy-heavy approach
if I want to have meaningful transaction support between processes?
Because the choice there is ets, dets or mnesia and they all copy the
entire structure everytime there is any access to it?
4
I would encourage a view that these kind of questions would be best
discussed quite early in the learning curve. Together with the
implications that immutable state has for message passing and how it
saves so much copying in that case. Or so I speculate. I am not speaking
about the error prevention aspects here.
That said, thanks for all the efforts that are going into making the
Erlang docs ever better! Even Google today adopted the classic format of
the Erlang doc frame layout I realize. But seriously, thanks!
Henning
More information about the erlang-questions
mailing list