[erlang-questions] Re: Shared/Hybrid Heap

Thu Nov 4 11:46:08 CET 2010

Hi Ryan,

My comment was directed to the thread in general, but partially as a comment to your comments. So when I was using "you" it was not specifically you Ryan, but you the reader.

"2. I would assert that the operations on immutable data structures cost less then many would imagine."

I was really trying to make three points:

- Implementing a good shared/hybrid real-time gc is not trivial, and doing it in a multi-core system is even less trivial. Mutable data makes it even less trivial. Yes, it a known problem, but still difficult.

- I think that having immutable data even within only one process is a big win. Having mutable data gives you all the problems of global data. Immutable data makes it much easier to keep track of what is happening with your data, which is always a Big Win.

- Having the data as both mutable and immutable, even if not at the same time, means I have two different ways of working with it and need to know which it is just now. Not being a Clojure person I could very well be wrong in this point, but at least *I* would find it difficult to keep track of which it is.

I don't know if this helped any.

Robert

----- "Ryan Zezeski" <rzezeski@REDACTED> wrote:

> On Wed, Nov 3, 2010 at 8:50 PM, Robert Virding <
> robert.virding@REDACTED > wrote:
> 
> 
> This question is much more complex that you would at first realise.
> Some thoughts (in no specific order):
> 
> 1. Shared/hybrid heaps are wonderful BUT:
> - They are more difficult to implement in a real-time environment.
> Yes, this is a known problem with solutions but the benefits are less
> than you would expect. Real-time gc costs.
> - They don't port easily to multi-core systems as you suddenly start
> needing locks and stuff everywhere to control access or have complex
> schemes to get around this. Which copying doesn't.
> 
> For some problems shared heaps are much better, for others much worse.
> 
> 2. I would assert that the operations on immutable data structures
> cost less then you imagine.
> 
> 3. From the way you describe it having data structures which are both
> mutable and immutable would very confusing. You would need to have two
> sets of operations on them and every application would have to keep
> track of when to use them.
> 
> 4. Mutable data structures give you all the things you want to avoid
> with global data, especially that they can change under your feet
> without you knowing about it. Yes, you may have to pass a reference
> around but it still means that I don't know who can access my bit of
> data and can change it. Yes, you can stipulate that the program should
> be well-behaved, but we all know how well that works in real-life in a
> big system.
> 
> So while I have said that you don't *need* immutable data to get
> process isolation, instead you can specify data copying between
> processes, this does not mean that I recommend having mutable data.
> Immutable data is so much easier to work with and keep track of.
> 
> Robert
> 
> 
> 
> 
> 
> ----- "Tony Arcieri" < tony.arcieri@REDACTED > wrote:
> 
> > I agree with this sentiment as well. A pure immutable system gives
> > more
> > optimization options to VM implementers, but as is without the
> > shared/hybrid
> > heaps Erlang doesn't do a good job of leveraging the potential of a
> > purely
> > immutable state system.
> >
> > On Wed, Nov 3, 2010 at 5:35 PM, Ryan Zezeski < rzezeski@REDACTED >
> > wrote:
> >
> > > On Mon, Oct 18, 2010 at 7:28 AM, Morten Krogh < mk@REDACTED >
> > wrote:
> > > >
> > > >
> > > > On practical terms, I feel that Erlang some times gives up
> > performance
> > > for
> > > > unnecessary reasons. I am especially thinking about mutability.
> > > > There is no reason to give up mutation within a process which is
> > already
> > > a
> > > > serializer.
> > > >
> > > > code like this
> > > >
> > > > loop(Dict)
> > > > receive
> > > > {put, K, V} ->
> > > > Dict2 = dict:store(K, V, Dict),
> > > > loop(Dict2)
> > > > end.
> > > >
> > > >
> > > > is just a performance wise inefficient way of doing
> > > >
> > > > loop(Mut) ->
> > > > receive
> > > > {put, K, V} ->
> > > > put(K,V, Mut),
> > > > loop(Mut)
> > > > end.
> > > >
> > > > where Mut is a mutable data structure.
> > > > In the dict exampe, there is a unncecessary work and garbage
> > collection
> > > for
> > > > the functional data structure Dict.
> > > > And you gain nothing, since it is sequential code.
> > > >
> > > >
> > > I think this is a good point. If you are simply creating a data
> > structure
> > > in _one_ process it should be mutable. Then when you publish it it
> > becomes
> > > immutable. I have Clojure's transients in mind.
> > >
> > > http://clojure.org/transients
> > >
> > > *If a tree falls in the woods, does it make a sound?**If a pure
> > function
> > > mutates some local data in order to produce an immutable return
> > value, is
> > > that ok?*
> > > *
> > > *
> > > *
> > > *
> > >
> >
> >
> >
> > --
> > Tony Arcieri
> > Medioh! A Kudelski Brand
> 
> 
> Robert, I'm not sure if this was directed towards me or the thread in
> general, but I want to clarify my point.
> 
> 
> I completely agree immutable data structures are the way to go. One
> reading of Java Concurrency In Practice was enough to make me shiver
> at shared mutable structures.
> 
> 
> I don't agree that operations on immutable data structures cost less
> than *I* imagine. If that was the case why would Rich Hickey go
> through great lengths to find _performant_ immutable data structures
> that he implemented in Clojure (Bagwell Trees)? Furthermore, why would
> he also create a separate set of functions (transients) for building
> up a data structure in a mutable way and then publishing it to the
> world as an immutable data structure? Plus, the cost of anything is
> relative. Yes, in most cases immutable data structures are more than
> fast enough. But in a few cases maybe they aren't.
> 
> 
> What is wrong about using a mutable data structure that can only ever
> be touched by _one_ and only _one_ process? Then, if this process
> wants to share it it must be converted to an immutable data structure.
> In Clojure this works by using a set of dedicated functions to work
> with the mutable data structure. This data structure will not work
> with the rest of Clojure which expects an immutable thing (or maybe it
> implicitly converts it, I forget). Then when you are ready to share
> this thing with other threads you simply change it to be immutable
> which is a cheap operation. So, in the specific case where only _one_
> thread creates the data structure but then is shared by many you get
> the speed of in-place mutation with the safety of immutability. I'm no
> programming language expert, but this seems like an obvious win.
> 
> 
> To clarify even more, the data structure is not "both mutable and
> immutable." It's mutable by one thread/process and then immutable the
> second it's published (shared) and can no longer be mutated.
> 
> 
> -Ryan