[erlang-questions] Re: Shared/Hybrid Heap

Thu Nov 4 03:47:05 CET 2010

On Wed, Nov 3, 2010 at 8:50 PM, Robert Virding <
robert.virding@REDACTED> wrote:

> This question is much more complex that you would at first realise. Some
> thoughts (in no specific order):
>
> 1. Shared/hybrid heaps are wonderful BUT:
> - They are more difficult to implement in a real-time environment. Yes,
> this is a known problem with solutions but the benefits are less than you
> would expect. Real-time gc costs.
> - They don't port easily to multi-core systems as you suddenly start
> needing locks and stuff everywhere to control access or have complex schemes
> to get around this. Which copying doesn't.
>
> For some problems shared heaps are much better, for others much worse.
>
> 2. I would assert that the operations on immutable data structures cost
> less then you imagine.
>
> 3. From the way you describe it having data structures which are both
> mutable and immutable would very confusing. You would need to have two sets
> of operations on them and every application would have to keep track of when
> to use them.
>
> 4. Mutable data structures give you all the things you want to avoid with
> global data, especially that they can change under your feet without you
> knowing about it. Yes, you may have to pass a reference around but it still
> means that I don't know who can access my bit of data and can change it.
> Yes, you can stipulate that the program should be well-behaved, but we all
> know how well that works in real-life in a big system.
>
> So while I have said that you don't *need* immutable data to get process
> isolation, instead you can specify data copying between processes, this does
> not mean that I recommend having mutable data. Immutable data is so much
> easier to work with and keep track of.
>
> Robert
>
>
> ----- "Tony Arcieri" <tony.arcieri@REDACTED> wrote:
>
> > I agree with this sentiment as well. A pure immutable system gives
> > more
> > optimization options to VM implementers, but as is without the
> > shared/hybrid
> > heaps Erlang doesn't do a good job of leveraging the potential of a
> > purely
> > immutable state system.
> >
> > On Wed, Nov 3, 2010 at 5:35 PM, Ryan Zezeski <rzezeski@REDACTED>
> > wrote:
> >
> > > On Mon, Oct 18, 2010 at 7:28 AM, Morten Krogh <mk@REDACTED>
> > wrote:
> > > >
> > > >
> > > > On practical terms, I feel that Erlang some times gives up
> > performance
> > > for
> > > > unnecessary reasons. I am especially thinking about mutability.
> > > > There is no reason to give up mutation within a process which is
> > already
> > > a
> > > > serializer.
> > > >
> > > > code like this
> > > >
> > > > loop(Dict)
> > > >    receive
> > > >    {put, K, V} ->
> > > >        Dict2 = dict:store(K, V, Dict),
> > > >        loop(Dict2)
> > > >    end.
> > > >
> > > >
> > > > is just a performance wise inefficient way of doing
> > > >
> > > > loop(Mut) ->
> > > >    receive
> > > >    {put, K, V} ->
> > > >        put(K,V, Mut),
> > > >        loop(Mut)
> > > >    end.
> > > >
> > > > where Mut is a mutable data structure.
> > > > In the dict exampe, there is a unncecessary work and garbage
> > collection
> > > for
> > > > the functional data structure Dict.
> > > > And you gain nothing, since it is sequential code.
> > > >
> > > >
> > > I think this is a good point.  If you are simply creating a data
> > structure
> > > in _one_ process it should be mutable.  Then when you publish it it
> > becomes
> > > immutable.  I have Clojure's transients in mind.
> > >
> > > http://clojure.org/transients
> > >
> > > *If a tree falls in the woods, does it make a sound?**If a pure
> > function
> > > mutates some local data in order to produce an immutable return
> > value, is
> > > that ok?*
> > > *
> > > *
> > > *
> > > *
> > >
> >
> >
> >
> > --
> > Tony Arcieri
> > Medioh! A Kudelski Brand
>

Robert, I'm not sure if this was directed towards me or the thread in
general, but I want to clarify my point.

I completely agree immutable data structures are the way to go.  One reading
of Java Concurrency In Practice was enough to make me shiver at shared
mutable structures.

I don't agree that operations on immutable data structures cost less than
*I* imagine.  If that was the case why would Rich Hickey go through great
lengths to find _performant_ immutable data structures that he implemented
in Clojure (Bagwell Trees)?  Furthermore, why would he also create
a separate set of functions (transients) for building up a data structure in
a mutable way and then publishing it to the world as an immutable data
structure?  Plus, the cost of anything is relative.  Yes, in most cases
immutable data structures are more than fast enough.  But in a few cases
maybe they aren't.

What is wrong about using a mutable data structure that can only ever be
touched by _one_ and only _one_ process?  Then, if this process wants to
share it it must be converted to an immutable data structure.  In Clojure
this works by using a set of dedicated functions to work with the mutable
data structure.  This data structure will not work with the rest of Clojure
which expects an immutable thing (or maybe it implicitly converts it, I
forget).  Then when you are ready to share this thing with other threads you
simply change it to be immutable which is a cheap operation.  So, in the
specific case where only _one_ thread creates the data structure but then is
shared by many you get the speed of in-place mutation with the safety of
immutability.  I'm no programming language expert, but this seems like an
obvious win.

To clarify even more, the data structure is not "both mutable and
immutable."  It's mutable by one thread/process and then immutable the
second it's published (shared) and can no longer be mutated.

-Ryan