[erlang-questions] Re: Shared/Hybrid Heap

Bengt Kleberg bengt.kleberg@REDACTED
Tue Oct 19 15:46:34 CEST 2010


Greetings,

When you write that garbage collection needs all threads to stop, is
that in the context of the current Erlang garbage collector?

It seems to me
(http://www.cs.wustl.edu/~mdeters/doc/slides/rtgc-history.pdf) that
there are concurrent/parallel collectors at least for other languages.


bengt

On Sun, 2010-10-17 at 13:30 +0200, Morten Krogh wrote:
> Nicholas, nice post.
> 
> I agree that fully automatic garbage collection probably will need to 
> stop all threads at the same time.
> 
> Actually, I still think binary sharing is an example of it. It is 
> restricted relative to the general case, which makes it simpler.
> 
> Alternatively, one could accept that the shared data should survive 
> unless someone killed it.
> Or the programmer could simply accept that garbage
> collection is done in one sweep.
> 
> 
> I think it is instructive to be precise about what the problem could
> be. Here is one example.
> A giant immutable key-value store accessed by http read requests. To get
> highest performance, you would
> put the data in one tree, or hash table, shared by every erlang process. 
> You would then spawn worker processes for each core,
> and let them handle http requests. To respond, they would access the 
> tree, grab the data, and reply.
> This is completely equivalent to one process owning the tree, and all 
> workers message passing into it, but performance is higher with the 
> "pointer access"
> than through the message passing and context switching. Actually, you 
> can make the API exactly like message passing. There would be no 
> automatic garbage collection of
> the tree, and it would have to be killed explicitly, exactly like a 
> process today.
> 
> Shared memory and message passing to a data owner process is really the 
> same thing semantically. The only difference is performance.
> Actually, "shared memory" is poor wording. It should be called "C-like
> pointer access to the data".
> 
> I guess ets is made based on this logic.
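
A minimal sketch of the two read paths compared above, for illustration only. The module name kv_sketch and all function names are hypothetical and not from this thread; it assumes OTP 17 or later for maps. The first path asks an owner process by message; the second reads a protected ets table directly from the calling process.

```erlang
-module(kv_sketch).
-export([start_owner/1, owner_get/2, start_table/1, table_get/2]).

%% Path 1: one process owns the data; readers send a request and wait.
start_owner(Pairs) ->
    Map = maps:from_list(Pairs),
    spawn(fun() -> owner_loop(Map) end).

owner_loop(Map) ->
    receive
        {get, From, Ref, Key} ->
            %% maps:find/2 returns {ok, Value} or error.
            From ! {Ref, maps:find(Key, Map)},
            owner_loop(Map);
        stop ->
            ok
    end.

owner_get(Owner, Key) ->
    Ref = make_ref(),
    Owner ! {get, self(), Ref, Key},
    receive
        {Ref, Reply} -> Reply
    after 5000 ->
        {error, timeout}
    end.

%% Path 2: the data lives in an ets table that any process can read
%% without going through the owner's message queue.
start_table(Pairs) ->
    Tab = ets:new(kv_sketch_tab, [set, protected, {read_concurrency, true}]),
    true = ets:insert(Tab, Pairs),
    Tab.

table_get(Tab, Key) ->
    case ets:lookup(Tab, Key) of
        [{Key, Value}] -> {ok, Value};
        [] -> error
    end.
```

Both getters return {ok, Value} or error, so callers cannot tell the paths apart except by speed. Note that ets:lookup/2 copies the stored term onto the reader's heap, so this is not literally pointer access, and the table is deleted when the process that created it exits.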
> 
> Anyway, I regret using the word thread earlier. One word, process, is 
> enough. The only real question is whether erlang should have a way for 
> one process to read state of another
> process without the full send, put in message queue, context switch,
> receive, look up data, send, put in message queue, receive overhead.
> The semantics are the same. The programming model is the same. The speed
> is higher.
> 
> Cheers,
> 
> Morten.
> 
> 
> 
> 
> On 10/14/10 4:47 PM, Nicholas Frechette wrote:
> > Sharing binaries is fine because they use a different heap altogether and a
> > different garbage collection scheme (reference counting).
> > Sharing a NIF resource implies you manage it manually through NIFs; how you
> > manage it is largely up to you.
> > As far as other languages go, yes, most (all?) of them stop all threads to
> > garbage collect (if those threads share memory). Java does this, C# does
> > this, Ruby, Python, etc. C# has recently introduced, or will shortly
> > introduce, a garbage collection algorithm that runs concurrently and thus
> > doesn't stop all threads, but from what I can remember, it isn't 100%
> > guaranteed (if threads continue to allocate past a threshold, I believe
> > they are still all stopped waiting for GC to complete). This is an
> > important reason why multi-generational garbage collection algorithms are
> > so popular: they keep most GC cycles quick, thus stopping all threads for
> > the least amount of time.
> >
> > The idea behind each process having its own heap is twofold: it helps memory
> > locality and it isolates that heap such that it can be GCed on its own,
> > without interrupting any other process. Processes are thus essentially
> > sandboxed in that regard.
> > This also means that any two processes, or more, can run garbage collection
> > concurrently without issues.
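
A small sketch of the per-process collection described above, under stated assumptions: the module name gc_sketch and the function demo/0 are hypothetical, and the heap sizes it returns depend on the runtime, so no particular numbers are promised. It asks the runtime to collect one worker's heap and reads that worker's heap size before and after.

```erlang
-module(gc_sketch).
-export([demo/0]).

demo() ->
    Parent = self(),
    %% A worker that builds a list on its own heap, drops it, then waits.
    Worker = spawn(fun() ->
        _ = lists:seq(1, 100000),
        Parent ! ready,
        receive stop -> ok end
    end),
    receive ready -> ok end,
    {total_heap_size, Before} = erlang:process_info(Worker, total_heap_size),
    %% Collect only the worker's heap. The heaps of the calling process
    %% and of every other process are not part of this collection.
    true = erlang:garbage_collect(Worker),
    {total_heap_size, After} = erlang:process_info(Worker, total_heap_size),
    Worker ! stop,
    %% Sizes are in words.
    {Before, After}.
```

Calling gc_sketch:demo() from the shell returns a {Before, After} pair of heap sizes for the worker alone.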
> >
> > While it sounds bad to stop all processes/threads sharing a piece of memory,
> > in reality, for most desktop applications, it is a non-issue. The story is
> > different for server software however, where they tend to allocate a lot and
> > have many threads.
> >
> > When you want to share some data between P1 and P2, if they do not share
> > everything, a problem arises quite quickly: how do you mark allocations that
> > should be shared and those that shouldn't? If you share everything,
> > allocation is easy and goes in 1 heap (ignoring binaries here in their own
> > heap). If you do not, you have to select those allocations. In both cases,
> > because some memory is shared, the simplest way to GC will be to stop both
> > processes. After all, you will need to inspect their respective stacks to
> > see if they reference things in that shared heap and if the processes are
> > running, the stack will keep changing. Even if only for the time of a memcpy
> > of the stack, all processes will have to stop at some point or another. You
> > also won't be able to run the GC concurrently on P1 and P2 if they share
> > memory, or at least, it will be quite hard.
> >
> > Sharing memory and garbage collection is a non-trivial problem to solve, and
> > the simplest, most elegant way to deal with it is what Erlang currently
> > does: not share at all (and always copy).
> >
> > Nicholas
> >
> > On Thu, Oct 14, 2010 at 4:22 AM, Morten Krogh<mk@REDACTED>  wrote:
> >
> >>
> >>> You are missing the point -- or at least the point which I think Richard
> >>> is making. In the scheme you propose, T2's execution will be influenced by a
> >>> piece of data that _in principle_ is not shared. For example, T2 needs to be
> >>> stopped executing or synchronized with T1 by garbage collection which might
> >>> take place when process T1 allocates some big D2.
> >>>
> >>> How is that different than the two processes sharing everything?
> >>>
> >>> Kostis
> >>>
> >> Kostis,
> >>
> >> I don't understand you. Are you saying that it is impossible to implement
> >> multithreading unless the threads share everything, and garbage collection
> >> will be slow?
> >>
> >> What about two Erlang processes that share a binary? Isn't that how it
> >> works today?
> >> What about two Erlang processes that share a NIF resource?
> >> What about other languages and virtual machines? They have multithreading.
> >>
> >> We must be talking past each other here.
> >>
> >> By the way, I am not advocating such a change to Erlang. I think you can
> >> get around shared memory by choosing the right processes, partitioning data
> >> correctly, etc.
> >> It is only the most computationally intensive cases where you can't, and
> >> there one can use a nif resource or some external way of sharing data.
> >>
> >>
> >>
> >> ________________________________________________________________
> >> erlang-questions (at) erlang.org mailing list.
> >> See http://www.erlang.org/faq.html
> >> To unsubscribe; mailto:erlang-questions-unsubscribe@REDACTED
> >>
> >>
> 
> 
> 


