[erlang-questions] Re: Shared/Hybrid Heap
Sun Oct 17 13:30:45 CEST 2010
Nicholas, nice post.
I agree that fully automatic garbage collection probably will need to
stop all threads at the same time.
Actually, I still think binary sharing is an example of it. It is
restricted relative to the general case, which makes it simpler.
Alternatively, one could accept that the shared data should survive
unless someone killed it.
Or the programmer could simply accept that garbage
collection is done in one sweep.
I think it is instructive to be precise about what the problem could
be. Here is one example.
A giant immutable key-value store accessed by http read requests. To get
the highest performance, you would
put the data in one tree, or hash table, shared by every erlang process.
You would then spawn worker processes for each core,
and let them handle http requests. To respond, they would access the
tree, grab the data, and reply.
This is completely equivalent to one process owning the tree, and all
workers sending messages to it, but performance is higher with direct
access than with message passing and context switching. Actually, you
can make the API exactly like message passing. There would be no
automatic garbage collection of
the tree, and it would have to be killed explicitly, exactly like a
process.
Shared memory and message passing to a data owner process is really the
same thing semantically. The only difference is performance.
Actually, "shared memory" is bad wording. It should be called "C-like
pointer access to the data".
I guess ets is made based on this logic.
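
A minimal sketch of the two access paths described above, assuming a tiny
key-value store. The module name kv_demo, the table name kv_demo_table and
all function names are hypothetical, chosen only for illustration. Variant 1
is an owner process that answers lookups by message passing; variant 2 is an
ets table that workers read directly. Both return the same answers, which is
the "same semantics" claim. Note that ets:lookup copies the stored term into
the caller's heap, so this is not literally pointer access, only a read that
avoids the round trip through another process.

```erlang
-module(kv_demo).
-export([run/0, owner_start/1, owner_get/2, ets_start/1, ets_get/1]).

%% Variant 1: one process owns the data; readers send a request and wait.
owner_start(Pairs) ->
    Dict = dict:from_list(Pairs),
    spawn(fun() -> owner_loop(Dict) end).

owner_loop(Dict) ->
    receive
        {get, From, Ref, Key} ->
            %% dict:find/2 returns {ok, Value} or error.
            From ! {Ref, dict:find(Key, Dict)},
            owner_loop(Dict);
        stop ->
            ok
    end.

owner_get(Owner, Key) ->
    Ref = make_ref(),
    Owner ! {get, self(), Ref, Key},
    receive
        {Ref, Reply} -> Reply
    after 5000 ->
        {error, timeout}
    end.

%% Variant 2: a named ets table that any process can read directly.
%% The table is not garbage collected; it lives until it is deleted
%% explicitly or the process that created it exits.
ets_start(Pairs) ->
    Tab = ets:new(kv_demo_table,
                  [set, protected, named_table, {read_concurrency, true}]),
    true = ets:insert(Tab, Pairs),
    Tab.

%% Same return shape as owner_get/2: {ok, Value} or error.
ets_get(Key) ->
    case ets:lookup(kv_demo_table, Key) of
        [{Key, Value}] -> {ok, Value};
        [] -> error
    end.

%% Checks that both variants give identical answers for a few keys.
run() ->
    Pairs = [{a, 1}, {b, 2}],
    Owner = owner_start(Pairs),
    _ = ets_start(Pairs),
    Same = [owner_get(Owner, K) =:= ets_get(K) || K <- [a, b, missing]],
    Owner ! stop,
    true = ets:delete(kv_demo_table),
    lists:all(fun(X) -> X end, Same).
```

Calling kv_demo:run() from the shell compares the two variants key by key.
The only intended difference is cost: owner_get/2 pays for two messages and
a context switch per lookup, ets_get/1 does not.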
Anyway, I regret using the word thread earlier. One word, process, is
enough. The only real question is whether erlang should have a way for
one process to read state of another
process without the full send, put in message queue, context switch,
receive, lookup data, send, put in message queue, receive overhead.
The semantics are the same. The programming model is the same. The speed
is not.
On 10/14/10 4:47 PM, Nicholas Frechette wrote:
> Sharing binaries is fine because they use a different heap altogether and a
> different garbage collection scheme (reference counting).
> Sharing a nif resource implies you manage it manually through nifs, how you
> manage it is largely up to you.
> As far as other languages go, yes, most (all?) of them stop all threads to
> garbage collect (if those threads share memory). Java does this, C# does
> this, ruby, python, etc. C# recently introduced, or will shortly introduce, a garbage
> collection algorithm that runs concurrently and thus doesn't stop all
> threads but from what I can remember, it isn't 100% guaranteed (if threads
> continue to allocate past a threshold, I believe they are still all stopped
> waiting for GC to complete). This is an important reason why multi generational
> garbage collection algorithms are so popular: it keeps most GC cycles quick,
> thus stopping all threads for the least amount of time.
> The idea behind each process having its own heap is twofold: it helps memory
> locality and it isolates that heap such that it can be GCed on its own,
> without interrupting any other process. Processes are thus essentially
> sandboxed in that regard.
> This also means that any two processes, or more, can run garbage collection
> concurrently without issues.
> While it sounds bad to stop all processes/threads sharing a piece of memory,
> in reality, for most desktop applications, it is a non issue. The story is
> different for server software however, where they tend to allocate a lot and
> have many threads.
> When you want to share some data between P1 and P2, if they do not share
> everything, a problem arises quite quickly: how do you mark allocations that
> should be shared and those that shouldn't? If you share everything,
> allocation is easy and goes in 1 heap (ignoring binaries here in their own
> heap). If you do not, you have to select those allocations. In both cases,
> because some memory is shared, the simplest way to GC will be to stop both
> processes. After all, you will need to inspect their respective stacks to
> see if they reference things in that shared heap and if the processes are
> running, the stack will keep changing. If even for the time of a memcpy of
> the stack, all processes will have to stop at some point or another. You
> also won't be able to run the GC concurrently on P1 and P2 if they share
> memory, or at least, it will be quite hard.
> Sharing memory and garbage collection is a non trivial problem to solve and
> as erlang currently does, the simplest, most elegant way to deal with the
> problem is to not share at all (and always copy) to avoid the problem.
> On Thu, Oct 14, 2010 at 4:22 AM, Morten Krogh<> wrote:
>>> You are missing the point -- or at least the point which I think Richard
>>> is making. In the scheme you propose, T2's execution will be influenced by a
>>> piece of data that _in principle_ is not shared. For example, T2 needs to be
>>> stopped executing or synchronized with T1 by garbage collection which might
>>> take place when process T1 allocates some big D2.
>>> How is that different than the two processes sharing everything?
>> I don't understand you. Are you saying that it is impossible to implement
>> multithreading unless the threads share everything, and garbage collection
>> will be slow?
>> What about two erlang processes that share a binary. Isn't that how it
>> works today?
>> What about two erlang processes that share a nif resource?
>> What about other languages and virtual machines. They have multithreading.
>> We must be talking past each other here.
>> By the way, I am not advocating such a change to erlang. I think you can
>> get around shared memory by choosing the right processes, partition data
>> correctly etc.
>> It is only the most computationally intensive cases where you can't, and
>> there one can use a nif resource or some external way of sharing data.