shared data areas (was Re: [erlang-questions] OOP in Erlang)

Thu Aug 12 10:01:28 CEST 2010

Fred Hebert wrote:
> 
> On Wed, Aug 11, 2010 at 8:44 AM, Jesper Louis Andersen 
> <jesper.louis.andersen@REDACTED> wrote:
> 
> 
>     There is one idea here I have been toying with. One problem of Erlang's
>     memory model is that sending a large datastructure as a capability to
>     another process, several megabytes in size, will mean a copy. In the
>     default VM setup that is. But if you had a region into which it got
>     allocated, then that region could safely be sent under a proof that
>     the original process will not touch it anymore. [...]
> 
> One interesting point of *always* copying data structures is that you 
> need to plan for small messages (as far as possible) whether you are on 
> a single node or in a distributed setting. Moving up from a [partially] 
> shared memory model to a fully isolated one when going distributed is 
> likely going to have its share of performance problems and might create 
> a dissonance between "what is acceptable locally" and "what is 
> acceptable when distributed".

So a number of different variations on this theme have been tried
in the past and discussed as future extensions:

- Robert Virding used to have his own implementation called VEE.
   It had a global shared heap and incremental GC.

- A 'shared heap' option in BEAM once had experimental status.
   It passed messages by reference. Note that in neither of these
   cases is there any change in semantics - conceptually, message
   passing was still copying. The main problem with this version
   was that it still used the old copying garbage collector. The
   idea was to implement a reference-counting GC, but for various
   reasons, it didn't happen. When multicore started becoming
   interesting, the shared-heap version was left behind.

- Hybrid heap was an evolution of 'shared heap', where only
   data sent in messages was put on a global heap. In the first
   implementation, data was copied to the global heap on send
   (unless already there) instead of being copied to the receiver's
   heap. This implementation was also broken by SMP.

- Lately, some exploration has gone into allowing a set of
   processes to share the same heap. This could be done in (at
   least) two ways:
   a) co-locate all processes in the group on the same
      scheduler. This would ensure mutual exclusion and mainly
      serve to reduce message-passing cost in a process group.
   b) allow processes in a group to run on different schedulers,
      using mutexes to protect accesses to the heap data. This
      could allow for parallel processing, but the locking
      granularity would be either heap-level or... very subtle,
      I guess. I favour option (a).

I think it is appropriate to use under-the-cover tricks to
speed up message passing as much as possible, as long as the
semantics stay the same. In other words, in all the above cases,
isolation has been a given, and conceptually, messages are
still copied.
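
One optimisation of this kind already exists in BEAM: binaries larger
than 64 bytes are kept off-heap and reference-counted, so a local send
passes a reference instead of copying the payload, while the receiver
still sees an ordinary immutable term. The sketch below is a rough way
to observe the difference against a large list, which is copied on send.
The module name copy_demo and the helper roundtrip/1 are made up for
illustration, and the timings depend on the machine and VM version.

```erlang
-module(copy_demo).
-export([run/0, roundtrip/1]).

%% Hypothetical helper: spawn a receiver, send it Payload, and wait for
%% an acknowledgement. Returns the round-trip time in microseconds.
roundtrip(Payload) ->
    Parent = self(),
    Pid = spawn(fun() ->
                    receive
                        {payload, _Data} -> Parent ! {ack, self()}
                    end
                end),
    {Micros, ok} = timer:tc(fun() ->
                                Pid ! {payload, Payload},
                                receive {ack, Pid} -> ok end
                            end),
    Micros.

run() ->
    %% A large list lives on the sender's heap and is copied on send.
    List = lists:seq(1, 1000000),
    %% A large binary is stored off-heap; only a reference is passed.
    Bin = binary:copy(<<0>>, 8000000),
    io:format("list:   ~p us~n", [roundtrip(List)]),
    io:format("binary: ~p us~n", [roundtrip(Bin)]).
```

Calling copy_demo:run() from the shell prints both timings. In either
case the program cannot tell whether a copy was made, which is the
point: the semantics are unchanged.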

BR,
Ulf W
-- 
Ulf Wiger
CTO, Erlang Solutions Ltd, formerly Erlang Training & Consulting Ltd
http://www.erlang-solutions.com