shared data areas (was Re: [erlang-questions] OOP in Erlang)
Ulf Wiger
ulf.wiger@REDACTED
Thu Aug 12 10:01:28 CEST 2010
Fred Hebert wrote:
>
> On Wed, Aug 11, 2010 at 8:44 AM, Jesper Louis Andersen
> <jesper.louis.andersen@REDACTED> wrote:
>
>
> There is one idea here I have been toying with. One problem with Erlang's
> memory model is that sending a large datastructure as a capability to
> another process, several megabytes in size, will mean a copy. In the
> default VM setup, that is. But if you had a region into which it got
> allocated, then that region could safely be sent under a proof that
> the original process will not touch it anymore. [...]
>
> One interesting point of *always* copying data structures is that you
> need to plan for small messages (as far as possible) whether you are on
> a single node or in a distributed setting. Moving up from a [partially]
> shared memory model to a fully isolated one when going distributed is
> likely going to have its share of performance problems and might create
> a dissonance between "what is acceptable locally" and "what is
> acceptable when distributed".
So, a number of variations on this theme have been tried in the past
or discussed as future extensions:
- Robert Virding used to have his own Erlang implementation, called
VEE. It had a global shared heap and an incremental GC.
- A 'shared heap' option in BEAM once had experimental status.
It passed messages by reference. Note that in neither of these
cases was there any change in semantics: conceptually, message
passing was still copying. The main problem with this version
was that it still used the old copying garbage collector. The
idea was to implement a reference-counting GC, but for various
reasons, that never happened. When multicore started becoming
interesting, the shared-heap version was left behind.
- Hybrid heap was an evolution of 'shared heap', in which only
data sent in messages was put on a global heap. In the first
implementation, data was copied to the global heap on send
(unless already there) instead of being copied to the receiver's
heap. This implementation, too, was broken by SMP.
- Lately, some exploration has gone into allowing a set of
processes to share the same heap. This could be done in (at
least) two ways:
a) co-locate all processes in the group on the same
scheduler. This would ensure mutual exclusion and mainly
serve to reduce message-passing cost within a process group.
b) allow processes in a group to run on different schedulers,
using mutexes to protect access to the heap data. This
could allow for parallel processing, but the locking
granularity would have to be either heap-level or ... very
subtle, I guess. I favour option (a).
I think it is appropriate to use under-the-covers tricks to
speed up message passing as much as possible, as long as the
semantics stay the same. In all the above cases, isolation
has been a given, and conceptually, messages are still copied.
BR,
Ulf W
--
Ulf Wiger
CTO, Erlang Solutions Ltd, formerly Erlang Training & Consulting Ltd
http://www.erlang-solutions.com