[erlang-questions] shared data areas (was Re: [erlang-questions] OOP in Erlang)

Sat Aug 14 16:45:57 CEST 2010

Note: these are all hand-waving comments from me.

Nicholas Frechette wrote:
> Co-locating processes on a shared heap on 1 scheduler might be 
> dangerous. What if a process migrates to another core/scheduler? Should 
> that be allowed? Blocked?

An implementation issue, but yes, I imagine it could be forbidden,
or rather, the group would be considered as a whole for migration.

> What if one of those 
> processes forks another one? Should it belong to the flock and be bound 
> to that scheduler?

This could be at the parent's discretion, using a new spawn_opt
option, e.g. 'same_heap', dictating that the child should use the
parent's heap.
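
A minimal sketch of how that might look from the caller's side.
Note that 'same_heap' is only the option proposed here; as far as I
know no released emulator accepts it, so the wrapper below (module
and function names are made up for illustration) strips it and falls
back to an ordinary spawn:

```erlang
-module(same_heap_sketch).
-export([spawn_child/2]).

%% 'same_heap' is the HYPOTHETICAL option discussed in this thread.
%% erlang:spawn_opt/2 rejects options it does not know, so the option
%% is filtered out and the child gets a private heap as usual.
spawn_child(Fun, Opts) when is_function(Fun, 0), is_list(Opts) ->
    RealOpts = [O || O <- Opts, O =/= same_heap],
    %% A VM supporting the proposal would place the child on the
    %% parent's heap here when same_heap was requested.
    erlang:spawn_opt(Fun, RealOpts).
```

The point is that the request lives entirely in the option list, so
the child's code does not change.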

> Should the message it is receiving dictate the scheduler it is going to 
> process it on?

No, I think this would be complicated, and I can't see the need
for it.

> I think it will be quite hard to do transparently (sharing msg data 
> between processes without copying) mainly because it might be hard to 
> tell data that is a msg from data that isn't.

But a lot of thought has already gone into this.
Binaries are often stored off-heap, for instance, although in that
case you only need to look at the type tag to know.

In the case I suggested, the only check needed is for whether the
receiver is sharing the sender's heap.
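
As a rough model of that check (not emulator code): HeapOf below is
an assumed map from pid to heap-group id, standing in for whatever
bookkeeping the VM would keep internally, and the atoms returned are
just labels for the two delivery paths.

```erlang
-module(heap_check_sketch).
-export([delivery/3]).

%% HeapOf is a hypothetical map #{Pid => HeapId}.
%% Returns by_reference when both processes are on the same heap,
%% and copy in every other case (including unknown processes).
delivery(Sender, Receiver, HeapOf) ->
    case {maps:find(Sender, HeapOf), maps:find(Receiver, HeapOf)} of
        {{ok, Heap}, {ok, Heap}} -> by_reference;
        _ -> copy
    end.
```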

> If a process sends a 
> message then dies, you'll have to either: allocate the msg from the 
> shared heap, somehow knowing or copying, or copy it in case the process 
> heap gets garbage collected/freed (or prevent that from happening, which 
> could have a detrimental effect on the system if the receiving process 
> has a very large queue and takes a very long time before consuming said 
> message)

To my uneducated eye, this seems fairly similar to normal process
death + normal GC. If a process (not the last on that heap) dies,
the extra problem is that some data can be "freed" that is still
referenced by other processes. Something similar should have been
necessary for the 'shared heap' emulator to work in the first place.
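
One way to picture it, purely as an assumption on my part: the heap
is kept for as long as any member of the group is alive, and is only
released as a whole when the last one dies. Members is made-up
bookkeeping, not something the emulator exposes.

```erlang
-module(heap_group_sketch).
-export([member_died/2]).

%% Members is a hypothetical list of the pids still using the heap.
%% The heap is released only when the dying process was the last one.
member_died(Pid, Members) ->
    case lists:delete(Pid, Members) of
        [] -> free_heap;
        Rest -> {keep_heap, Rest}
    end.
```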

> This could be mitigated by adding a special syntax for message creation 
> where they would be created on a shared heap (or a special syntax for 
> shared heap allocation). Perhaps something like {Foo} = Whatever (or Foo 
> }= Whatever.). Where 'Foo' would be copied on the shared heap. Binaries 
> are already reference counted so it might not be too bad to implement.

IMHO - no language changes! An option to spawn_opt is the right place
and scope. That's where you can peek under the covers and affect things
like GC parameters, which are otherwise fully transparent.
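
For comparison, this is the kind of tuning spawn_opt already allows.
min_heap_size and fullsweep_after are existing options; the values
and the module name are arbitrary, chosen only for illustration.

```erlang
-module(gc_opts_sketch).
-export([start/0]).

%% Spawn a process with explicit GC settings and read them back.
%% min_heap_size is given in words; the VM may round it up.
start() ->
    Opts = [{min_heap_size, 10000}, {fullsweep_after, 10}],
    Pid = erlang:spawn_opt(fun loop/0, Opts),
    {garbage_collection, Info} =
        erlang:process_info(Pid, garbage_collection),
    {Pid, Info}.

%% Wait until told to stop, so the process stays inspectable.
loop() ->
    receive stop -> ok end.
```

None of this is visible to the code running inside the process,
which is why an option here seems a better fit than new syntax.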

> IMO though, the whole sending a large datastructure as a message is 
> largely a non-issue. You can easily wrap it in a process and allocate it 
> from that process's heap and just pass the PID around.

But in my experience, from complex telephony and multimedia apps,
it is in fact pretty common to have thousands of small 'clusters'
of state machines, each cluster sharing a whopping big state
record and taking turns processing different parts of the control flow.
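
A toy version of such a cluster, to make the cost concrete. The
stage names are invented and a list stands in for the big state
record. With private heaps, each hand-off below copies the whole
state term to the next process.

```erlang
-module(cluster_sketch).
-export([run/1]).

%% Three processes take turns on one state term: each handles its
%% part of the control flow, then passes the state on.
run(State0) when is_list(State0) ->
    Parent = self(),
    Last = spawn(fun() -> stage(teardown, Parent) end),
    Mid = spawn(fun() -> stage(routing, Last) end),
    First = spawn(fun() -> stage(setup, Mid) end),
    First ! {turn, State0},
    receive
        {turn, State} -> State
    after 1000 ->
        timeout
    end.

%% Record that this stage ran, then hand the state to the next one.
stage(Name, Next) ->
    receive
        {turn, State} -> Next ! {turn, [Name | State]}
    end.
```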

The suggestion is that this would work as an optimization for the
tuning stage of such an application, much like it is now possible
to tune heap size, GC thresholds, etc.

BR,
Ulf W
-- 
Ulf Wiger
CTO, Erlang Solutions Ltd, formerly Erlang Training & Consulting Ltd
http://www.erlang-solutions.com