[erlang-questions] shared data areas (was Re: [erlang-questions] OOP in Erlang)

Tue Aug 17 15:27:20 CEST 2010

"... in my experience, from complex telephony and multimedia apps, it is in
fact pretty common to have thousands of small 'clusters' of state machines,
each cluster sharing a whopping big state record and taking turns processing
different parts of the control flow."

But what's behind the decision to structure things that way?

I'm not sure what you mean by "processing different parts of the control
flow"?  Do you mean you had a kind of pipeline process structure?

Taking Nicholas' view, but with due concern for performance: what about
finding some way to privilege processes that manage big data structures?  At
the cost of a certain amount of global flow analysis, you might be able to
automatically identify processes that were basically the system's Whopping
Big State Record Managers (i.e., infinitely tail-recursive, passing Big
State to itself each time, usually taking messages only in order to
immediately send back fragments of Big State Record).  Maybe you could then
generate code that would do something more like a synchronous context switch
(and back again) than a full message-send (with corresponding reply), when
communicating with these processes.  These record-access pseudo-messages
might be made particularly fast for operations that do little more than read
Big State data that never gets modified -- these probably being the
overwhelming majority of accesses anyway.
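
To make the pattern concrete, here is a minimal sketch of such a manager
process as it would be written today, with an ordinary message round trip for
every read.  The module name bsr_manager, the message shapes, and the use of
a map in place of a record are all assumptions made for illustration; nothing
here is taken from a real system.

```erlang
-module(bsr_manager).
-export([start/1, get/2, put/3]).

%% Spawn a manager process that owns the big state (a map here,
%% standing in for the "whopping big state record").
start(BigState) when is_map(BigState) ->
    spawn(fun() -> loop(BigState) end).

%% Synchronous read of one fragment: a full send plus a reply,
%% i.e. the round trip the idea above would like to shortcut.
get(Pid, Key) ->
    Ref = erlang:monitor(process, Pid),
    Pid ! {get, self(), Ref, Key},
    receive
        {Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            Reply;
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, Reason}
    end.

%% Asynchronous update; assumed to be rare compared with reads.
put(Pid, Key, Value) ->
    Pid ! {put, Key, Value},
    ok.

%% Infinitely tail-recursive: the big state is passed to itself each time.
loop(BigState) ->
    receive
        {get, From, Ref, Key} ->
            %% Only the requested fragment is copied to the caller.
            From ! {Ref, maps:find(Key, BigState)},
            loop(BigState);
        {put, Key, Value} ->
            loop(BigState#{Key => Value})
    end.
```

A client FSM would call bsr_manager:get(Pid, Key) and get back {ok, Value}
or error.  The read-only 'get' clause is the case that would be the natural
candidate for a cheaper, context-switch-like access path.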

This could break hot-loading of modules, of course, depending on how you
structure things. The interprocess coupling would go from being almost as
loose as possible to almost as tight as a subroutine call.  I doubt this
idea would be practical if it required Whopping Big State Record Managers to
be entirely local to a module.  I don't see how you'd solve that problem in
general.
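
For context, hot loading of a server loop conventionally relies on a fully
qualified call back into the module, which is the late-bound step that a
tightly coupled access path would have to preserve or give up.  A minimal
sketch, assuming a hypothetical module named hot_loop:

```erlang
-module(hot_loop).
-export([start/1, read/1, loop/1]).

%% Start the looping process with some initial state.
start(State) ->
    spawn(?MODULE, loop, [State]).

%% Synchronous read of the whole state.
read(Pid) ->
    Ref = make_ref(),
    Pid ! {read, self(), Ref},
    receive
        {Ref, State} -> State
    after 1000 ->
        timeout
    end.

%% loop/1 must be exported so that it can be called fully qualified.
loop(State) ->
    receive
        {read, From, Ref} ->
            From ! {Ref, State},
            %% Fully qualified call: after each message the process
            %% continues in the newest loaded version of this module.
            ?MODULE:loop(State);
        {set, NewState} ->
            ?MODULE:loop(NewState)
    end.
```

If callers were compiled to reach directly into this loop's state, a newly
loaded version of the module could change the state layout underneath them,
which is the coupling problem described above.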

It would definitely be problematic across node boundaries, but perhaps this
is less of an issue than how to permit hot loading.  If your goal was to get
higher speed for accesses of the Whopping Big State Record for a bunch of
FSMs, you wouldn't be making those FSMs reach across node boundaries anyway.

(These thoughts are somewhat inspired by an admittedly dilettantish interest
in microkernel performance.  Comparable questions about sharing, copying and
performance arose in microkernel research, for much the same reasons, and
they seem to have been resolved pretty successfully in the L3/L4
microkernel family.)

-michael turner

On Sat, Aug 14, 2010 at 11:45 PM, Ulf Wiger
<ulf.wiger@REDACTED> wrote:

>
> Note: these are all hand-waving comments from me.
>
>
> Nicholas Frechette wrote:
>
>> Co-locating processes on a shared heap on 1 scheduler might be dangerous.
>> What if a process migrates to another core/scheduler? Should that be
>> allowed? Blocked?
>>
>
> An implementation issue, but yes, I imagine it could be forbidden,
> or rather, the group would be considered as a whole for migration.
>
>
>
>> What if one of those processes forks another one? Should it belong to the
>> flock and be bound to that scheduler?
>>
>
> This could be at the parent's discretion, using a new spawn_opt
> option, e.g. 'same_heap', dictating that the child should use the
> parent's heap.
>
>
>
>> Should the message it is receiving dictate the scheduler it is going to
>> process it on?
>>
>
> No, I think this would be complicated, and can't see the need
> for it.
>
>
>
>> I think it will be quite hard to do transparently (sharing msg data
>> between processes without copying) mainly because it might be hard to tell
>> data that is a msg from data that isn't.
>>
>
> But a lot of thinking already went into this.
> Binaries are often off-heap, although in that case, you only need
> to look at the type tag to know.
>
> In the case I suggested, the only check needed is for whether the
> receiver is sharing the sender's heap.
>
>
>
>> If a process sends a message then dies, you'll have to either: allocate
>> the msg from the shared heap, somehow knowing or copying, or copy it in case
>> the process heap gets garbage collected/freed (or prevent that from
>> happening, which could have a detrimental effect on the system if the
>> receiving process has a very large queue and takes a very long time before
>> consuming said message)
>>
>
> To my uneducated eye, this seems fairly similar to normal process
> death + normal GC. If a process (not the last on that heap) dies,
> the extra problem is that some data can be "freed" that is still
> referenced by other processes. Something similar should have been
> necessary for the 'shared heap' emulator to work in the first place.
>
>
>> This could be mitigated by adding a special syntax for message creation
>> where they would be created on a shared heap (or a special syntax for shared
>> heap allocation). Perhaps something like {Foo} = Whatever (or Foo }=
>> Whatever.). Where 'Foo' would be copied on the shared heap. Binaries are
>> already reference counted so it might not be too bad to implement.
>>
>
> IMHO - no language changes! An option to spawn_opt is the right place
> and scope. That's where you can peek under the covers and affect things
> like GC parameters, which are otherwise fully transparent.
>
>
>> IMO though, the whole sending a large datastructure as a message is
>> largely a non-issue. You can easily wrap it in a process and allocate it
>> from that process's heap and just pass the PID around.
>>
>
> But in my experience, from complex telephony and multimedia apps,
> it is in fact pretty common to have thousands of small 'clusters'
> of state machines, each cluster sharing a whopping big state
> record and taking turns processing different parts of the control flow.
>
> The suggestion is that this would work as an optimization for the
> tuning stage of such an application, much like it is now possible
> to tune heap size, GC thresholds, etc.
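
The existing tuning knobs referred to here can be sketched as follows.  The
module name tuned_spawn and the particular values are assumptions chosen for
illustration.  The 'same_heap' option discussed in this thread is only a
proposal and does not exist in OTP, so it appears below only in a comment.

```erlang
-module(tuned_spawn).
-export([start/0]).

%% Spawn a worker with explicit heap and GC settings.
%% min_heap_size is given in words; fullsweep_after is the number of
%% generational collections allowed before a full sweep is forced.
start() ->
    Opts = [{min_heap_size, 10000}, {fullsweep_after, 20}],
    %% The proposal in this thread would add an option such as
    %% 'same_heap' to this list; it is hypothetical and left out.
    spawn_opt(fun worker/0, Opts).

%% Trivial worker so that the example is self-contained.
worker() ->
    receive
        {ping, From} ->
            From ! pong,
            worker()
    end.
```

Because these settings are confined to the spawn call, the rest of the
program is unaffected by them, which is the scope argued for above.
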
>
>
> BR,
> Ulf W
> --
> Ulf Wiger
> CTO, Erlang Solutions Ltd, formerly Erlang Training & Consulting Ltd
> http://www.erlang-solutions.com