[erlang-questions] Message passing via shared memory

dj3vande@REDACTED
Thu Feb 5 04:31:05 CET 2009

Somebody claiming to be Imre Palik wrote:
> > From: Andras Georgy Bekes <bekesa@REDACTED>
> >
> >> I'm pretty interested in this method, because while memory copying is
> >> OK for MPP or clustered environments, it is much slower than memory
> >> sharing on SMP environments if you are dealing with a certain amount
> >> of data.
> > Memory sharing is faster only if the number of processors (cores) is
> > small. As the number of cores increases, the environment is getting
> > similar to a cluster.
> Could you explain this?  I always had the impression that because
> there aren't any mutable variables in Erlang, cache issues won't hurt
> the speed (though it would make the GC more tricky).  What am I
> missing?  What would cause the slowdown?

Inter-core memory access would run up against latency and bandwidth
restrictions pretty quickly if you don't arrange your access patterns
carefully.  Accesses to shared memory have more costs than just cache
coherency on updates.  (Especially with a GC that needs to resolve
non-local references while collecting.)
If the individual cores are busy enough, cache pressure is also likely
to become a problem.  (If you make your caches big enough to store all
the data that the nearby cores will need, you've just reinvented
copying, and you still need to deal with coherency for any bookkeeping
the GC needs to do.)

One possible intermediate strategy is to take advantage of a shared
address space and do lazy copying:  Sending a message to a process just
gives it a reference, and if the object is too far away on access, a
local copy is made.
A sufficiently clever implementation of this would keep track of where
all the copies of a particular object are, so if a reference is sent to
a distant node that already has a local copy, the local copy can be
used instead of making another copy.[1]
(I have no idea whether this would actually provide enough benefit to
be worth implementing, but it'd be interesting to hear expert
opinions.  The big problem with doing things this way would be latency
of the copy-on-access; if there wasn't some way to hide the latency, it
would need a fairly good prefetching strategy.)
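As a toy illustration of the lazy-copying idea (the handle/node machinery
here is invented for the sketch, not taken from any real VM): a message
carries only a handle, each node makes a local copy on first access, and
the handle remembers where copies already live, so a second access from
the same node reuses the existing copy instead of copying again.

```python
class Handle:
    """A shared object plus a record of which nodes hold local copies."""

    def __init__(self, payload, home_node):
        self.payload = payload
        self.home_node = home_node
        # node -> local copy; the home node uses the original directly.
        self.copies = {home_node: payload}

    def access(self, node):
        # Lazy copy-on-access: only copy if this node ("nearby" is
        # simplified to exact node identity here) has no copy yet.
        if node not in self.copies:
            self.copies[node] = list(self.payload)
        return self.copies[node]


h = Handle([1, 2, 3], home_node=0)
a = h.access(1)   # first access from node 1: makes the lazy copy
b = h.access(1)   # second access: reuses the cached copy
print(a is b)     # True -- no second copy was made
```

The interesting design question is exactly the one raised above: the
first `access` from a distant node pays the full copy latency, so the
scheme stands or falls on how well that latency can be hidden.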


[1] I inadvertently constructed an excellent test case for this kind of
    smart copying recently:
    loop(Last, Prev) ->
        receive
            stop ->
                ok;
            {next, Pid} ->
                Next = {Last, Prev},
                Pid ! Next,
                loop(Next, Last)
        end.
    This is a slightly simplified version of code I wrote for
    The Erlang program (with a deep copy on the send) ground to a halt
    after consuming all the available memory before it finished the
    computation; re-implementing it in C (where I had control over
    reference vs.  copying) gave me a much smaller and faster program.
    (I hastily revised my mental model of how Erlang manages memory
    shortly thereafter.)
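The pathological growth in that loop is easy to quantify.  With sharing,
each step allocates one new pair cell referencing the previous two, so n
steps cost O(n) cells; but the size of a deep copy at step n obeys
size(n) = size(n-1) + size(n-2) + 1, a Fibonacci-like recurrence that
grows exponentially.  A small sketch (hypothetical arithmetic, not the
original program) makes the contrast concrete:

```python
def shared_size(n):
    # With sharing, step n adds exactly one new pair cell: O(n) total.
    return n


def deep_copy_size(n):
    # Deep copy of Next = {Last, Prev} costs size(Last) + size(Prev) + 1,
    # a Fibonacci-like recurrence, hence exponential growth.
    last, prev = 1, 1
    for _ in range(n):
        last, prev = last + prev + 1, last
    return last


print(shared_size(30))     # 30 cells when structure is shared
print(deep_copy_size(30))  # 4356617 cells when deep-copied on each send
```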

Dave Vandervies

Plan your future!  Make God laugh!
