[erlang-questions] Message passing via shared memory

Thu Feb 5 12:05:59 CET 2009

> > Memory sharing is faster only if the number of processors (cores)
> > is small. As the number of cores increases, the environment is
> > getting similar to a cluster.
> Could you explain this?  I always had the impression, that because
> there aren't any variables in erlang, cache issues won't hurt the
> speed (though it would make the GC more tricky).  What am I missing? 
> What would cause the slowdown?
Well, I am not an expert in this topic, so my thoughts might be wrong, 
but my impression is that inter-core communication should not be done 
through the main memory (on a many-core processor).

Consider two (Erlang) processes running on two different cores of a
many-core processor, sending messages to each other. Assuming that they
don't share an L2 (or L3, L4, ...) cache, their communication goes
through main memory, i.e. when they send a message with memory sharing,
the memory area containing the message will eventually be pushed to
main memory.

I think manycore processors will have some kind of inter-core 
communication network (the TILE64 has), which should be utilized when 
sending messages between processes running on different cores. This is 
what I meant by "similar to a cluster".

Now if you use the shared-memory message passing approach, the processor
might use the inter-core network to copy the message directly from the
cache of one core to another (I think this is what the TILE64 does with
its "virtual L3 cache which is an aggregate of all the L2 caches"), but
your message is copied eventually. So if it is copied anyway, why
implement message sharing, which makes GC difficult and gets slower and
slower as the number of cores increases?

Yes, the only problem I end up with is a less efficient GC. Note that
what I've described above is not what the current Erlang VM does; it is
what it should do on a many-core processor. I hope the Erlang VM guys
will share their thoughts on this topic.

To answer someone else's question, this argument only holds for small
messages. For really large messages, it might be better to share them
and copy only the parts actually used, on demand, to the other core's
cache. But if the recipient uses the whole message, you end up copying
the message anyway, and the only advantage is that you saved some space
in main memory (or in the nearest common cache).
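
To make the small-vs-large argument concrete, here is a rough sketch of
a toy model (in Python, just for illustration) that counts how many
bytes cross the inter-core network under each strategy. Everything in
it is an assumption of mine: the 64-byte cache line, the function
names, and the idea that a shared message costs exactly the cache lines
the recipient touches. It ignores latency, coherence traffic and GC
cost entirely.

```python
import math

CACHE_LINE = 64  # bytes; an assumed line size, not taken from any specific chip


def bytes_moved_copy(msg_size: int) -> int:
    """Copy-on-send: the whole message crosses the interconnect once."""
    return msg_size


def bytes_moved_shared(msg_size: int, bytes_used: int) -> int:
    """Share-on-send: only the cache lines the recipient touches are fetched."""
    used = min(bytes_used, msg_size)
    return math.ceil(used / CACHE_LINE) * CACHE_LINE


# (message size, bytes the recipient actually reads); hypothetical workloads
for size, used in [(64, 64), (1_000_000, 128), (1_000_000, 1_000_000)]:
    print(size, used, bytes_moved_copy(size), bytes_moved_shared(size, used))
```

In this model, sharing only moves fewer bytes when the recipient reads
a small part of a large message; when it reads the whole message, both
strategies move the same amount.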

	Georgy