[erlang-questions] Concurrent processes on multi-core platforms with lots of chatter

Zoltan Lajos Kis kiszl@REDACTED
Mon Nov 30 19:35:45 CET 2009


Evans, Matthew wrote:
> Hi,
>
> I've been running messaging tests on R13B02, using both 8-core Intel and 8-core CAVIUM processors. The tests involve two or more processes that do nothing more than sit in a loop exchanging messages as fast as they can. These tests are, of course, not realistic (real applications do more than sit in a tight loop sending messages), so my findings will likely not apply to a real deployment.
>
> First the good news: when running tests that do more than just message passing, the SMP features of R13B02 are leaps and bounds ahead of the R12B05 release I was running previously. What I have noticed, however, is that in a pure messaging test (lots of messages, in a tight loop) we appear to run into caching issues when messages are sent between processes that happen to be scheduled on different cores. This got me thinking about a future enhancement to the Erlang VM: process affinity.
>
> In this mode, two or more processes that have a lot of IPC chatter would be associated into a group and executed on the same core. If the scheduler needed to move one of them to another core, they would all be relocated.
>
> Although this grouping of processes could be done automatically by the VM, I believe the decision-making overhead would be too great, and it would likely make some poor choices about which processes should be grouped together. Rather, I would leave these decisions to the developer, perhaps with a library similar to pg2.
>
> For example, a process affinity (paf) library could have the functions:
>
> paf:create(Name, [Opts]) -> ok | {error, Reason}
> paf:join(Name, Pid, [Opts]) -> ok | {error, Reason}
> paf:leave(Name, Pid) -> ok
> paf:members(Name) -> MemberList
>
> An affinity group would be created with options specifying the maximum size of the group (to ensure we don't end up with all processes on one core), a default membership time within the group (to ensure we don't keep a process in the group longer than needed), and maybe an option to allow the group to be split over different cores if its size reaches a certain threshold.
>
> A process would join the group with paf:join/3 and would be a member for the default duration (with options to override the settings specified in paf:create). If the group is full, the request is rejected (or maybe queued). After the membership time expires, the process is removed from the group and a message {paf_leave, Pid} is sent to the process that issued the paf:join call. If needed, the process could be re-joined at that point with another paf:join call.
>
> Any takers? R14B01 perhaps ;-)
>
> Thanks
>
> Matt
>
>   
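For concreteness, here is a minimal sketch of the kind of ping-pong test
described in the quoted message, with comments marking where calls to the
proposed paf library would go. The paf module and its functions are
hypothetical (they exist only in the proposal above); everything else is
plain Erlang.

-module(pingpong).
-export([run/1]).

%% Spawn a partner process and exchange N round trips of messages in a
%% tight loop. If the proposed affinity library existed, the two processes
%% would be placed in one group here, e.g.:
%%   paf:create(chatty, [{max_size, 2}]),
%%   paf:join(chatty, self(), []),
%%   paf:join(chatty, Pong, []).
run(N) ->
    Pong = spawn(fun() -> pong() end),
    ping(Pong, N).

ping(Pong, 0) ->
    Pong ! stop,
    ok;
ping(Pong, N) ->
    Pong ! {ping, self()},
    receive
        pong -> ping(Pong, N - 1)
    end.

pong() ->
    receive
        {ping, From} ->
            From ! pong,
            pong();
        stop ->
            ok
    end.

A call such as pingpong:run(10000000) then does nothing but pass messages
back and forth, which is the pathological case the affinity proposal is
aimed at.
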
Hi Matt,

I think the "optimal" number of affinity groups and process bindings to 
these groups would pretty much depend on the number of schedulers, and 
in turn on the number of cores. Probably also depends on the CPU 
architecture. Doesn't this road lead to giving up the high-levelness of 
Erlang, where we produce hardware-dependent code? I don't want to end up 
writing inline BEAM code in the future... :)
Also, having such a rigid grouping is not a good idea as it completely 
ignores the workload of the member processes. You might force a bunch of 
heavy-weight processes onto the same core, losing more than you gain.

Anyway, in my opinion a better approach would be to have no grouping at 
all: we would only define the affinity of one process to another. This 
would result in a directed, fully connected graph over all processes. 
The default edge label could be 1, so you would only need to increase it 
where needed, with something like process_flag(affinity, {Pid, 42}).
You could also define some sort of workload coefficient, put it on the 
vertices, and use a force-based placement algorithm to assign processes 
to schedulers :)
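
As a rough, user-level illustration only (nothing like this exists in the
VM, and the module and function names are made up), the pairwise
affinities could be recorded as weighted directed edges in an ETS table;
the actual placement of processes on schedulers would of course have to
happen inside the runtime and is not modelled here.

-module(affinity).
-export([init/0, set/3, weight/2]).

%% Create the table that holds the directed edges {From, To} -> Weight.
init() ->
    ets:new(affinity_edges, [named_table, public, set]),
    ok.

%% Record that From wants to stay close to To with the given weight.
set(From, To, Weight) when is_pid(From), is_pid(To), Weight >= 1 ->
    ets:insert(affinity_edges, {{From, To}, Weight}),
    ok.

%% Look up an edge; every pair defaults to weight 1, as suggested above.
weight(From, To) ->
    case ets:lookup(affinity_edges, {From, To}) of
        [{_, W}] -> W;
        []       -> 1
    end.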

Regards,
Zoltan.

