[erlang-questions] Concurrent processes on multi-core platforms with lots of chatter

Tue Dec 1 21:17:37 CET 2009

Hi,

The application in question needs to talk to low-level C/C++ drivers. The idea is to create a "pool" of Erlang processes that act as an interface to a linked-in driver (or possibly to NIFs). We expect somewhere on the order of 10,000 Erlang processes to need access to these drivers at any given time, and a pool of processes providing serialized access to the drivers makes more sense than having every Erlang process access a driver directly.
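A minimal sketch of such a pool follows. It assumes the driver call can be wrapped in an ordinary Erlang function: `DriverFun` stands in for the real port or NIF call, and the module name `driver_pool` is hypothetical. Requests are routed to a worker by hashing a key with `erlang:phash2/2`, and each worker handles one request at a time, so access through any one worker is serialized.

```erlang
-module(driver_pool).
-export([start/2, call/3]).

%% Start N workers. Each worker serializes calls to DriverFun/1, which is a
%% placeholder for the real linked-in driver or NIF call (hypothetical).
%% Returns a tuple of worker pids so a worker can be picked by index.
start(N, DriverFun) when is_integer(N), N > 0, is_function(DriverFun, 1) ->
    list_to_tuple([spawn_link(fun() -> worker(DriverFun) end)
                   || _ <- lists:seq(1, N)]).

%% Route Request to the worker chosen by hashing Key into 1..N, so all
%% requests for the same key go through the same worker.
call(Pool, Key, Request) ->
    Worker = element(erlang:phash2(Key, tuple_size(Pool)) + 1, Pool),
    Ref = make_ref(),
    Worker ! {call, self(), Ref, Request},
    receive
        {Ref, Reply} -> Reply
    after 5000 ->
        %% Illustrative timeout; a late reply stays in the caller's mailbox.
        {error, timeout}
    end.

%% A worker handles exactly one request at a time, which is what
%% serializes access to the driver behind it.
worker(DriverFun) ->
    receive
        {call, From, Ref, Request} ->
            From ! {Ref, DriverFun(Request)},
            worker(DriverFun)
    end.
```

With thousands of callers and a handful of workers, each worker's mailbox effectively becomes the queue in front of the driver.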

Now, I don't imagine that Erlang IPC is going to be the "lowest-hanging fruit" with regard to performance here; far from it. But there is an internal debate as to how much code should be C/C++ versus Erlang. Anything that shows the Erlang side is efficient (I know it is: for this highly concurrent application it has proven far superior to any C++ code that has been written) will help.

It's not critical that this is addressed immediately. But certainly if it was a feature on the radar of the OTP team I think it would go a long way to calming some nerves over here.

Thanks

Matt

-----Original Message-----
From: Zoltan Lajos Kis [mailto:kiszl@REDACTED] 
Sent: Tuesday, December 01, 2009 12:14 PM
To: Evans, Matthew
Cc: erlang-questions@REDACTED
Subject: Re: [erlang-questions] Concurrent processes on multi-core platforms with lots of chatter

Moving back to the original problem... Your processes need to do a lot 
of chatter, which means the tasks run within these processes need to 
have a lot of shared data. If that's the case, why don't you "migrate" 
the tasks onto one process?

When a process of yours would create such an affinity group, it could 
very well say to the other process: "Hey, you are going to have too many 
requests for me. Instead of messaging me, run this and that task". 
Instead of leaving the group, it could say "Okay, just ask me if you 
need anything from now on.". This is doable in Erlang without any 
changes to the VM, and my guess is it has the same effect on performance 
as the affinity groups would have.
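One way to sketch this without VM changes follows. It assumes the shared data fits in a map owned by a single process, and all names are hypothetical. Normally other processes ask the owner by message ("just ask me"). `take_over/1` hands the data to the caller so it can run its tasks locally ("run this and that task"), and `hand_back/2` returns it.

```erlang
-module(task_migration).
-export([start/1, lookup/2, take_over/1, hand_back/2]).

%% Owner process holding shared data in a map.
start(Data) when is_map(Data) ->
    spawn(fun() -> owner(Data) end).

owner(Data) ->
    receive
        {get, From, Ref, Key} ->
            %% "Just ask me if you need anything."
            From ! {Ref, maps:get(Key, Data, undefined)},
            owner(Data);
        {take_over, From, Ref} ->
            %% "Instead of messaging me, run the tasks yourself":
            %% give the data away and wait until it comes back. Other
            %% get requests queue in the mailbox meanwhile.
            From ! {Ref, Data},
            receive
                {hand_back, NewData} -> owner(NewData)
            end
    end.

lookup(Owner, Key) ->
    Ref = make_ref(),
    Owner ! {get, self(), Ref, Key},
    receive {Ref, Value} -> Value end.

take_over(Owner) ->
    Ref = make_ref(),
    Owner ! {take_over, self(), Ref},
    receive {Ref, Data} -> Data end.

hand_back(Owner, Data) ->
    Owner ! {hand_back, Data},
    ok.
```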

Regards,
Zoltan.

Evans, Matthew wrote:
> I'll second that. Although I generally agree with the direction Erlang has taken of not exposing the underlying architecture to the developer, we must recognize that there is a segment of users who do care. Providing abstractions to map processes onto a specific core would benefit those users. We have already gone a small way in that direction with the cpu_topology options (both the +sct emulator flag and the erlang:system_flag/2 function).
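The existing topology-related settings can be inspected at run time. The small sketch below uses a made-up module name but only standard `erlang:system_info/1` items. It reports what the VM currently sees. Binding schedulers to cores is requested with emulator flags such as `+sbt`, and an explicit topology can be supplied with `+sct`.

```erlang
-module(topology_info).
-export([report/0]).

%% Collect the VM's view of CPU topology and scheduler binding.
%% cpu_topology may be 'undefined' if the VM could not detect it.
report() ->
    #{schedulers_online   => erlang:system_info(schedulers_online),
      cpu_topology        => erlang:system_info(cpu_topology),
      scheduler_bind_type => erlang:system_info(scheduler_bind_type),
      scheduler_bindings  => erlang:system_info(scheduler_bindings)}.
```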
>
> ________________________________
> From: Alex Arnon [mailto:alex.arnon@REDACTED]
> Sent: Tuesday, December 01, 2009 4:23 AM
> To: Jayson Vantuyl
> Cc: Robert Virding; Evans, Matthew; erlang-questions@REDACTED
> Subject: Re: [erlang-questions] Concurrent processes on multi-core platforms with lots of chatter
>
> +1
> And then some :)
>
> On Tue, Dec 1, 2009 at 5:54 AM, Jayson Vantuyl <kagato@REDACTED> wrote:
> Off the top of my head, I would expect this to be a process_flag.
>
> Something like:  process_flag(scheduler_affinity, term()).  Possibly with a generic group specified by an atom like undefined.  This feels more functional than the proposed paf module, and has the benefit of being data-centric.
>
> The reason I would use a term (and then group by the hash of the term) is because it gives an elegant way to group processes by an arbitrary (possibly application specific) key.  Imagine if, for example, Mnesia grouped processes by a transaction ID, or if CouchDB grouped them by socket connection, etc.  By not specifying it as an atom or an integer, it lets you just use whatever is appropriate for the application.
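The "group by the hash of the term" idea can be illustrated with `erlang:phash2/2`. The sketch below only computes which scheduler number a key would map to. The module name is hypothetical, and it does not pin any process, since the proposed `process_flag(scheduler_affinity, ...)` does not exist.

```erlang
-module(affinity_key).
-export([scheduler_for/1, scheduler_for/2]).

%% Map an arbitrary term (a transaction id, a socket, ...) to a
%% scheduler number in 1..N. Equal terms always map to the same number.
scheduler_for(Key) ->
    scheduler_for(Key, erlang:system_info(schedulers_online)).

scheduler_for(Key, N) when is_integer(N), N > 0 ->
    erlang:phash2(Key, N) + 1.
```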
>
> I'm not too keen on reusing process groups primarily because group leaders are used for some really common stuff like IO, which shouldn't affect affinity at all.
>
> If we want to be really crazy, we could provide the ability to specify something like a MatchSpec to map a process group to a processor.  Call it a SchedSpec.  This has the added bonus that you could have multiple handlers that would match in order without having the full blown load of a gen_event or arbitrary fun.  This might also provide the beginnings of more powerful prioritization than the existing process_flag(priority) we have now.
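A SchedSpec is only an idea in this message. As a rough stand-in (not a real match specification, and with hypothetical names), the "multiple handlers that would match in order" behaviour can be modelled as an ordered list of `{Predicate, Scheduler}` rules. The first matching rule wins, and a default applies otherwise.

```erlang
-module(sched_spec).
-export([resolve/3]).

%% Hypothetical stand-in for a "SchedSpec": Rules is an ordered list of
%% {Predicate, Scheduler} pairs. The first predicate that returns true for
%% Key decides the scheduler; Default is used when no rule matches.
resolve(Key, Rules, Default) ->
    case lists:dropwhile(fun({Pred, _Sched}) -> not Pred(Key) end, Rules) of
        [{_Pred, Sched} | _] -> Sched;
        [] -> Default
    end.
```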
>
> Currently, the Use Case that people seem to be concerned with is ensuring locality of execution.  However, some people might also want to use it to provide dedicated cores to things like system processing.  I have no idea how this would fit with things like the AIO threads, but I'm pretty sure that HPC could benefit from, for example, dedicating 1 scheduler to system management tasks, 1 core to IO, and 6 cores to computation.  This is a higher bar, but it's important nonetheless.
>
> Of course, this would have the user thinking about the underlying CPU topology (which I agree is bad).  However, this is simply unavoidable in HPC, so it's best that we accept it.  Let me state this emphatically: if we try to make Erlang "smart" about scheduling, what is going to happen is that HPC people will dig down, figure out what it's doing wrong, then come back with complaints.  We will never be able to make it work right for everyone without exposing these same tunables (but likely with a crappier interface).  It's better to give them powerful hooks to customize the scheduler, with smart default behavior for everyone else.
>
> The reason I like the process_flag(scheduler_affinity) / SchedSpec option is that it can easily start out with just the process_flag, and add something like SchedSpecs later, without having to change the API (or particularly the default behavior).  Basically, you get three groups of users:
>
> * Normal People: They don't use affinity, although pieces of the system might. (effectively implemented already)
> * Locality Users: They use affinity for locality using the convenient process_flag interface. (easily done with additional process_flag)
> * HPC: They use affinity, and plug in SchedSpecs that are custom to their deployment. (can be provided when demanded without breaking the first two groups)
>
> On Nov 30, 2009, at 6:49 PM, Robert Virding wrote:
>
>
>> Another solution would be to use the existing process groups as these are
>> not really used very much today. A process group is defined as all the
>> processes which have the same group leader. It is possible to change group
>> leader. Maybe the VM could try to migrate processes to the same core as
>> their group leader.
>>
>> One problem today is that afaik the VM does not keep track of groups as
>> such, it would have to do this to be able to load balance efficiently.
>>
>> Robert
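The group-leader relationship can be queried and changed with standard BIFs. The sketch below uses a hypothetical module name. It reflects Robert's point that the VM keeps no index of groups: finding all members of a group means scanning every process. Also note that the group leader is the process that handles a process's IO.

```erlang
-module(gl_groups).
-export([members/1]).

%% Return all live processes whose group leader is Leader. There is no
%% per-group index, so this walks every process in the node.
members(Leader) ->
    [P || P <- erlang:processes(),
          process_info(P, group_leader) =:= {group_leader, Leader}].
```

Moving a process into a "group" is done with `group_leader(Leader, Pid)`.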
>>
>> 2009/11/30 Evans, Matthew <mevans@REDACTED>
>>
>>
>>> Hi,
>>>
>>> I've been running messaging tests on R13B02, using both 8-core Intel and
>>> 8-core CAVIUM processors. The tests involve two or more processes that do
>>> nothing more than sit in a loop exchanging messages as fast as they can.
>>> These tests are, of course, not realistic (as in real applications do more
>>> than sit in a tight loop sending messages), so my findings will likely not
>>> apply to a real deployment.
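The kind of test described here can be sketched as a two-process ping-pong. This is an assumption about the setup rather than the original test code, and the module name is hypothetical. `run/1` returns the elapsed microseconds for N round trips. Comparing a run under `erl +S 1` with a run on all schedulers is one way to look for the cross-core effect, though the numbers depend heavily on hardware and OTP version.

```erlang
-module(pingpong).
-export([run/1]).

%% Time N round trips between the caller and a spawned echo process.
run(N) when is_integer(N), N > 0 ->
    Pong = spawn_link(fun pong/0),
    {Micros, ok} = timer:tc(fun() -> ping(Pong, N) end),
    Pong ! stop,
    Micros.

%% Send a ping and wait for the pong before sending the next one.
ping(_Pong, 0) -> ok;
ping(Pong, N) ->
    Pong ! {ping, self()},
    receive pong -> ping(Pong, N - 1) end.

%% Echo every ping back until told to stop.
pong() ->
    receive
        {ping, From} -> From ! pong, pong();
        stop -> ok
    end.
```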
>>>
>>> First the good news: when running tests that do more than just message
>>> passing, the SMP features of R13B02 are leaps and bounds ahead of R12B05,
>>> which I was running previously. What I have noticed, however, is that in a
>>> pure messaging test (lots of messages in a tight loop) we appear to run into
>>> caching issues when messages are sent between processes that happen to be
>>> scheduled on different cores. This got me thinking about a future
>>> enhancement to the Erlang VM: process affinity.
>>>
>>> In this mode two or more processes that have a lot of IPC chatter would be
>>> associated into a group and executed on the same core. If the scheduler
>>> needed to move one process to another core - they would all be relocated.
>>>
>>> Although this grouping of processes could be done automatically by the VM I
>>> believe the decision making overhead would be too great, and it would likely
>>> make some poor choices as to what processes should be grouped together.
>>> Rather I would leave it to the developer to make these decisions, perhaps
>>> with a library similar to pg2.
>>>
>>> For example, a process affinity library (paf) could have the functions:
>>>
>>> paf:create(Name, [Opts]) -> ok | {error, Reason}
>>> paf:join(Name, Pid, [Opts]) -> ok | {error, Reason}
>>> paf:leave(Name, Pid) -> ok
>>> paf:members(Name) -> MemberList
>>>
>>> An affinity group would be created with options for specifying the maximum
>>> size of the group (to ensure we don't have all processes on one core), a
>>> default membership time within a group (to ensure we don't unnecessarily
>>> keep a process in the group when there is no longer a need) and maybe an
>>> option to allow the group to be split over different cores if the group size
>>> reaches a certain threshold.
>>>
>>> A process would join the group with paf:join/3, and would be a member for
>>> the default duration (with options here to override the settings specified
>>> in paf:create). If the group is full the request is rejected (or maybe
>>> queued). After a period of time the process is removed from the group and a
>>> message {paf_leave, Pid} is sent to the process that issued the paf:join
>>> command. If needed the process could be re-joined at that time with another
>>> paf:join call.
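To make the proposed semantics concrete, here is a hypothetical sketch of the paf bookkeeping as a gen_server. It assumes options named `max_size` and `duration` (in milliseconds; the 60-second default is arbitrary). It enforces the size limit, expires memberships with `erlang:send_after/3`, and sends `{paf_leave, Pid}` to the process that called `paf:join/3`. It cannot place group members on one core; that part would need the VM support being proposed.

```erlang
-module(paf).
-behaviour(gen_server).
-export([start_link/0, create/2, join/3, leave/2, members/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Hypothetical sketch: membership bookkeeping only, no core placement.
start_link() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

create(Name, Opts)    -> gen_server:call(?MODULE, {create, Name, Opts}).
join(Name, Pid, Opts) -> gen_server:call(?MODULE, {join, Name, Pid, Opts}).
leave(Name, Pid)      -> gen_server:call(?MODULE, {leave, Name, Pid}).
members(Name)         -> gen_server:call(?MODULE, {members, Name}).

%% State: #{Name => {GroupOpts, #{Pid => {TimerRef, Token}}}}
init([]) -> {ok, #{}}.

handle_call({create, Name, Opts}, _From, Groups) ->
    case maps:is_key(Name, Groups) of
        true  -> {reply, {error, already_exists}, Groups};
        false -> {reply, ok, Groups#{Name => {Opts, #{}}}}
    end;
handle_call({join, Name, Pid, Opts}, {Requester, _Tag}, Groups) ->
    case maps:find(Name, Groups) of
        error ->
            {reply, {error, no_such_group}, Groups};
        {ok, {GOpts, Members}} ->
            %% An integer never compares >= the atom infinity, so the
            %% default means "no size limit".
            Max = proplists:get_value(max_size, GOpts, infinity),
            case not maps:is_key(Pid, Members) andalso map_size(Members) >= Max of
                true ->
                    {reply, {error, group_full}, Groups};
                false ->
                    Members1 = cancel(Pid, Members),
                    Default = proplists:get_value(duration, GOpts, 60000),
                    Duration = proplists:get_value(duration, Opts, Default),
                    %% The token lets a stale expiry be ignored after a re-join.
                    Token = make_ref(),
                    TRef = erlang:send_after(Duration, self(),
                                             {expire, Name, Pid, Token, Requester}),
                    {reply, ok,
                     Groups#{Name => {GOpts, Members1#{Pid => {TRef, Token}}}}}
            end
    end;
handle_call({leave, Name, Pid}, _From, Groups) ->
    case maps:find(Name, Groups) of
        {ok, {GOpts, Members}} ->
            {reply, ok, Groups#{Name => {GOpts, cancel(Pid, Members)}}};
        error ->
            {reply, ok, Groups}
    end;
handle_call({members, Name}, _From, Groups) ->
    case maps:find(Name, Groups) of
        {ok, {_GOpts, Members}} -> {reply, maps:keys(Members), Groups};
        error -> {reply, [], Groups}
    end.

handle_cast(_Msg, Groups) -> {noreply, Groups}.

%% Membership expired: remove Pid and notify whoever called join/3.
handle_info({expire, Name, Pid, Token, Requester}, Groups) ->
    case Groups of
        #{Name := {GOpts, #{Pid := {_TRef, Token}} = Members}} ->
            Requester ! {paf_leave, Pid},
            {noreply, Groups#{Name => {GOpts, maps:remove(Pid, Members)}}};
        _ ->
            {noreply, Groups}
    end;
handle_info(_Other, Groups) ->
    {noreply, Groups}.

%% Drop Pid from Members, cancelling its expiry timer if present.
cancel(Pid, Members) ->
    case maps:find(Pid, Members) of
        {ok, {TRef, _Token}} ->
            _ = erlang:cancel_timer(TRef),
            maps:remove(Pid, Members);
        error ->
            Members
    end.
```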
>>>
>>> Any takers? R14B01 perhaps ;-)
>>>
>>> Thanks
>>>
>>> Matt
>>>
>
>
> --
> Jayson Vantuyl
> kagato@REDACTED
>
> ________________________________________________________________
> erlang-questions mailing list. See http://www.erlang.org/faq.html
> erlang-questions (at) erlang.org