[erlang-questions] Measuring Erlang Communication Overhead?

Fri Jun 24 22:33:33 CEST 2011

On Fri, Jun 24, 2011 at 02:03:05PM -0600, Jack Moffitt wrote:
> >  We put together a system which consists of a few thousand processes
> > each of which holds a rule.  A request comes in with a context, the
> > context is sent to a subset of the processes (although this could be
> > close to a thousand of them), which combines the context and the rule
> > and sends back a result (both sends are done in a non-blocking fashion
> > and a gen_fsm receives all the results and collates them).
> 
> Since your unit of parallelization is a request, why not have a single
> process per request, which runs through determining the rules and
> executing them and returning the result?

That's the gist of my question if you read further on.  However, before
I invest the time to do that I was wondering if there was a way to
determine if it would help.  So I have questions like is there a way
to measure the cost of communication between processes on a running
system, or a way to benchmark this with an existing system.  I came
across percept and am wondering if it helps on those lines, but I'd
also be interested to know about ways to determine this sort of
information with things like system_info or process_info.  Mostly I'm
looking for strategies or tools to determine whether inter-process
communication is the issue.
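
A rough way to put a number on it is to time a send/reply round trip
against a trivial echo process, using a term shaped like a real context.
The sketch below is only that: the module name msg_cost and both
function names are made up for illustration, and it measures an idle
echo process, not the real rule processes under load.

```erlang
-module(msg_cost).
-export([round_trip_us/2, snapshot/1]).

%% Average microseconds for one send/reply round trip carrying Term,
%% measured over N iterations against a trivial echo process.
%% (Hypothetical helper, not part of any existing system.)
round_trip_us(Term, N) when is_integer(N), N > 0 ->
    Echo = spawn_link(fun echo_loop/0),
    {Micros, ok} = timer:tc(fun() -> ping(Echo, Term, N) end),
    Echo ! stop,
    Micros / N.

%% Send Term and wait for it to come back, N times.
ping(_Echo, _Term, 0) ->
    ok;
ping(Echo, Term, N) ->
    Echo ! {self(), Term},
    receive
        {Echo, _Reply} -> ok
    end,
    ping(Echo, Term, N - 1).

%% Reply with whatever was received until told to stop.
echo_loop() ->
    receive
        {From, Term} ->
            From ! {self(), Term},
            echo_loop();
        stop ->
            ok
    end.

%% Per-process counters for a live pid; returns undefined if the
%% process has already exited.
snapshot(Pid) ->
    erlang:process_info(Pid, [message_queue_len, memory, reductions]).
```

Calling msg_cost:round_trip_us(Context, 100000) with a representative
context, multiplied by the number of rules hit per request, gives a
ballpark for the copying cost.  snapshot/1 on the gen_fsm shows whether
its message queue is backing up while requests are in flight.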

> Does each rule have a lot of state or need information from more than
> one request?

Each rule has state, and the state can be updated at any point by another
process which gets updates from another management system.  I've not
measured the memory used just by rules, but they make up the majority
of my VM's processes and memory() shows about 700M used by processes.
So if I consolidated all rules into one process, I'd end up only being
able to have a few of them running on a 24G machine.  That may be okay
as I only have 16 cores, but it is a bit of an undertaking (as I have to
figure out ways to update all the rules when the updates from the
management system happen).
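
Measuring the memory used just by the rules should be straightforward
given a list of their pids.  A minimal sketch, assuming the pids can be
collected somehow (from a supervisor or a registry, say); the module
name rule_mem is made up:

```erlang
-module(rule_mem).
-export([total_bytes/1]).

%% Sum of process_info(Pid, memory) over Pids, in bytes.  Processes
%% that have already exited return undefined and are skipped by the
%% pattern in the second generator.
total_bytes(Pids) ->
    lists:sum([Bytes || Pid <- Pids,
                        {memory, Bytes} <- [erlang:process_info(Pid, memory)]]).
```

Comparing rule_mem:total_bytes(RulePids) with erlang:memory(processes)
would show how much of the 700M really belongs to the rules.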

> Your request is blocked until the results are all in anyway, so you
> might as well run all the results in the request's process to begin
> with. If each rule has lots of its own state, you could use ETS to
> manage that instead of processes.  If rules depend on multiple
> requests, then I'm not sure what to suggest.

ETS doesn't really help if the problem is copying: terms are copied out
of ETS every time you look them up.  So I'm not sure how that would
help, but maybe I'm somehow mistaken there?
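
The lookup cost can at least be estimated the same way as a message
round trip.  The sketch below times ets:lookup/2 on a single stored
term; the module name ets_copy and the table name are made up, and a
real table would hold one entry per rule rather than one entry total.

```erlang
-module(ets_copy).
-export([lookup_us/2]).

%% Average microseconds for one ets:lookup/2 of Term, over N lookups.
%% Each lookup copies the stored term onto the calling process's heap.
lookup_us(Term, N) when is_integer(N), N > 0 ->
    Tab = ets:new(rules_sketch, [set, private]),
    true = ets:insert(Tab, {rule, Term}),
    {Micros, ok} = timer:tc(fun() -> loop(Tab, N) end),
    true = ets:delete(Tab),
    Micros / N.

loop(_Tab, 0) ->
    ok;
loop(Tab, N) ->
    [{rule, _State}] = ets:lookup(Tab, rule),
    loop(Tab, N - 1).
```

Note that what gets copied differs between the two designs: with one
process per rule the context is copied to every rule, while with ETS
the rule state is copied into the request process.  Which is cheaper
would depend on the relative sizes of contexts and rule state.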

I'm also wondering if it might be better to have a module and compile
each rule into it as a function, or something like that.  Then I could
presumably call rules:rule12345(Context) in a process, which would be
faster than sending to a process, and the context wouldn't need to be
copied.  But I'm not sure how feasible that is, nor whether there's a
limit to the number of functions in a module that I'd run into.
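
Generating and loading such a module at runtime looks doable with the
standard erl_scan, erl_parse, compile and code modules.  This is only a
sketch under some assumptions: rules can be rendered as Erlang source
strings, the compiler application is available on the running node, and
the names rule_compiler and load/2 are invented for illustration.

```erlang
-module(rule_compiler).
-export([load/2]).

%% Compile and load module Mod from a list of source strings, one per
%% form, each ending in a full stop.  Crashes with badmatch if a form
%% does not scan, parse or compile.
load(Mod, FormStrings) ->
    Forms = [parse_form(S) || S <- FormStrings],
    {ok, Mod, Bin} = compile:forms(Forms),
    {module, Mod} = code:load_binary(Mod, atom_to_list(Mod) ++ ".erl", Bin),
    ok.

%% Turn one source string into an abstract form.
parse_form(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.
```

For example, loading the three forms "-module(rules_sketch).",
"-export([rule12345/1]).", and
"rule12345(Context) -> lists:member(sports, Context)." makes
rules_sketch:rule12345([news, sports]) callable as an ordinary local
call with no copying of the context.  The catch is that any update from
the management system would mean regenerating and reloading the module.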

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <anthonym@REDACTED>