[erlang-questions] Measuring Erlang Communication Overhead?

Fri Jun 24 21:47:41 CEST 2011

Hi,

  We put together a system which consists of a few thousand processes,
each of which holds a rule.  A request comes in with a context, and the
context is sent to a subset of the processes (although this could be
close to a thousand of them), each of which combines the context with
its rule and sends back a result (both sends are non-blocking, and a
gen_fsm receives all the results and collates them).
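
  To make the pattern concrete, here is a minimal sketch of the
scatter/gather shape described above.  It is not our actual code: the
module name scatter_sketch, the message tuples, the 5 second timeout and
the trivial "rule" funs are all made up for illustration, and a plain
receive loop stands in for the gen_fsm that collates the results.

```erlang
-module(scatter_sketch).
-export([demo/0, scatter/2, gather/2, rule_proc/1]).

%% Hypothetical rule process: holds one rule (here just a fun) and
%% replies to every context it is sent.
rule_proc(Rule) ->
    receive
        {eval, From, Ref, Context} ->
            From ! {result, Ref, self(), Rule(Context)},
            rule_proc(Rule);
        stop ->
            ok
    end.

%% Scatter: send the context to each process without waiting for replies.
scatter(Pids, Context) ->
    Ref = make_ref(),
    [Pid ! {eval, self(), Ref, Context} || Pid <- Pids],
    Ref.

%% Gather: collect N results tagged with Ref (arrival order is arbitrary).
gather(Ref, N) ->
    gather(Ref, N, []).

gather(_Ref, 0, Acc) ->
    Acc;
gather(Ref, N, Acc) ->
    receive
        {result, Ref, _Pid, Result} ->
            gather(Ref, N - 1, [Result | Acc])
    after 5000 ->
        %% Assumed timeout, purely so the sketch cannot hang forever.
        exit(timeout)
    end.

%% 400 toy rules, each adding a different constant to the context.
demo() ->
    Rules = [fun(C) -> C + I end || I <- lists:seq(1, 400)],
    Pids = [spawn(?MODULE, rule_proc, [R]) || R <- Rules],
    Ref = scatter(Pids, 0),
    Results = gather(Ref, length(Pids)),
    [P ! stop || P <- Pids],
    lists:sum(Results).
```

Every request therefore costs one copy of the context per rule process
plus one reply message per rule process.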

  What we see is that this caps out at around 100-150 requests per
second, after which Erlang becomes unresponsive and we need to kill it.
We have an older server written in Java which can do about an order of
magnitude more requests per second.  I've looked over the code and the
efficiency guide, and we've also profiled using fprof and not found
any low-hanging fruit.  I'm thinking that the scatter/gather approach
might be bad if the context being passed to each process gets large,
and that it might be better to have a few large processes which can
each run through all the rules with the context.  However, it's a
largish undertaking to refactor like this, so I was wondering if there
is any way to measure the overhead of communication?

  I've looked at message queue sizes and they are mostly zero (sometimes
a few go up to 1, but nothing huge), but was wondering if there are other
things that can be looked at, or tools which could be used to determine
whether the limiting constraint is communication overhead (i.e., copying
around 1K to 400 processes), or whether it is something else.
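
  One crude way to isolate the communication cost might be a
micro-benchmark along the following lines: processes that do no rule
work at all and only acknowledge each message, timed with timer:tc/1.
This is only a sketch under assumptions: the module name comm_overhead,
the fixed 100 rounds and the message shapes are invented for
illustration, and the payload is a list rather than a binary because
large binaries are reference-counted and shared between processes
rather than copied.

```erlang
-module(comm_overhead).
-export([run/2, echo/0]).

%% Process that does no real work: it acknowledges each message, so the
%% measured time is dominated by sending, copying and scheduling.
echo() ->
    receive
        {ping, From, _Payload} ->
            From ! pong,
            echo();
        stop ->
            ok
    end.

%% Send Payload to NProcs processes and wait for every ack, 100 times.
%% Returns the mean number of microseconds per round.
run(NProcs, Payload) ->
    Rounds = 100,
    Pids = [spawn(?MODULE, echo, []) || _ <- lists:seq(1, NProcs)],
    {Micros, ok} = timer:tc(fun() -> rounds(Rounds, Pids, Payload) end),
    [P ! stop || P <- Pids],
    Micros / Rounds.

rounds(0, _Pids, _Payload) ->
    ok;
rounds(N, Pids, Payload) ->
    [P ! {ping, self(), Payload} || P <- Pids],
    wait(length(Pids)),
    rounds(N - 1, Pids, Payload).

%% Block until N acknowledgements have arrived.
wait(0) ->
    ok;
wait(N) ->
    receive
        pong -> wait(N - 1)
    end.
```

Comparing comm_overhead:run(400, lists:seq(1, 64)) (a list of about 1K
on a 64-bit VM) against comm_overhead:run(400, []) should give a rough
idea of how much of a round is spent copying the payload, as opposed to
the fixed per-message cost.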

Thanks,

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <anthonym@REDACTED>