[erlang-questions] Measuring Erlang Communication Overhead?

Sat Jun 25 02:50:35 CEST 2011

On Fri, Jun 24, 2011 at 12:47:41PM -0700, Anthony Molinaro wrote:
>   What we see is that this caps out around 100-150 requests per second
> after which erlang becomes unresponsive and we need to kill it.  We
> have an older server written in java which can do about an order of
> magnitude more requests per second.  I've looked over the code and the
> efficiency guide, and we've also profiled using fprof and not found
> any low hanging fruit.  I'm thinking that maybe the scatter/gather
> approach might be bad if the context being passed to each process gets
> large and that instead it might be better to have a few large processes
> which can run through all rules with the context.  However, it's a largish
> undertaking to refactor like this, so I was wondering if there is any
> way to measure the overhead of communication?

By "erlang becomes unresponsive and we need to kill it" do you mean
that the VM doesn't recover when you stop sending requests?  That's
not a performance problem :-(

>   I've looked at message queue sizes and they are mostly zero (sometimes
> a few go up to 1, but nothing huge), but was wondering if there are other
> things that can be looked at, or tools which could be used to determine
> whether the limiting constraint is communication overhead (ie, copying
> around 1K to 400 processes), or whether it is something else.

One thing that might be easy to do is to increase the amount of
communication and see if you reach the performance limit earlier.
Either make the messages bigger (perhaps doubling them, but not by
just sending {Context, Context}, because the implementation may
preserve the sharing) or send each message twice (P ! {look_at, Context}
followed by P ! {really_do, Context}, with the look_at ones read and
ignored).
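
A rough sketch of the "send each message twice" variant, to make the
idea concrete.  The module name overhead_probe, the function names and
the done/stop messages are all made up for illustration, and the
workers do no rule evaluation at all, so this only times the send and
receive side.  Note also that padding Context with large binaries
probably won't increase the copying, since binaries over 64 bytes are
reference counted and passed by reference between processes.

```erlang
-module(overhead_probe).
-export([run/3]).

%% Hypothetical harness, not the original poster's code.
%% Spawns Workers processes and sends Context to each one Copies times.
%% Returns the elapsed wall-clock time in microseconds for one round.
run(Workers, Copies, Context) ->
    Pids = [spawn_link(fun worker/0) || _ <- lists:seq(1, Workers)],
    {Micros, ok} = timer:tc(fun() -> scatter_gather(Pids, Copies, Context) end),
    [P ! stop || P <- Pids],
    Micros.

scatter_gather(Pids, Copies, Context) ->
    %% The first Copies - 1 sends are tagged look_at; the worker reads
    %% and discards them, so they only add communication cost.
    [[P ! {look_at, Context} || _ <- lists:seq(1, Copies - 1)] || P <- Pids],
    %% The last send is the one a real worker would act on.
    [P ! {really_do, self(), Context} || P <- Pids],
    %% Gather: wait for one reply from every worker.
    [receive {done, P} -> ok end || P <- Pids],
    ok.

worker() ->
    receive
        {look_at, _Context} ->
            worker();
        {really_do, From, _Context} ->
            %% A real worker would evaluate its rules here.
            From ! {done, self()},
            worker();
        stop ->
            ok
    end.
```

Comparing overhead_probe:run(400, 1, Ctx) with overhead_probe:run(400, 2, Ctx)
for a Ctx of about the real size should give a first idea of how much
of a round is spent on copying.  In the real server the same trick
would go into the existing scatter step rather than a separate module.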

    Jeff Schultz