ejabberd Bottleneck: Understanding Fprof Analysis

Fri Jun 18 23:48:32 CEST 2010

Hi,

I am load testing ejabberd on Erlang R13B03, running on a 4-core Ubuntu
Server 10.04 LTS VM with 16GB of RAM.

As the test is running, I notice that the message queue for ejabberd's
mod_muc process grows and grows.  From a crash dump I observed that the
message queue for mod_muc is bottlenecking the whole system: it is the only
process with a sizable message queue (it's enormous, well over 100,000
queued messages).  Using process_info in a remote shell, I can see the
queue length growing very rapidly in real time (~1000 msg/s).

I used fprof to do some profiling on the mod_muc process; see the output
file here: http://pastebin.com/QXchAKtM.  From the log, mod_muc:do_route was
called around 26,000 times over the course of the 38s of data collection.
This is ~1.5ms per packet, which translates to millions of clock cycles,
highly inefficient! It seems that nearly all the time is being spent in
"suspend" calls (if you add up the ACC time for suspend, it comes to 100% of
the overall measured time).  Indeed, when I call process_info on the mod_muc
process in a remote shell on the live server, its status always seems to be
"runnable".  However, my fprof can only use wallclock time, not the high
resolution CPU time, so I'm not sure how accurate all of this is.
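
As a rough sanity check on the figures above, here is a minimal Python
sketch of the arithmetic.  The call count and trace length are the
approximate values quoted from the fprof log; the 2 GHz clock speed is purely
an assumption for illustration, since the actual CPU frequency is not stated.
Note that fprof tracing itself slows the traced process considerably, so the
per-call time here is probably an overestimate of the untraced cost.

```python
# Back-of-the-envelope check of the fprof numbers quoted in this post.
CALLS = 26_000             # approximate mod_muc:do_route calls in the trace
TRACE_SECONDS = 38.0       # wall-clock length of the fprof data collection
ASSUMED_CLOCK_HZ = 2.0e9   # ASSUMPTION: ~2 GHz core; real clock not stated


def per_call_ms(calls, seconds):
    """Average wall-clock time per call, in milliseconds."""
    return seconds / calls * 1000.0


def cycles_per_call(calls, seconds, clock_hz):
    """Average clock cycles elapsed per call at the assumed clock speed."""
    return seconds / calls * clock_hz


def calls_per_second(calls, seconds):
    """Average number of calls completed per second during the trace."""
    return calls / seconds


if __name__ == "__main__":
    print(f"time per call:   {per_call_ms(CALLS, TRACE_SECONDS):.2f} ms")
    print(f"cycles per call: "
          f"{cycles_per_call(CALLS, TRACE_SECONDS, ASSUMED_CLOCK_HZ):.3g}")
    print(f"calls per sec:   {calls_per_second(CALLS, TRACE_SECONDS):.0f}")
```

With these inputs the per-call time comes out at about 1.46ms, a bit under
3 million cycles at the assumed clock, and roughly 680 calls per second
while tracing was active.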

It's odd that a 4-core box can't even handle 2,500 messages/s of incoming
traffic.  It seems like the bottleneck is a single process, which, if I
understand correctly, can only run on 1 core.
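
To put the single-process limit in numbers, here is a small Python sketch.
It assumes (this is not verified) that all 2,500 incoming messages/s pass
through the mod_muc process and that the ~1000 msg/s queue growth was
observed under that same load; the function names are just for illustration.

```python
# Rough queueing arithmetic for a single process serving one mailbox.
ARRIVAL_PER_S = 2500.0        # incoming traffic quoted in this post
QUEUE_GROWTH_PER_S = 1000.0   # observed queue growth (approximate)


def service_rate(arrival_per_s, queue_growth_per_s):
    """Messages actually handled per second: arrivals minus backlog growth."""
    return arrival_per_s - queue_growth_per_s


def budget_ms(rate_per_s):
    """Average time available per message at a given rate, in milliseconds."""
    return 1000.0 / rate_per_s


if __name__ == "__main__":
    served = service_rate(ARRIVAL_PER_S, QUEUE_GROWTH_PER_S)
    print(f"implied service rate:     {served:.0f} msg/s")
    print(f"implied time per message: {budget_ms(served):.2f} ms")
    print(f"budget to keep up:        {budget_ms(ARRIVAL_PER_S):.2f} ms")
```

Under those assumptions the process is handling about 1,500 msg/s (roughly
0.67ms each) when untraced, and it would need to average 0.4ms or less per
message for the queue to stop growing.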

Can anyone help me decipher the fprof output to shed light on what is going
on? 

Thanks,

Karthik

Karthik Kailash | Product
