[erlang-questions] gen_server bottleneck
Fri Dec 14 15:47:00 CET 2012
On Fri, Dec 14, 2012 at 7:39 AM, Saravanan Vijayakumaran wrote:
> Hi all,
> I have been trying to write a network simulator in Erlang modelled
> on ns-3. The code can be found at https://github.com/avras/nsime
> NSIME is a discrete event simulator where a registered gen_server process
> nsime_simulator holds all the events to be simulated in a gb_trees sorted by
> event time. This process has become a bottleneck when the simulation is large
> as multiple entities try to schedule events by doing a call on the single
> nsime_simulator process. I am still a newbie in Erlang profiling so this is
> more of a hunch arising from running the example scenarios.
Single registered processes are a potential bottleneck, so your hunch
is right. But measurement will both confirm this and help you
understand Erlang application architecture.
Erlang provides a dazzling array of diagnostic tools to help track
down process-related problems. But I'd start with etop.
The registered process in question should show up at the top and let
you know what's spiking, growing, etc. You can run your various stress
tests and watch etop to get some idea what's going on with that
process (and others).
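To complement etop with numbers you can log, here is a minimal sketch that takes a one-off snapshot of a registered process using erlang:process_info/2. The module name proc_snapshot and the function snapshot/1 are made up for illustration; only whereis/1 and process_info/2 are standard calls.

```erlang
-module(proc_snapshot).
-export([snapshot/1]).

%% Return a few process_info/2 items for the process registered as Name,
%% or 'undefined' if nothing is registered under that name.
%% (Module and function names are hypothetical, not part of NSIME.)
snapshot(Name) when is_atom(Name) ->
    case whereis(Name) of
        undefined ->
            undefined;
        Pid ->
            %% process_info/2 itself returns 'undefined' if the process
            %% died between whereis/1 and this call.
            erlang:process_info(Pid, [message_queue_len, reductions, memory])
    end.
```

Calling proc_snapshot:snapshot(nsime_simulator) while a stress test runs should return a list of {Item, Value} tuples. A message_queue_len that keeps climbing from one call to the next is the usual sign that the process cannot keep up. etop shows the same kind of data for all processes and can be started from the shell with etop:start/1, for example with the option {sort, msg_q}.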
> How can I remove this bottleneck?
Once you understand the bottleneck, you can start to think about this.
If you're bottlenecking on CPU (all your cores are fully utilized at
peak load) then you'll need either a faster machine or to distribute
your application across multiple machines.
I don't know enough about Erlang's SMP support these days to give you
any advice about core utilization. There are plenty of experts here.
Distributing the processing to multiple processes (e.g. one per core)
*might* give you a speedup but I don't know.
If the bottleneck is that your single process is accumulating messages
over time, you'll need to look at either slowing down senders (e.g.
rate limiting) or speeding up message handling (e.g. spill to disk,
drop messages, speed up processing). Or maybe this isn't a problem --
the queue may grow and then come back to zero eventually.
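One way to tell whether the queue grows without bound or drains back to zero is to sample its length at intervals during a run. The sketch below does that for a registered process. The module queue_watch and its function names are assumptions for illustration, and the sampling interval is whatever suits your workload.

```erlang
-module(queue_watch).
-export([sample/3]).

%% Sample message_queue_len of the process registered as Name,
%% Count times, sleeping IntervalMs milliseconds after each sample.
%% Returns the samples oldest first; a sample is 'undefined' if the
%% process was not registered or had died at that moment.
%% (Hypothetical helper, not part of NSIME or OTP.)
sample(Name, Count, IntervalMs)
  when is_atom(Name), is_integer(Count), Count >= 0,
       is_integer(IntervalMs), IntervalMs >= 0 ->
    sample(Name, Count, IntervalMs, []).

sample(_Name, 0, _IntervalMs, Acc) ->
    lists:reverse(Acc);
sample(Name, Count, IntervalMs, Acc) ->
    Len = queue_len(whereis(Name)),
    timer:sleep(IntervalMs),
    sample(Name, Count - 1, IntervalMs, [Len | Acc]).

queue_len(undefined) ->
    undefined;
queue_len(Pid) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, Len} -> Len;
        undefined -> undefined
    end.
```

For example, queue_watch:sample(nsime_simulator, 60, 1000) run from a separate shell process collects one sample per second for a minute. A list that rises and then falls back to 0 suggests a burst the process eventually absorbs; a list that only rises suggests senders are outpacing it.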
Judicious use of ETS might give you some options for addressing problems.
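As one possible reading of the ETS suggestion, the event queue could live in a public ordered_set table so that scheduling an event is a direct insert by the caller rather than a call to the simulator process. This is a sketch under assumptions, not how NSIME works: the module ets_event_queue, its function names, and the {Time, Seq} key layout are all invented here, and it assumes only one process pops events.

```erlang
-module(ets_event_queue).
-export([new/0, schedule/3, pop_next/1]).

%% Create the table. It is owned by the calling process and is
%% destroyed when that process exits.
new() ->
    ets:new(sim_events, [ordered_set, public]).

%% Schedule Event at Time. Any process may call this directly.
%% The unique integer breaks ties between events with equal Time,
%% so earlier inserts sort first.
schedule(Tab, Time, Event) ->
    Key = {Time, erlang:unique_integer([monotonic])},
    true = ets:insert(Tab, {Key, Event}),
    Key.

%% Remove and return the earliest event, or 'empty'.
%% Assumption: only a single simulator process calls this, so the
%% first/lookup/delete sequence does not race with another popper.
pop_next(Tab) ->
    case ets:first(Tab) of
        '$end_of_table' ->
            empty;
        {Time, _Seq} = Key ->
            [{Key, Event}] = ets:lookup(Tab, Key),
            true = ets:delete(Tab, Key),
            {Time, Event}
    end.
```

This takes the scheduling traffic out of the simulator's mailbox, but it also removes the natural back-pressure of a call, and it does nothing to stop a process from scheduling an event earlier than the current simulation time. Whether it is actually faster than the gb_trees version is something to measure rather than assume.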
I'd start with measurement and go from there.