[erlang-questions] gen_server bottleneck

Garrett Smith <>
Thu Dec 20 16:42:12 CET 2012


Hi Saravanan,

While you're running these scenarios, keep an eye on your Erlang
process as well -- e.g. with top, pidstat, etc. CPU and memory are the
two usual suspects to watch.

A high reduction count is just a relative indicator until you start to
see your CPU bottlenecking. I.e. "high" might not matter in the least
if the applicable processes are keeping up with demand.

If you're not seeing an actual bottleneck at typical load, you might
drop this as a problem (I probably would :) -- or work to increase
load until you start to see something fall over, at which point you
now have the tools to look more closely.
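For concreteness, here's a minimal sketch of that kind of spot check (the
module name is mine, not from your code; the registered name
nsime_simulator is taken from your etop output below). Sampling a
process's reductions and message queue length twice, a few seconds apart,
tells you how busy it is and whether its queue is actually growing:

```erlang
-module(proc_check).
-export([check/1]).

%% Sample a registered process's reduction count, message queue length,
%% and memory. Call it twice a few seconds apart: the *delta* in
%% reductions shows how busy the process is, while a steadily growing
%% message_queue_len is the real sign of a bottleneck.
check(Name) ->
    case whereis(Name) of
        undefined ->
            {error, not_running};
        Pid ->
            erlang:process_info(Pid, [reductions,
                                      message_queue_len,
                                      memory])
    end.
```

From the simulation node's shell you'd call, e.g.,
proc_check:check(nsime_simulator).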

Garrett

On Thu, Dec 20, 2012 at 4:34 AM, Saravanan Vijayakumaran
<> wrote:
> I tried running etop to profile the simulation. The commands I used and a
> snapshot of the etop data are below. The nsime_simulator process is
> undergoing a lot of reductions. But the message queue is zero. I was
> expecting to see a lot of messages there because 10,000 event schedule calls
> each result in a gen_server:call to nsime_simulator. The nsime_simulator
> contains the event list gb_trees and hence it occupies a lot of memory. So
> that is not surprising. The plists module is used to run the simultaneous
> events in parallel. It is occupying the most memory of any process. I don't
> understand how it can occupy more memory than the nsime_simulator itself. I will have
> to look at the code again to figure this out. I will keep you posted. Thanks
> for all your suggestions.
>
> Simulation command
> erl -sname try -noshell +P 1000000 -pa ebin/ -pa examples/ebin/ -run
> udp_cs_pairs start 10000 -run init stop
>
> etop command:
> /usr/lib/erlang/lib/observer-0.9.10/priv/bin/getop -node 
> -tracing off
>
>
> ========================================================================================
>  ''
> 10:26:35
>  Load:  cpu       177               Memory:  total      784544    binary     105148
>         procs  130041                        processes  664962    code         1888
>         runq        4                        atom          143    ets           144
>
> Pid            Name or Initial Func    Time    Reds  Memory    MsgQ Current Function
> ----------------------------------------------------------------------------------------
> <4696.2.0>     erlang:apply/2           '-'   70800 13799852       0 plists:cluster_runmany
> <4696.35.0>    nsime_simulator          '-' 2315977 10471532       0 gen_server:loop/6
> <4696.31801.3> erlang:apply/2           '-'   70568   556908       1 gen:do_call/4
> <4696.31804.3> erlang:apply/2           '-'   77260   344376       1 io:wait_io_mon_reply
> <4696.31800.3> erlang:apply/2           '-'   72686   344360       0 io:wait_io_mon_reply
> <4696.31802.3> erlang:apply/2           '-'   75552   344360       0 io:wait_io_mon_reply
> <4696.31803.3> erlang:apply/2           '-'   74720   344360       0 io:wait_io_mon_reply
> <4696.37.0>    nsime_node_list          '-'       0   185928       0 gen_server:loop/6
> <4696.25.0>    code_server              '-'       0   142164       0 code_server:loop/1
> <4696.38.0>    nsime_channel_list       '-'       0   142144       0 gen_server:loop/6
> ========================================================================================
>
> I also get a bunch of messages in the etop window if I don't set "-tracing
> off".
>
> Erlang top dropped data 151
> Erlang top got garbage {trace_ts,<4695.18979.3>,out,
>                                  {gen_server,loop,6},
>                                  {1355,998744,703693}}
> Erlang top got garbage {trace_ts,<4695.8399.1>,out,
>                                  {gen,do_call,4},
>                                  {1355,998744,703731}}
>
>
>
> On Sun, Dec 16, 2012 at 7:46 PM, Jesper Louis Andersen
> <> wrote:
>>
>>
>> On Dec 15, 2012, at 9:19 PM, Garrett Smith <> wrote:
>>
>> > I've lost track of this thread -- are you still guessing, or have you
>> > spent any time measuring? I looked back and couldn't see any
>> > discussion about e.g. specific processes with high reduction counts,
>> > growing queue sizes, etc.
>>
>> Sometimes, a well-placed eprof measurement does wonders to tell you what
>> is slow. We had a system that was spending 40% of its time in a specific
>> regular expression once, but this is hard to see unless you profile. I'd say
>> you should look at processes with high reduction counts, or groups of those
>> and then look into why they are reducing all the time.
>>
>> But I have a hunch that this has more to do with the simulation being
>> serial rather than parallel. If you want to utilize more than one
>> core, you need to figure out how the problem can be split up.
>>
>> Jesper Louis Andersen
>>   Erlang Solutions Ltd., Copenhagen
>>
>>
>>
>
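Jesper's eprof suggestion in the quoted thread can be sketched like this
(assumptions: the module name, the helper's shape, and the profiling
window are mine; only the eprof calls themselves are the standard OTP
API):

```erlang
-module(sim_prof).
-export([run/2]).

%% Profile a single pid with eprof for Millis milliseconds, then print
%% a per-function breakdown of where its time went.
run(Pid, Millis) ->
    eprof:start(),                       % idempotent enough: returns an
                                         % already_started error tuple if running
    profiling = eprof:start_profiling([Pid]),
    timer:sleep(Millis),                 % let the simulation run under the profiler
    eprof:stop_profiling(),
    eprof:analyze(total),                % prints time spent per MFA
    ok.
```

For the case in this thread one would run something like
sim_prof:run(whereis(nsime_simulator), 5000). while the simulation is
under load, and look for an MFA dominating the total, much like the
regular-expression hot spot Jesper describes.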



More information about the erlang-questions mailing list