[erlang-questions] msacc: much time spent in alloc and gc

Lukas Larsson garazdawi@REDACTED
Thu Aug 18 10:53:20 CEST 2016


On Tue, Aug 16, 2016 at 3:38 PM, Danil Zagoskin <z@REDACTED> wrote:

> Hello!
>
> We have a system which uses only 20-25% of available CPU. After that we
> see:
>   * increased run_queue
>   * increased scheduler active time (from
>     erlang:statistics(scheduler_wall_time))
>   * decreased CPU usage
>
> At first there were some problems visible with lcnt, but now (after fixes)
> there aren't.
>
> While the system is in troubled state, msacc shows in cumulative scheduler
> stats:
>   * 19.3% alloc
>   * 19.3% gc
>   * 25.3% ets
>   * 12.3% emulator
>   * 10.0% bif
>   * (other times are quite low)
>
> The system is 2x 10-core Xeon E5-2660 (with hyperthreading), so we have 40
> schedulers there.
> We use OTP 19.0 with +MBas aoffcaobf +MBacul 0
>
> Switching scheduler binding from unbound (no option) to default
> (thread_no_node_processor_spread according to documentation) slightly
> increases the throughput, but the alloc/gc/ets times are still very high.
>
> How do we inspect high alloc/gc scheduler times?
>

For garbage collection, I would start with tracing to figure out which
processes are doing most of the garbage collections. msacc should also
show you whether most of the time is spent in major or minor gc.
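
A minimal sketch of the tracing approach, assuming OTP 19 or later, where
the garbage_collection trace flag emits gc_minor_start and gc_major_start
messages (older releases emit gc_start instead). The module name gc_top and
its functions are made up for illustration. Tracing every process adds
overhead, so keep the window short on a loaded node.

```erlang
-module(gc_top).
-export([run/1, run/2]).

%% Trace garbage collections in all processes for DurationMs milliseconds
%% and return up to N entries {Pid, {MinorCount, MajorCount}}, busiest first.
run(DurationMs) -> run(DurationMs, 10).

run(DurationMs, N) ->
    Parent = self(),
    Ref = make_ref(),
    Tracer = spawn_link(fun() -> collect(Parent, Ref, #{}) end),
    erlang:trace(all, true, [garbage_collection, {tracer, Tracer}]),
    timer:sleep(DurationMs),
    erlang:trace(all, false, [garbage_collection]),
    %% Wait until trace messages already generated have reached the tracer.
    DRef = erlang:trace_delivered(all),
    receive {trace_delivered, all, DRef} -> ok end,
    Tracer ! {stop, Ref},
    receive {Ref, Counts} -> top(Counts, N) end.

%% Tracer loop: count minor and major collections per process.
collect(Parent, Ref, Counts) ->
    receive
        {trace, Pid, gc_minor_start, _Info} ->
            collect(Parent, Ref, bump(Pid, minor, Counts));
        {trace, Pid, gc_major_start, _Info} ->
            collect(Parent, Ref, bump(Pid, major, Counts));
        {trace, _Pid, _OtherTag, _Info} ->
            %% gc_minor_end, gc_major_end and anything else are ignored.
            collect(Parent, Ref, Counts);
        {stop, Ref} ->
            Parent ! {Ref, Counts}
    end.

bump(Pid, Kind, Counts) ->
    {Minor, Major} = maps:get(Pid, Counts, {0, 0}),
    New = case Kind of
              minor -> {Minor + 1, Major};
              major -> {Minor, Major + 1}
          end,
    Counts#{Pid => New}.

%% Sort by total number of collections, descending, and keep the first N.
top(Counts, N) ->
    Sorted = lists:sort(fun({_, {A1, B1}}, {_, {A2, B2}}) ->
                                A1 + B1 >= A2 + B2
                        end, maps:to_list(Counts)),
    lists:sublist(Sorted, N).
```

Calling gc_top:run(5000) in the shell returns the ten processes that
collected most often during five seconds; erlang:process_info/2 on those
pids is then a reasonable next step.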

The alloc is harder to figure out. I would recommend using perf to dig
deeper and see if any C functions stick out and try to understand what they
do.


> Is there a way to select the most suitable allocation strategies without
> bruteforcing every option on every allocator type?
>

There is a tool called erts_alloc_config that may be useful to you. It was
designed to solve this type of problem, although I don't think it has been
updated in a long time, so it may be outdated with respect to what has
happened with the allocators in the last few releases.


> Maybe we are missing something? Any advice?
>
>
msacc is a new tool that we've not used that much yet. Every time I've used
it I've had to add a bunch of new states and kind of bisect my way down
into the code to figure out what it is that is taking all the time. As I've
mentioned in the other mail, Linux perf is also a good help here to figure
out what is taking time.
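
For a quick look at where scheduler time goes, something like the sketch
below can be used. It assumes the runtime_tools application is available and
that the emulator was built with microstate accounting support; the module
name msacc_share is made up for illustration. Which states show up depends
on how the emulator was configured.

```erlang
-module(msacc_share).
-export([sample/1]).

%% Collect microstate accounting for Ms milliseconds, then return
%% [{State, Percent}] summed over all scheduler threads, largest first.
sample(Ms) ->
    msacc:start(Ms),   % resets counters, collects for Ms, then stops
    Counters = [maps:get(counters, T) || T <- msacc:stats(),
                                         maps:get(type, T) =:= scheduler],
    Totals = lists:foldl(fun merge/2, #{}, Counters),
    Sum = lists:sum(maps:values(Totals)),
    Shares = [{S, percent(V, Sum)} || {S, V} <- maps:to_list(Totals)],
    lists:reverse(lists:keysort(2, Shares)).

%% Add one thread's counters into the accumulated totals.
merge(Counters, Acc) ->
    maps:fold(fun(State, V, A) ->
                      A#{State => V + maps:get(State, A, 0)}
              end, Acc, Counters).

percent(_V, 0) -> 0.0;
percent(V, Sum) -> V * 100 / Sum.
```

Note that sleep is one of the states, so on a mostly idle node it will
dominate the result. msacc:print() after msacc:start(Ms) gives the full
per-thread table instead.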

Lukas