[erlang-questions] Help with allocator tuning

Wed Nov 23 15:22:05 CET 2016

Hello,

On Wed, Nov 23, 2016 at 1:47 PM, Max Lapshin <max.lapshin@REDACTED> wrote:

> We are running our erlang software flussonic, it captures around 1,5
> gbit/s of input via TCP, allocates lot of binaries from 500 bytes to 1500
> bytes and then prepares large binaries around 1 megabyte (video blobs).
> There are produced around 250 such blobs per second.
>

Have you verified that these assumptions are correct via
http://ferd.github.io/recon/recon_alloc.html#average_block_sizes-1? Make
sure to take multiple snapshots of current, as max is not really all that
useful for this measurement.
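
A minimal sketch of taking several snapshots over time, assuming the recon
library is on the code path. The module name alloc_sample and both function
names are made up for illustration; only
recon_alloc:average_block_sizes/1 comes from recon.

```erlang
-module(alloc_sample).
-export([sample/3, binary_block_sizes/0]).

%% Call Fun N times, sleeping IntervalMs after each call, and return
%% the results oldest first.
sample(Fun, N, IntervalMs) when is_function(Fun, 0), N > 0 ->
    [begin
         Result = Fun(),
         timer:sleep(IntervalMs),
         Result
     end || _ <- lists:seq(1, N)].

%% Current average block sizes for binary_alloc only.
%% Assumption: recon is loaded; returns undefined if binary_alloc is
%% missing from the result.
binary_block_sizes() ->
    proplists:get_value(binary_alloc,
                        recon_alloc:average_block_sizes(current)).
```

For example, alloc_sample:sample(fun alloc_sample:binary_block_sizes/0, 10, 1000)
collects ten snapshots one second apart, which can then be compared against
the expected 500 to 1500 byte and 1 megabyte sizes.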

> +stbt db +sbwt short +swt very_low +sfwi 20 +zebwt short +sub true +MBas
> aoffcaobf +MBacul 0
>

The carrier oriented allocator strategies (the ones with the longest names,
i.e. CARRIERSTRATcBLOCKSTRAT) were specifically introduced to enable
carrier migration. So using one of those together with disabling acul makes
little sense. You most likely want to run +MBas aobf if you disable carrier
migration.

> I get around 2000 mmaps and munmaps per second and recon_alloc tells that
> I have 98% of usage.  It looks rather strange, so I tried to play with
> tunings and switched to:
>
There is an mseg cache that can be used to cache mmap:ed segments. By
default it holds something like 10 segments, which seems to be too low
for your use case. You can increase the number of cached segments through
the +MMmcs switch. The max value is 30, but I know that some other users
have tried much higher numbers by changing the code in erts, and that
has worked better for them.

You may want to take a look at the cache hit rates that you get from
http://ferd.github.io/recon/recon_alloc.html#cache_hit_rates-0, to see if
your changes have any effect.
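
As a rough sketch, the per-instance figures can be folded into one overall
hit rate so that runs with different +MMmcs values are easy to compare. The
module mseg_hits is hypothetical, and the input shape is assumed to be what
recon_alloc:cache_hit_rates/0 returns, i.e. a list of
{{instance, N}, [{hit_rate, F}, {hits, H}, {calls, C}]}.

```erlang
-module(mseg_hits).
-export([overall/1]).

%% Sum hits and calls over all instances and return hits/calls as a
%% float, or undefined if no calls were recorded.
overall(Rates) ->
    {Hits, Calls} =
        lists:foldl(
          fun({_Instance, Props}, {H, C}) ->
                  {H + proplists:get_value(hits, Props, 0),
                   C + proplists:get_value(calls, Props, 0)}
          end, {0, 0}, Rates),
    case Calls of
        0 -> undefined;
        _ -> Hits / Calls
    end.
```

Usage would be mseg_hits:overall(recon_alloc:cache_hit_rates()), sampled
before and after changing the cache size.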

> +stbt db +sbwt short +swt very_low +sfwi 20 +zebwt short +sub true +MBas
> aoffcaobf +MBsbct 4096 +MBacul de +Mulmbcs 131071 +Mumbcgs 1 +Musmbcs 4095
>
If it is specifically binaries that you are looking at, I would just
change the config for +MB and not +Mu. Also, having a smaller smbcs
than sbct seems a bit odd. Why not just raise the smbcs to the same value as
lmbcs?

> I'm not quite sure that my settings are sane, but I tried to make very
> large multiblock areas and try to store my binaries inside large areas (not
> single block carrier, but multiblock carrier).
>
> With these settings I get about 50 mmap/munmap per second. Seems that
> hugepages are not used (frankly speaking I thought to autoenable them).
>
If you align the mbcs with the size of transparent huge pages, that could be
beneficial. On my system they are set to 2 MB. Is the 128 MB that you are
trying to hit what they are set to on your system?
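
To make the alignment arithmetic concrete, here is a small sketch that rounds
a carrier size in kilobytes up to a multiple of the huge page size. The module
carrier_align is made up for illustration, the 2 MB (2048 KB) page size is
just the value from my system, and I have not checked how erts itself rounds
or caps the values it is given.

```erlang
-module(carrier_align).
-export([round_up_kb/2]).

%% Round SizeKb up to the nearest multiple of PageKb.
%% Both arguments are in kilobytes.
round_up_kb(SizeKb, PageKb)
  when is_integer(SizeKb), SizeKb >= 0, is_integer(PageKb), PageKb > 0 ->
    ((SizeKb + PageKb - 1) div PageKb) * PageKb.
```

With 2 MB pages, carrier_align:round_up_kb(131071, 2048) gives 131072 and
carrier_align:round_up_kb(4095, 2048) gives 4096, so the 131071 and 4095
values in the flags are each one kilobyte short of a page multiple.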

> But with these settings I get about 50% of usage and all servers are
> quickly getting killed by the OOM killer.
>
This is quite odd, it almost feels like the carrier pool is misbehaving.
Have you checked if a large amount of the carriers are in the carrier pool
when this happens? Maybe try to lower the usage needed to put them in the
pool, i.e. something like "+MBacul 10".

I assume that you are running a reasonably late version of Erlang/OTP? I
remember that we did some bug fixes a while back in regards to the pool.

> With these flags I tried to hint allocator to create 128MB large areas and
> objects smaller than 4 megabytes to put into these areas.
>
> So my questions are:
>
> 1) should I worry about 2000 of mmap/unmap syscalls per second?
>
Depends on how many schedulers you have running. I don't have any figures
on how many mmaps per scheduler per second is acceptable, but I would say
that the fewer syscalls you make, the better.

> 2) should I try to reduce usage of sbct and increase usage of mbct?
>
It's a bit of a tradeoff. Having too large items in the mbc allocations
makes it harder for them to find spots to place blocks, while on the other
hand the mbc allocators scale better than the sbc allocators.

So by placing too large blocks in the mbc, you get fragmentation issues.
But if you place too many blocks in the sbcs, you get scalability issues
instead :)

In general you want the majority of your allocations to go to mbcs;
what the ratio should be is hard to tell.
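
A simplified sketch of how the sbct threshold decides where a block ends up.
The erts_alloc documentation says blocks larger than sbct (given in
kilobytes, default 512) are placed in singleblock carriers. The module
carrier_kind is hypothetical, and the real allocator also counts a block
header, so the exact boundary will differ slightly.

```erlang
-module(carrier_kind).
-export([kind/2]).

%% BlockBytes: requested block size in bytes.
%% SbctKb: the +M<S>sbct value in kilobytes.
%% Returns sbc if the block exceeds the threshold, otherwise mbc.
kind(BlockBytes, SbctKb) when BlockBytes > SbctKb * 1024 -> sbc;
kind(_BlockBytes, _SbctKb) -> mbc.
```

Under this model carrier_kind:kind(1048576, 512) is sbc while
carrier_kind:kind(1048576, 4096) is mbc, so 1 megabyte blobs move from
singleblock to multiblock carriers once +MBsbct is raised to 4096.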

> 3) are my flags to erlang VM compatible with each other?
>
Seem to be.

> 4) maybe some other hints?
>
Measure and try to really understand what
erlang:system_info({allocator,binary_alloc}) is giving you. recon_alloc is
a great tool, but it is built with an interface to find the specific
problems that we have encountered and it hides information from you. Most
of the time I end up writing small scripts that analyze the data in a new
way looking for exactly what I want to see over time.
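
As an example of the kind of small script I mean, here is a sketch that
computes multiblock carrier usage per allocator instance. The module
binalloc_usage is made up, and the data shape is an assumption:
[{instance, N, Props}] where Props contains
{mbcs, [{blocks_size, Cur, MaxLast, MaxEver}, {carriers_size, Cur, MaxLast, MaxEver}, ...]}.
The shape may differ in other releases, so check it against what your VM
actually returns.

```erlang
-module(binalloc_usage).
-export([usage/1]).

%% Instances: the list returned by
%% erlang:system_info({allocator, binary_alloc}) (assumed shape, see above).
%% Returns [{InstanceNo, BlocksSize / CarriersSize | undefined}].
usage(Instances) ->
    [{N, ratio(current(blocks_size, Mbcs), current(carriers_size, Mbcs))}
     || {instance, N, Props} <- Instances,
        %% Instances without an mbcs section are skipped.
        {mbcs, Mbcs} <- [lists:keyfind(mbcs, 1, Props)]].

%% Pick the current value out of a {Key, Cur, MaxLast, MaxEver} entry.
current(Key, Section) ->
    case lists:keyfind(Key, 1, Section) of
        {Key, Cur, _MaxLast, _MaxEver} -> Cur;
        _ -> 0
    end.

ratio(_Blocks, 0) -> undefined;
ratio(Blocks, Carriers) -> Blocks / Carriers.
```

Calling binalloc_usage:usage(erlang:system_info({allocator, binary_alloc}))
at regular intervals shows which instances hold mostly empty carriers.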

It is also well worth reading the erts_alloc documentation very
carefully.

There is also the possibility to completely disable erts_alloc and fall back
to malloc; you do that via "+Mea min". Doing that you lose a bunch of nice
statistics and scalability features. However, more man-hours have been spent
optimizing malloc implementations, so they are a little bit faster per
allocation.

Lukas