Carrier migration, madvice and poor performance

Wed Aug 26 11:50:31 CEST 2020

On Tue, 25 Aug 2020, 17:28 Pablo Polvorin, <pablo.polvorin@REDACTED>
wrote:

>
>
> On 25 Aug 2020, at 09:05, Lukas Larsson <lukas@REDACTED> wrote:
>
> Hello,
>
> On Tue, Aug 25, 2020 at 9:26 AM Pablo Polvorin <
> pablo.polvorin@REDACTED> wrote:
>
>> Hi,
>> we recently migrated our systems from OTP21 to OTP22. On one of them,
>> performance degraded noticeably after the switch to OTP22 (a 20% drop).
>> After some investigation, we found the apparent cause was the madvise()
>> call made when returning carriers to the carrier pool, which led to lots
>> of minor page faults, ultimately killing our performance. We run in a
>> virtualised cloud; not sure how much that affects the minor page fault
>> overhead.
>>
>
> Do you know if you have access to MADV_FREE or use MADV_DONTNEED?
>
> Looks like we don’t :/
>
> #include <sys/mman.h>
> #include <stdio.h>
>
> int main(void) {
> #ifdef MADV_FREE
>     printf("Have it\n");
> #else
>     printf("Dont have it\n");
> #endif
>     return 0;
> }
> > Dont have it
>
> This is on Amazon Linux, 4.14.186-110.268.amzn1.x86_64.
>

That may explain it. Maybe we should not use madvise when MADV_FREE is not
available.

Have you tried to delete this line:
https://github.com/erlang/otp/blob/7ad81c674d1aa705ae41743b343043d05ea1944b/erts/emulator/sys/common/erl_mmap.h#L215
and see what happens then?

>
>
>> So we spent some time tweaking the allocator settings (which hadn't been
>> touched in many years, while the system itself has evolved a lot since
>> then) and got a good improvement. But for carrier migration, ultimately
>> the thing that worked best for us was to just disable it entirely. Is
>> this a terrible idea?
>> Our load is a fairly stable flow of homogeneous requests that lead to
>> several short-lived (milliseconds) processes being spawned. We have plenty
>> of memory, so I'm not too worried about a badly utilised carrier being
>> stuck within a scheduler.
>>
>> Got a few questions regarding this:
>>
>> * Is it common out there to disable carrier migration?
>> It feels a bit strange that nobody hit the same problem when updating to
>> OTP22; I’m assuming there are lots of not-so-great allocator settings out
>> there, as was our case. (Disabling it was actually our last try: we tried
>> the settings suggested by erts_alloc_config, and then made the +M<S>acul
>> and +M<S>acfml settings significantly stricter as well, and while that
>> helped, we still had too many page faults.)
>>
>>
> Carrier migration can help a lot to deal with memory fragmentation issues.
> It is however not free as you have noticed. I know that other people have
> disabled it with some success, but as far as I remember that was to work
> around bugs in the migration logic, not because of the performance overhead.
>
> Given the frequency at which we were abandoning and taking carriers from
> the pool, something smells fishy in our settings and allocation pattern.
> But so far we couldn’t really figure out how to bring that down to a low
> enough level that the madvise() won’t affect us much.
>

Yes, that does seem odd. Carriers are not meant to be pushed in and out of
the pool at a rapid pace.

I don't suppose you have a relatively small example that will reproduce the
behaviour? Or if not, then maybe a couple of recon_alloc snapshots?

>
>
>> * What's the tradeoff of having large multiblock carriers, other than
>> the memory overhead when they aren't fully used?
>>
>> * Does it make sense to add a config flag that allows carrier migration
>> but disables the madvise() on free blocks?
>>
>
> Given your experiences with it, I think that would make sense. We did not
> notice any degradation when testing madvise ourselves, but it is not
> possible to test all scenarios in all environments.
>
> Might work on this, although I guess it would require re-learning
> autoconf sorcery, so it probably will not happen soon.
>

No need for any autoconf sorcery; I was thinking that this could probably
be a start flag passed to erl.
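
Roughly what a runtime switch could look like, as a sketch only. The flag
spelling "--no-madvise", the variable use_madvise and both function names
are made up for illustration; none of them is an existing erl flag or ERTS
symbol.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical runtime switch: 1 = issue madvise hints, 0 = skip them. */
static int use_madvise = 1;

/* Hypothetical flag parsing; "--no-madvise" is an invented spelling. */
static void parse_start_flag(const char *arg)
{
    if (strcmp(arg, "--no-madvise") == 0)
        use_madvise = 0;
}

/* Hint that [addr, addr + len) is unused, unless disabled at start-up.
 * Prefers MADV_FREE and falls back to MADV_DONTNEED when it is missing. */
static int advise_free_pages(void *addr, size_t len)
{
    if (!use_madvise)
        return 0; /* switched off: carrier migration still works, no hint */
#if defined(MADV_FREE)
    return madvise(addr, len, MADV_FREE);
#elif defined(MADV_DONTNEED)
    return madvise(addr, len, MADV_DONTNEED);
#else
    (void)addr;
    (void)len;
    return 0;
#endif
}
```

With the switch off, carriers can still be abandoned to and fetched from the
pool; only the page hint is skipped, so no configure-time detection is
involved.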