Carrier migration, madvice and poor performance

Thu Aug 27 07:13:21 CEST 2020

Thanks Pablo, very informative.

Brendan Gregg (from Netflix) has a nice tutorial on strace:
http://www.brendangregg.com/blog/2014-05-11/strace-wow-much-syscall.html

I assume you didn't run strace in PROD to avoid slowing down your system.
Or did you?

/Frank

>
> On 26 Aug 2020, at 17:01, Frank Muller <frank.muller.erl@REDACTED> wrote:
>
> Hi Pablo,
>
> It would be great if you could share here how you narrowed the issue down
> to the madvise call.
>
> Everyone on the mailing list will benefit from it.
>
> Hi, sure.
> We hit a bad drop in throughput when we tried to migrate from OTP21 to
> OTP22.  Initially we checked the changelogs looking for anything suspicious
> (like changes in default behaviours/settings), but that didn’t point to
> anything obvious.
> We noticed that the system CPU usage was much higher.  We looked into that
> more closely and noted that the number of page faults skyrocketed on the new
> version vs the old one:
>
> $ sudo sar -B 10 10
> Linux 4.14.177-107.254.amzn1.x86_64 (ip-172-26-68-64) 08/03/2020 _x86_64_ (16 CPU)
> 09:21:55 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
> 09:22:05 PM      0.00    243.64 140891.62      0.00 128650.61      0.00      0.00      0.00      0.00
> 09:22:15 PM      0.00    103.24 135143.62      0.00 129283.20      0.00      0.00      0.00      0.00
> […]
>
> Note the huge number of faults (these are minor page faults, so the node
> was not thrashing for lack of memory).
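
A minimal sketch of why madvise() can show up as minor page faults, assuming Linux and an anonymous private mapping. It is not ERTS code, and the helper names minor_faults and faults_on_retouch are hypothetical. After MADV_DONTNEED the kernel may drop the pages, so the next write to that range has to fault them back in; getrusage() exposes the per-process minor fault counter (ru_minflt) that makes this visible.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Minor page faults taken by this process so far, or -1 on error. */
static long minor_faults(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/*
 * Hypothetical helper: map len bytes, touch them, optionally hand them back
 * with MADV_DONTNEED, then touch them again. Returns the number of minor
 * faults taken during the second touch only, or -1 on error.
 */
static long faults_on_retouch(size_t len, int drop_pages) {
    assert(len > 0); /* mmap rejects zero-length mappings */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 1, len); /* first touch: pages become resident */
    if (drop_pages && madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }
    long before = minor_faults();
    memset(p, 2, len); /* second touch: faults again only if pages were dropped */
    long after = minor_faults();
    munmap(p, len);
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

Calling faults_on_retouch(4 << 20, 1) and faults_on_retouch(4 << 20, 0) from a small test program and comparing the two results should show faults only in the first case. The exact counts depend on the page size and on whether transparent huge pages are in use.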
>
> With that, we monitored for a couple of seconds the memory-related syscalls
> that the new version was making vs what the previous version was making,
> to check if there was a different pattern.  What we found was the madvise
> call, which didn’t appear before:
>
> $ sudo strace -f -e trace=memory -c -p $pid
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  86.91    0.622994          74      8371           madvise   <- this is the smoking gun
>   7.39    0.053005          71       746           munmap
>   5.70    0.040851          62       656           mmap
> ------ ----------- ----------- --------- --------- ----------------
>
> That led us to
> https://github.com/erlang/otp/pull/2046 (listed here:
> http://blog.erlang.org/OTP-22-Highlights/)
>
> Take a look at
>
> https://netflixtechblog.com/linux-performance-analysis-in-60-000-milliseconds-accc10403c55
> a classic, short, concise, and incredibly useful guide on what to check
> when you start digging into performance problems, before trying to look
> at your code.
>
> cheers
>
>
> Thanks
> /Frank
>
>
>
>>
>> On Tue, 25 Aug 2020, 17:28 Pablo Polvorin, <pablo.polvorin@REDACTED>
>> wrote:
>>
>>>
>>>
>>> On 25 Aug 2020, at 09:05, Lukas Larsson <lukas@REDACTED> wrote:
>>>
>>> Hello,
>>>
>>> On Tue, Aug 25, 2020 at 9:26 AM Pablo Polvorin <
>>> pablo.polvorin@REDACTED> wrote:
>>>
>>>> Hi,
>>>> we recently migrated our systems from OTP21 to OTP22.  On one of them,
>>>> performance noticeably degraded when we switched to OTP22 (20% drop).
>>>> After some investigation, we found the apparent cause was the madvise()
>>>> call made when returning carriers to the carrier pool, which led to lots
>>>> of minor page faults, ultimately killing our performance.  We run in a
>>>> virtualised cloud, and are not sure how much that affects the minor page
>>>> fault overhead.
>>>>
>>>
>>> Do you know if you have access to MADV_FREE or use MADV_DONTNEED?
>>>
>>> Looks like we don’t :/
>>>
>>> #include <sys/mman.h>
>>> #include <stdio.h>
>>> int main() {
>>> #ifdef MADV_FREE
>>>     printf("Have it\n");
>>> #else
>>>     printf("Dont have it\n");
>>> #endif
>>>     return 0;
>>> }
>>> > Dont have it
>>>
>>> This is on Amazon Linux, 4.14.186-110.268.amzn1.x86_64.
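
One caveat, sketched under stated assumptions: the #ifdef above only reports whether the libc headers at compile time define MADV_FREE. The running kernel may still accept the flag (the madvise(2) man page lists MADV_FREE as available since Linux 4.5). The hypothetical helper below probes the kernel at run time. It assumes Linux, where the kernel headers define MADV_FREE as 8, and it says nothing about what ERTS itself does when the macro is missing at build time.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: Linux value of MADV_FREE, used only when the libc headers
 * are too old to define it themselves. */
#ifndef MADV_FREE
#define MADV_FREE 8
#endif

/*
 * Hypothetical helper: returns 1 if the running kernel accepts MADV_FREE on
 * an anonymous private page, 0 if it rejects the advice with EINVAL, and -1
 * on any other error.
 */
static int kernel_accepts_madv_free(void) {
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    int rc = madvise(p, (size_t)page, MADV_FREE);
    int saved_errno = errno; /* keep errno before munmap can change it */
    munmap(p, (size_t)page);
    if (rc == 0)
        return 1;
    return saved_errno == EINVAL ? 0 : -1;
}
```

A result of 1 on a machine where the compile-time check prints "Dont have it" would suggest the limitation is in the installed headers rather than in the kernel.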
>>>
>>
>> That may explain it. Maybe we should not use madvise when FREE is not
>> available.
>>
>> Have you tried to delete this line:
>> https://github.com/erlang/otp/blob/7ad81c674d1aa705ae41743b343043d05ea1944b/erts/emulator/sys/common/erl_mmap.h#L215 and
>> see what happens then?
>>
>>
>>>
>>>
>>>> So we spent some time tweaking the allocator settings (which hadn't
>>>> been touched in many years, while the system itself has evolved a lot
>>>> since that time) and got a good improvement. But for carrier migration,
>>>> ultimately the thing that worked best for us was to disable it entirely.
>>>> Is this a terrible idea?
>>>> Our load is a fairly stable flow of homogeneous requests, which leads to
>>>> several short-lived (milliseconds) processes being spawned. We have
>>>> plenty of memory, so I'm not too worried about a badly utilised carrier
>>>> being stuck within a scheduler.
>>>>
>>>> Got a few questions regarding this:
>>>>
>>>> * I wonder, is it common out there to disable carrier migration?  It
>>>> feels a bit strange that nobody hit the same problem when updating to
>>>> OTP22; I’m assuming there are lots of not-so-great allocator settings
>>>> out there, as was our case.  (Disabling it was actually our last try:
>>>> we tried the settings suggested by erts_alloc_config, and then made the
>>>> +M<S>acul and +M<S>acfml settings significantly stricter as well, and
>>>> while that helped, we still had too many page faults.)
>>>>
>>>>
>>> Carrier migration can help a lot to deal with memory fragmentation
>>> issues. It is however not free as you have noticed. I know that other
>>> people have disabled it with some success, but as far as I remember that
>>> was to work around bugs in the migration logic, not because of the
>>> performance overhead.
>>>
>>> Given the frequency at which we were abandoning and taking carriers
>>> from the pool, something smells fishy in our settings and allocation
>>> pattern.  But so far we couldn’t really figure out how to bring that down
>>> to a level low enough that the madvise() won’t affect us much.
>>>
>>
>> Yes, that does seem odd. Carriers are not meant to be pushed in and
>> out of the pool at a rapid pace.
>>
>> I don't suppose you have a relatively small example that will reproduce
>> the behaviour? Or if not, then maybe a couple of recon_alloc snapshots?
>>
>>
>>>
>>>
>>>> * What's the tradeoff of having large multiblock carriers, other than
>>>> the memory overhead when they aren't fully used?
>>>>
>>>> * Does it make sense to add a config flag to allow carrier migration but
>>>> disable the madvise() on free blocks?
>>>>
>>>
>>> Given your experiences with it, I think that would make sense. We did
>>> not notice any degradation when testing madvise ourselves, but it is not
>>> possible to test all scenarios in all environments.
>>>
>>> I might work on this, although I guess it would require re-learning
>>> autoconf sorcery, so it probably will not happen soon.
>>>
>>
>> No need to do any autoconf sorcery, I was thinking that this could
>> probably be a start flag passed to erl?
>>
>>
>>
>>
>>
>
>
>