Carrier migration, madvise and poor performance
Tue Aug 25 14:05:16 CEST 2020
On Tue, Aug 25, 2020 at 9:26 AM Pablo Polvorin <pablo.polvorin@REDACTED> wrote:
> We recently migrated our systems from OTP21 to OTP22. On one of them,
> performance degraded noticeably when we switched to OTP22 (20% drop).
> After some investigation, we found the apparent cause was the madvise()
> call when returning the carriers to the carrier pool, which led to lots of
> minor page faults, ultimately killing our performance. We run in a
> virtualised cloud, not sure how much that affects the minor page fault
Do you know if you have access to MADV_FREE or use MADV_DONTNEED?
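
One way to check which advice values are usable on a given host, and how many
refaults they cost, is a small C probe. This is only a minimal sketch, assuming
Linux with glibc; the function name refaults_after_madvise is made up for
illustration and is not part of ERTS. It maps anonymous pages, touches them,
applies the given advice, touches them again, and reports the minor-fault
delta from getrusage().

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults taken by this process so far. */
static long minor_faults(void) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

/* Write one byte in each page so every page gets backed by memory. */
static void touch_pages(unsigned char *p, size_t len, size_t page) {
    for (size_t off = 0; off < len; off += page)
        p[off] = 1;
}

/*
 * Hypothetical probe (not an ERTS function): map `pages` anonymous pages,
 * touch them, apply `advice` with madvise(), touch them again, and return
 * the number of minor faults caused by the second touch.
 * Returns -1 if mmap() or madvise() fails, e.g. when the kernel does not
 * support the requested advice.
 */
long refaults_after_madvise(int advice, size_t pages) {
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    size_t page = (size_t)ps;
    size_t len = pages * page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    touch_pages(p, len, page);          /* first touch: populate the pages */

    if (madvise(p, len, advice) != 0) { /* hand the pages back to the kernel */
        munmap(p, len);
        return -1;
    }

    long before = minor_faults();
    touch_pages(p, len, page);          /* second touch: may fault again */
    long after = minor_faults();

    munmap(p, len);
    return after - before;
}
```

Calling it with MADV_DONTNEED and, where the headers define it, MADV_FREE gives
a rough comparison. A return value of -1 for MADV_FREE suggests the kernel does
not support it (it was added in Linux 4.5). The numbers are only indicative
and may differ a lot between bare metal and virtualised hosts.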
> So we spent some time tweaking the allocator settings (which hadn't been
> touched in many years, while the system itself has evolved a lot since
> then) and got a good improvement. But for carrier migration, ultimately
> the thing that worked best for us was to just disable it entirely. Is this
> a terrible idea? Our load is a fairly stable flow of homogeneous requests
> that lead to several short-lived (milliseconds) processes being spawned.
> We have plenty of memory, so I'm not too worried about a badly utilised
> carrier being stuck within a scheduler.
> Got a few questions regarding this:
> * Is it common out there to disable carrier migration? It feels a bit
> strange that nobody hit the same problem when updating to OTP22. I'm
> assuming there are lots of not-so-great allocator settings out there, as
> was our case. (Disabling it was actually our last try: we tried the
> settings suggested by erts_alloc_config, and then made the +M<S>acul and
> +M<S>acfml settings significantly stricter as well, and while that helped,
> we still had too many page faults.)
Carrier migration can help a lot to deal with memory fragmentation issues.
It is, however, not free, as you have noticed. I know that other people have
disabled it with some success, but as far as I remember that was to work
around bugs in the migration logic, not because of the performance overhead.
> * What's the tradeoff of having large multiblock carriers, other than the
> memory overhead when they aren't fully used?
> * Does it make sense to add a config flag to allow carrier migration but
> disable the madvise() on free blocks?
Given your experiences with it, I think that would make sense. We did not
notice any degradation when testing madvise ourselves, but it is not
possible to test all scenarios in all environments.