[erlang-questions] How can we increase multiblock carrier utilization for binary_alloc?

Lukas Larsson lukas@REDACTED
Mon May 7 11:21:43 CEST 2018


On Thu, May 3, 2018 at 12:18 PM, Gerhard Lazu <gerhard@REDACTED> wrote:

> Hi Lukas,
>
> Why do you not use erlang:memory() as the base for whether you can accept
>> more messages? Having a low memory utilisation is not bad in itself, unless
>> of course some other program on the same machine needs the memory.
>>
>
> We used to use erlang:memory(), but we've learned that it doesn't work
> well in practice [1]. Linux OOM will take action based on RSS, not Erlang
> allocated memory.
>

Yes, looking at erlang:memory() could make you end up in those scenarios.
However, it should be possible to look at recon_alloc:memory(unused) to get
a ballpark figure for how much memory the Erlang VM has reserved that it is
not using at the moment.
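
As a sketch of the difference between the two views (assuming the recon
library is available on the node; names are illustrative):

```erlang
%% erlang:memory(total) counts memory allocated in blocks by the emulator,
%% not what erts has reserved from the OS, so it can sit well below RSS.
BlockView = erlang:memory(total),

%% recon_alloc:memory/1 aggregates the erts_alloc carrier information:
%% `allocated` is what is reserved from the OS, `used` is what is actually
%% in blocks, and `unused` is roughly the difference between the two.
Allocated = recon_alloc:memory(allocated),
Used      = recon_alloc:memory(used),
Unused    = recon_alloc:memory(unused),
true = Allocated >= Used.
```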


>
>
>> Looking at the used memory under load and after, at peak the allocated
>> memory is 1510 MB and then after it is 577 MB. So about 2/3rds of the
>> allocated memory was returned to the OS. While this is not perfect, it is
>> not terrible either. Reducing it further may not be easy.
>>
>
> Your observation is true and accurate. It's also true and accurate that
> out of 577MB allocated, 300MB is used & 277MB is unused, meaning that
> almost half of the allocated memory is not used.
>
> I understand that it may not be easy to reduce the unused memory, but my
> point is that while this unused memory might seem small in this particular
> scenario, what happens when the Erlang VM has 60GB allocated?
>

One thing that you could try is to see whether malloc does a better job
than erts does. In general the erts allocators are better at scalability,
while malloc is better at performance; I don't know which is better at
dealing with fragmentation. Use "+MBe false" to disable the erts allocator
for binary_alloc. The really bad part about this is that you lose all
statistics, so if you do get into other issues it will be much harder to
figure out what is going on.
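
As a sketch of how that experiment could be started (only the +MBe flag
comes from erts_alloc; the rest of the command line is up to your setup):

```shell
# Let binary allocations fall through to malloc by disabling the erts
# binary_alloc instance. binary_alloc statistics disappear with it.
erl +MBe false
```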

You could also try to play with sbct to see if you get better allocations
using a lower value. This will cause more allocations to be placed into
sbcs, which could be good for fragmentation. You seem to have an average
block size of about 2 KB, so I would try setting sbct to about double that
to start with and see if you notice any difference. It's hard to know what
would be a good value with the OTP 20 instrumentation, so you will have to
experiment and see if you get any difference.
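
Assuming the kilobyte unit that +M&lt;S&gt;sbct takes (default 512, i.e.
512 KB), doubling a ~2 KB average block size would give a starting point
of roughly:

```shell
# Lower the singleblock carrier threshold for binary_alloc so that blocks
# larger than ~4 KB are placed in singleblock carriers instead of
# multiblock ones.
erl +MBsbct 4
```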


> Would it help if we can show the impact of this behaviour on hosts with
> larger memory usage?
>

No, I don't think so. The behavior should be similar, just with scaled-up
values. That said, it would be a good idea to verify that if you can.


>
>
>> Can you reproduce the behaviour? Would it be possible to get a
>> recon_alloc snapshot during and after load?
>>
>
> I'm sharing recon_alloc snapshots during & after load for the following:
>
> 1. erts_alloc defaults (lmbcs 5120) [2]
> 2. -MBlmbcs 512 [3]
>
> I've also captured during & after load screenshots of the 2
> configurations running side-by-side (left is erts_alloc defaults (lmbcs
> 5120), right is -MBlmbcs 512) [4].
>
> While our initial configuration used a few more flags, -MHas ageffcbf
> -MBas ageffcbf -MHlmbcs 512 -MBlmbcs 512, I've kept things as simple as
> possible on this run, and only used -MBlmbcs 512.
>
> Have you seen https://github.com/erlang/otp/pull/1790 that was just
>> merged to master with the accompanying blog post:
>> http://blog.erlang.org/Memory-instrumentation-in-OTP-21/?
>>
>
> I haven't, thank you for sharing. We are waiting on Elixir #6611 before we
> can test against OTP 21.0-rc1 [5].
>
>
>> I assume that you have tried ageffcaobf?
>>
>
> Yes, we have tried all allocation strategies. ageffcbf resulted in
> "spikier" CPU and dirty mem writeback, but also sharper drops in dirty mem
> writeback. Under load, ageffcbf had 1% lower RSS usage, and 2.5% lower
> unused memory than ageffcaobf. After load however, ageffcbf had 5% lower
> RSS usage & 4% lower unused memory. In conclusion, ageffcbf proved the
> best out of all allocation strategies.
>
>
> Here is a side-by-side comparison of -MBas ageffcaobf -MBlmbcs 512 (left)
> vs -MBas ageffcbf -MBlmbcs 512 (right) [6], and the relevant recon_alloc
> snapshots [7].
>

While aobf should be a little bit better, I suspect that since we
introduced the carrier pool, and thus use the carrier strategies, the
difference is within the error margins, especially when you run a system
with lots of small carriers.

On a related point, this discussion has caused me to start looking at
using madvise/VirtualAlloc to let the OS know that pages within carriers
are no longer used by erts. I'm not sure how that will interact with RSS;
from what I've been able to figure out, the pages remain associated with
the program until the OS needs them to satisfy some memory request. Any
such feature won't be in OTP-21, but may be added later.


> Thank you Lukas for helping out with this, Gerhard.
>
> [1] https://github.com/rabbitmq/rabbitmq-server/issues/1223 &
> https://github.com/rabbitmq/rabbitmq-server/pull/1259#issuecomment-308428057
> - the entire PR context is valuable and relevant
> [2] https://s3-eu-west-1.amazonaws.com/rabbitmq-share/memory-allocators/during-MBlmbcs_5120.recon_alloc.snapshot &
> https://s3-eu-west-1.amazonaws.com/rabbitmq-share/memory-allocators/after-MBlmbcs_5120.recon_alloc.snapshot
> [3] https://s3-eu-west-1.amazonaws.com/rabbitmq-share/memory-allocators/during-MBlmbcs_512.recon_alloc.snapshot &
> https://s3-eu-west-1.amazonaws.com/rabbitmq-share/memory-allocators/after-MBlmbcs_512.recon_alloc.snapshot
> [4] https://s3-eu-west-1.amazonaws.com/rabbitmq-share/memory-allocators/during-MBlmbcs_5120-vs-MBlmbcs_512.png &
> https://s3-eu-west-1.amazonaws.com/rabbitmq-share/memory-allocators/after-MBlmbcs_5120-vs-MBlmbcs_512.png
> [5] https://github.com/elixir-lang/elixir/issues/6611#issuecomment-386208496
> [6] https://s3-eu-west-1.amazonaws.com/rabbitmq-share/memory-allocators/during-MBas_ageffcaobf-MBlmbcs_512.recon_alloc.snapshot &
> https://s3-eu-west-1.amazonaws.com/rabbitmq-share/memory-allocators/after-MBas_ageffcaobf-MBlmbcs_512.recon_alloc.snapshot +
> https://s3-eu-west-1.amazonaws.com/rabbitmq-share/memory-allocators/during-MBas_ageffcbf-MBlmbcs-512.recon_alloc.snapshot &
> https://s3-eu-west-1.amazonaws.com/rabbitmq-share/memory-allocators/after-MBas_ageffcbf-MBlmbcs-512.recon_alloc.snapshot
> [7] https://s3-eu-west-1.amazonaws.com/rabbitmq-share/memory-allocators/during-MBas_ageffcaobf-MBlmbcs_512-vs-MBas_ageffcbf-MBlmbcs_512.png &
> https://s3-eu-west-1.amazonaws.com/rabbitmq-share/memory-allocators/after-MBas_ageffcaobf-MBlmbcs_512-vs-MBas_ageffcbf-MBlmbcs_512.png +
> https://s3-eu-west-1.amazonaws.com/rabbitmq-share/memory-allocators/during-cpu-MBas_ageffcaobf-MBlmbcs_512-vs-MBas_ageffcbf-MBlmbcs_512.png &
> https://s3-eu-west-1.amazonaws.com/rabbitmq-share/memory-allocators/during-memory-MBas_ageffcaobf-MBlmbcs_512-vs-MBas_ageffcbf-MBlmbcs_512.png
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>

