RabbitMQ/Windows phantom memory issue

Loïc Hoguin lhoguin@REDACTED
Tue Sep 28 11:56:17 CEST 2021


Hello!

I am trying to track down about 5GB of binary memory that is allocated but that I cannot attribute to anything.

I can see the memory used when looking at memory allocators:

{{binary_alloc,0},
  [{mbcs,
       [{blocks,18542198,18542199,18542199},
        {blocks_size,5372314496,5372316832,5372370512},
        {foreign_blocks,[]},
        {raw_blocks,
            [{binary_alloc,
                 [{count,18542198,18542199,18542199},
                  {size,5372314496,5372316832,5372370512}]}]},
        {carriers,5581,5581,5581},
        {sys_alloc_carriers,5581},
        {carriers_size,5851119616,5851119616,5851119616},
        {sys_alloc_carriers_size,5851119616}]},

The allocators do not appear to be fragmented:

recon_alloc:fragmentation(current).

[{{binary_alloc,0},
  [{sbcs_usage,1.0},
   {mbcs_usage,0.9182027835598129},
   {sbcs_block_size,0},
   {sbcs_carriers_size,0},
   {mbcs_block_size,5395621648},
   {mbcs_carriers_size,5876285440}]},

recon_alloc:memory(usage).

0.9071327019562182

So I would expect this memory to be used by something in the system.
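
As a rough cross-check, the ratios can be recomputed by hand from the binary_alloc figures quoted above. This is only a sketch of the arithmetic: the variable names are mine, and the numbers are copied from the first (current) value of each tuple in the allocator output.

```erlang
%% Figures copied from the {binary_alloc,0} mbcs output above.
Blocks       = 18542198,     % number of live blocks
BlocksSize   = 5372314496,   % bytes used by those blocks
CarriersSize = 5851119616,   % bytes reserved in carriers

Usage    = BlocksSize / CarriersSize,   % roughly 0.918
AvgBlock = BlocksSize / Blocks,         % roughly 290 bytes per block
Slack    = CarriersSize - BlocksSize.   % carrier bytes not used by blocks
```

With these numbers only about 479 MB of carrier space is unused, so the bulk of the 5GB sits in live blocks, and those blocks are small on average (around 290 bytes each).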

I have looked at processes (memory, old/new heaps): nothing.

recon:bin_leak(40): nothing? The processes found don’t hold much memory.

[{<11296.961.0>,-1064,
  [rabbit_web_dispatch_registry,
   {current_function,{gen_server,loop,7}},
   {initial_call,{proc_lib,init_p,5}}]},
{<11296.894.0>,-557,
  [rabbit_core_metrics_gc,
   {current_function,{gen_server,loop,7}},
   {initial_call,{proc_lib,init_p,5}}]},

recon:proc_count(binary_memory, 40): nothing.

[{<11296.986.0>,615780,
  [{current_function,{gen_server,loop,7}},{initial_call,{proc_lib,init_p,5}}]},
{<11296.1089.0>,130680,
  [{current_function,{gen_server,loop,7}},{initial_call,{proc_lib,init_p,5}}]},
{<11296.6432.32>,127254,
  [{current_function,{gen_server2,process_next_msg,1}},
   {initial_call,{proc_lib,init_p,5}}]},
{<11296.1179.32>,87609,

Ports all seem to use the same amount of memory.
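
A check along these lines can list per-port memory. It is a sketch only: `PortMem` and `TopPorts` are illustrative names, and the list is cut to the ten largest ports arbitrarily.

```erlang
%% Collect {Port, Bytes} for every open port; ports that close in the
%% meantime return undefined and are skipped by the pattern match.
PortMem = [{P, Bytes} || P <- erlang:ports(),
                         {memory, Bytes} <- [erlang:port_info(P, memory)]],
%% Largest first, keep the top ten.
TopPorts = lists:sublist(lists:reverse(lists:keysort(2, PortMem)), 10).
```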

Ets/Mnesia tables: nothing. There aren’t that many and they’re all below 12MB.
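
For reference, table sizes can be listed roughly as follows. This is a sketch with made-up variable names. Note that `ets:info(Tab, memory)` reports words, not bytes, hence the multiplication by the word size.

```erlang
WordSize = erlang:system_info(wordsize),
%% {TableId, Name, Bytes} for every table that still exists when queried;
%% ets:info/2 returns undefined for a table deleted in the meantime.
EtsSizes = [{T, ets:info(T, name), Words * WordSize}
            || T <- ets:all(),
               Words <- [ets:info(T, memory)],
               is_integer(Words)],
%% Largest tables first.
EtsBySize = lists:reverse(lists:keysort(3, EtsSizes)).
```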

Persistent terms: nothing.

persistent_term:info().

#{count => 56,memory => 27536}

We do not use atomics to store binaries.

Am I missing something? Any tips to figure out what is going on?

This happens with two RabbitMQ customers using Windows
(2012 and 2016) with a recent RabbitMQ version.

This only started happening after moving to a newer RabbitMQ version
and seems unrelated to the OTP version. But it might be a bug triggered
by a new behavior in RabbitMQ. I would be happy to put the blame on
RabbitMQ but I cannot find where this leaking binary memory is used.

I can share crash dumps with the OTP team from a test environment that
shows the same symptoms but leaks less memory because the nodes
are not used as much (so the numbers differ, but there’s still memory
not accounted for).

The problem happens on both mostly idle and busy nodes, with busy
nodes losing track of memory more quickly.

I will try to reproduce this in a local environment as time allows.

Cheers,

--
Loïc Hoguin