Supervisor using 500+ MB of RAM to track 6.7 million dead workers?

Daniel Barney dan353hehe@REDACTED
Thu Dec 27 22:11:19 CET 2012

Hey Guys,

Here is what I am running into. I have a supervisor that manages a
bunch of processes, each in charge of a gen_tcp or ssl socket. After
upgrading to R15B02 I noticed that memory usage was growing slowly,
and over the course of three days the machines started to run out of
memory. I don't remember which version we upgraded from, only that it
was a release from last year; I can check if it really matters.

We upgraded to R15B02, wrote a small patch, and cherry-picked it onto
the stable version. The patch shouldn't affect what I am seeing: it
only changes two files in the SSL application, both related to parsing
certificates that do not follow the standard.

Assuming I had a memory leak somewhere, I first checked whether any
processes were using large amounts of memory, and this is what I
found:

<0.785.0>             supervisor:cowboy_requests_sup/1   8024355 15270185    0
                      gen_server:loop/6                        9

This process was a supervisor, so I checked how many children it had,
and this is what I got back:


And then the general process information:



And this is how many processes I have on the machine:

So all of the workers for this supervisor are dead, but for some
reason the supervisor doesn't seem to think so.
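For reference, the three checks described above (memory held by one process, the supervisor's own child count, and the node's total process count) can be gathered with the standard `supervisor` and `erlang` APIs. This is a minimal sketch, assuming a supervisor running on the local node; it is not the code behind the numbers in this mail, and `sup_report`, `start_demo/0` and `start_worker/0` are hypothetical names added only so the report can be tried on a throwaway supervisor.

```erlang
%% Sketch only: sup_report, start_demo/0 and start_worker/0 are
%% hypothetical names, not part of OTP or cowboy.
-module(sup_report).
-behaviour(supervisor).
-export([report/1, start_demo/0, start_worker/0, init/1]).

%% Collect memory use and child bookkeeping for a local supervisor,
%% given either its pid or its registered name.
report(Sup) ->
    Pid = to_pid(Sup),
    %% Total memory of the supervisor process, in bytes.
    {memory, Bytes} = erlang:process_info(Pid, memory),
    %% What the supervisor itself reports:
    %% [{specs, _}, {active, _}, {supervisors, _}, {workers, _}].
    Counts = supervisor:count_children(Pid),
    %% Cross-check: how many of the listed child pids are really alive.
    %% which_children/1 builds a list of every child, which is itself
    %% expensive when the supervisor tracks millions of entries.
    Alive = length([C || {_Id, C, _Type, _Mods} <- supervisor:which_children(Pid),
                         is_pid(C), is_process_alive(C)]),
    [{memory_bytes, Bytes},
     {alive_children, Alive},
     {node_process_count, erlang:system_info(process_count)}
     | Counts].

%% Accept a pid or a registered name (an unregistered name gives
%% undefined, and process_info/2 will then fail with badarg).
to_pid(Pid) when is_pid(Pid) -> Pid;
to_pid(Name) when is_atom(Name) -> whereis(Name).

%% Hypothetical demo supervisor: simple_one_for_one, temporary workers.
start_demo() ->
    supervisor:start_link(?MODULE, []).

init([]) ->
    {ok, {{simple_one_for_one, 5, 10},
          [{worker, {?MODULE, start_worker, []},
            temporary, 5000, worker, [?MODULE]}]}}.

%% A trivial worker, linked to the supervisor, that waits for 'stop'.
start_worker() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.
```

Comparing `workers` (how many children the supervisor believes it has) with `alive_children` and `node_process_count` is the comparison that exposes the kind of mismatch described here. On a supervisor holding millions of entries, calling `supervisor:count_children/1` on its own is the cheaper first step.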

Is there any reason why a supervisor would hold on to 6.7 million
workers that are already dead?

Any help would be appreciated,

