[erlang-questions] supervisor using 500+ MB of ram to track 6.7 million dead workers?

Daniel Barney dan353hehe@REDACTED
Thu Dec 27 22:11:19 CET 2012


Hey Guys,

This is what I am running into: I have a supervisor which manages a
bunch of processes that are each in charge of a gen_tcp or ssl socket.
After upgrading to R15B02 I started noticing that memory usage was
growing slowly, and over the course of three days I found that the
machines were starting to run out of memory. I do not remember which
version we upgraded from, but it was from last year. I can check if it
is really needed.

We upgraded to R15B02, wrote a small patch and then we cherry-picked
it onto the stable version. The patch shouldn't be affecting what I am
seeing as I only changed two files in the SSL application, relating to
parsing certs that do not obey the standard.

So, assuming that I had a memory leak somewhere, I first checked to see
if any processes were using large amounts of memory, and this is what
I found:

<0.785.0>             supervisor:cowboy_requests_sup/1   8024355 15270185    0
                      gen_server:loop/6                        9
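
For anyone who wants to repeat that check, here is a minimal sketch of
ranking the live processes by memory from the shell. It is only an
illustration, not necessarily how the listing above was produced, and
`Top` is a name made up for this example.

```erlang
%% Rank live processes by memory footprint (bytes), largest first.
%% `Top` is an illustrative name, not part of OTP.
Top = fun(N) ->
    %% process_info/2 returns undefined for a process that has just died;
    %% the pattern in the second generator silently skips those.
    Mem = [{P, M} || P <- erlang:processes(),
                     {memory, M} <- [erlang:process_info(P, memory)]],
    lists:sublist(lists:reverse(lists:keysort(2, Mem)), N)
end.
Top(5).
```

Each element of the result is a {Pid, Bytes} tuple, so the first one is
the process worth inspecting further.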

This process was a supervisor, so I checked how many children it
had, and this is what I got back:

supervisor:count_children(Pid).
[{specs,1},{active,80},{supervisors,0},{workers,6782028}]
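
As a sketch of how to read that result: `active` counts the children
that are actually running, while `workers` counts every child entry
the supervisor holds, alive or not. The difference is the number of
entries with no live process behind them. The values below are copied
from the output above; `Stale` is just a name picked for this example.

```erlang
%% Values copied from the count_children/1 result above.
Counts = [{specs,1},{active,80},{supervisors,0},{workers,6782028}].
Active = proplists:get_value(active, Counts).
Workers = proplists:get_value(workers, Counts).
%% Child entries still held by the supervisor without a running process.
Stale = Workers - Active.
```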


And then general process information:

erlang:process_info(Pid).
[{current_function,{gen_server,loop,6}},
 {initial_call,{proc_lib,init_p,5}},
 {status,waiting},
 {message_queue_len,0},
 {messages,[]},
 {links,[<0.783.0>]},
 {dictionary,[{'$ancestors',[<0.783.0>,cowboy_sup,<0.544.0>]},
              {'$initial_call',{supervisor,cowboy_requests_sup,1}}]},
 {trap_exit,true},
 {error_handler,error_handler},
 {priority,normal},
 {group_leader,<0.543.0>},
 {total_heap_size,67810415},
 {heap_size,8024355},
 {stack_size,9},
 {reductions,1551847476},
 {garbage_collection,[{min_bin_vheap_size,46368},
                      {min_heap_size,233},
                      {fullsweep_after,65535},
                      {minor_gcs,31155}]},
 {suspending,[]}]

erlang:process_info(Pid,memory).
{memory,542484248}
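
A rough piece of arithmetic on these figures, as a sketch. It assumes a
64-bit emulator (8-byte words), which the post does not state; the
variable names are made up for this example.

```erlang
%% Figures copied from the process_info output above.
MemBytes  = 542484248.   % {memory, ...}, in bytes
HeapWords = 67810415.    % {total_heap_size, ...}, in words
Children  = 6782028.     % workers reported by count_children/1
%% Assumption: 64-bit emulator, i.e. 8 bytes per word.
HeapBytes = HeapWords * 8.            % 542483320, nearly all of MemBytes
BytesPerChild = MemBytes / Children.  % just under 80 bytes per tracked child
```

Under that assumption, practically all of the process memory is heap,
and it works out to about 80 bytes for each child the supervisor is
tracking.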

And this is how many processes I have on the machine:
length(processes()).
588

So almost all of the workers for this supervisor must be dead (the
whole node has only 588 processes), but the supervisor doesn't think
so for some reason.

Is there any reason why a supervisor would hold onto 6.7 million
workers that are already dead?

Any help would be appreciated,
Daniel


