[erlang-questions] Garbage collecting binaries

Robert Virding robert.virding@REDACTED
Fri Oct 8 03:01:49 CEST 2010


  Large binaries are not stored in any process heap, but are specially 
handled outside the normal process data. This allows them to be sent in 
messages between processes by reference instead of by copying, which 
avoids copying potentially large amounts of data. That is the main 
reason for doing it this way.

However, this does cause problems for the gc. Once a large binary has 
been passed between processes it is no longer enough to collect just one 
process before freeing it. A process's references to large binaries are 
only known to be dead once that process itself has been collected. This 
means you have to collect at least every process through which the 
binary has passed before you can determine whether there are any 
references left to it, or whether it can be freed. The freeing of large 
binaries is therefore delayed, so it is possible to fill memory with 
unreferenced binaries which have not yet been collected.
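
As a rough illustration of the point above, the sketch below forces a 
collection of every process on the node and returns the binary memory 
figure before and after. The module name bin_gc and the function 
collect_all/0 are made up for this example. It is a diagnostic aid 
under the assumption that stale references in uncollected processes are 
what keeps the binaries alive, not a recommended fix.

```erlang
-module(bin_gc).
-export([collect_all/0]).

%% Hypothetical helper: garbage collect every process currently alive on
%% this node and return the binary memory usage before and after, so the
%% caller can see whether any large binaries were released.
collect_all() ->
    Before = erlang:memory(binary),
    %% garbage_collect/1 returns false for a process that has already
    %% exited, so a process dying in the meantime is harmless here.
    lists:foreach(fun(Pid) -> erlang:garbage_collect(Pid) end,
                  erlang:processes()),
    After = erlang:memory(binary),
    {Before, After}.
```

Calling bin_gc:collect_all() from the shell returns {Before, After} in 
bytes. If After is much smaller than Before, some process other than 
the one that created the binaries was still holding references to them.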

The fact that you can hold a reference to a whole large binary even 
though you only reference a part of it just aggravates matters.
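
A small sketch of that effect follows. It assumes a release that ships 
the binary module (binary:copy/1,2 and binary:referenced_byte_size/1), 
which as far as I know is not present in R12B5. The module name sub_bin 
and the function demo/0 are invented for the example.

```erlang
-module(sub_bin).
-export([demo/0]).

%% Hypothetical demonstration: a 16-byte slice matched out of a large
%% binary still refers to the whole original, while an explicit copy of
%% the slice does not.
demo() ->
    Big = binary:copy(<<0>>, 100000),        % a 100000-byte binary
    <<Head:16/binary, _/binary>> = Big,      % Head is a sub-binary of Big
    Copy = binary:copy(Head),                % fresh binary, detached from Big
    {byte_size(Head),
     binary:referenced_byte_size(Head),      % size of the data Head keeps alive
     binary:referenced_byte_size(Copy)}.     % size of the data Copy keeps alive
```

The first element is the size of the slice itself. The second would be 
expected to be far larger, because Head keeps all of Big alive, and the 
third should be no larger than the second. Copying the part you 
actually need before storing it or sending it on is one way to let the 
large binary be freed earlier, at the cost of the copy.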

Unfortunately there are few ways around this. As usual it is a matter of 
making trade-offs and hoping for the best. Removing the special handling 
of large binaries would kill many applications.

Robert

On 2010-10-07 16.07, tom kelly wrote:
> Hello List,
>
> I need some help on garbage collecting binaries.
> I have an application that handles large binaries and under load it eats up
> all the available memory then falls over, even when I start all the
> data-handling processes with "{spawn_opt,[{fullsweep_after, 20}]}".
>
> Reading point 5.15 on the page:
> http://www.erlang.org/faq/how_do_i.html
> leads me to think that calling the garbage collector on the process that
> created a binary will clean it up but my shell experiment below shows me I'm
> wrong. It's now 10 minutes since I've done this experiment and the large
> binary is still in memory.
>
> Maybe I have to call the garbage collector of the process whose heap is
> storing the binary? If so, which process is it?
>
> Any pointers as to what I'm doing wrong or misunderstanding will be greatly
> appreciated.
>
> I'm on a legacy R12B5 system.
>
> //Tom.
>
>
>
> 18>  process_info(self(),total_heap_size).
> {total_heap_size,4181}
>
> 19>  A = L(100000).
> [140,161,41,128,44,96,215,43,15,164,88,107,1,167,4,125,118,
>   180,121,181,160,124,244,140,169,215,31,82,43|...]
>
> 20>  process_info(self(),total_heap_size).
> {total_heap_size,1149851}
>
> 21>  f(A).
> ok
>
> 22>  erlang:garbage_collect().
> true
>
> 23>  process_info(self(),total_heap_size).
> {total_heap_size,3194}
>
>
> 24>  memory(binary).
> 6536
>
> 25>  A = B(100000).
> <<232,241,55,171,218,35,86,122,211,185,232,1,203,249,181,
>    218,176,33,88,131,102,56,102,82,158,114,200,174,253,...>>
>
> 26>  memory(binary).
> 106704
>
> 27>  f(A).
> ok
>
> 28>  memory(binary).
> 106704
>
> 29>  erlang:garbage_collect().
> true
>
> 30>  memory(binary).
> 106560
>

