Working with large binaries in the interpreter

Bob Ippolito bob@REDACTED
Fri Sep 1 22:29:35 CEST 2006


On 9/1/06, Bob Ippolito <bob@REDACTED> wrote:
> I'm writing an application that has a large (database) file that it
> keeps as a binary in memory (24.54M). Normally I would use something
> like mmap for this purpose, but I haven't found a similar facility in
> Erlang.
>
> The problem is whenever I get a traceback with that binary on the
> stack or otherwise end up with a printed representation of the binary
> at the interpreter, memory usage grows enormously and it doesn't go
> down. If this happens more than once, I start swapping. Is there
> anything I can do about this? It's really difficult to debug when I
> have to kill beam at any error.
>
> Also, the binary size seems suspiciously high. There should be exactly
> one 25737850 byte (24.54M) binary, but there's 16271355 bytes (15.5M)
> unaccounted for. It should not be a view on a larger binary, because
> that binary is the entire file uncompressed.
>
> (This is Erlang/OTP R11B-0 on Mac OS X 10.4 intel)
>
> Erlang (BEAM) emulator version 5.5 [source] [async-threads:0]
>
> Eshell V5.5  (abort with ^G)
> 1> erlang:memory().
> [{total,2799855},
>  {processes,346118},
>  {processes_used,340190},
>  {system,2453737},
>  {atom,213345},
>  {atom_used,197118},
>  {binary,62460},
>  {code,1594800},
>  {ets,108032}]
> 2> {ok, D} = egeoip:new(), garbage_collect().
> true
> 3> erlang:memory().
> [{total,44853447},
>  {processes,315186},
>  {processes_used,309258},
>  {system,44538261},
>  {atom,216649},
>  {atom_used,201683},
>  {binary,42071665},
>  {code,1664475},
>  {ets,110372}]
> 4> 42071665 - 62460 - size(element(5, D)).
> 16271355
> 5> D. garbage_collect().
> {geoipdb,2,
>          3,
>          3576103,
>          <<1,0,0,107,0,0,2,0,0,60,0,0,3,0,0,30,0,0,4,0,0,18,0,...>>,
>          "/Users/bob/src/egeoip/priv/GeoLiteCity.dat.gz"}
> 6> garbage_collect().
> true
> 7> erlang:memory().
> [{total,489893135},
>  {processes,445361822},
>  {processes_used,445355894},
>  {system,44531313},
>  {atom,216649},
>  {atom_used,201683},
>  {binary,42064717},
>  {code,1664475},
>  {ets,110372}]
>
> Why did processes grow by 445046636 bytes (424.43M) after printing
> that representation?!

The leak can also be reproduced with io:fwrite("~P~n", [D, 24]).
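
To put a number on the growth without reading erlang:memory() by hand,
something like the following rough sketch might help. The module and
function names (mem_delta, processes_delta) are made up for
illustration. Note that garbage_collect/0 only collects the calling
process, so memory held by other processes is not reclaimed by it.

```erlang
-module(mem_delta).
-export([processes_delta/1]).

%% Hypothetical helper: run Fun and report how many bytes the
%% 'processes' figure from erlang:memory/1 changed by.
%% garbage_collect/0 only collects the calling process, so garbage
%% held by any other process is still counted in the result.
processes_delta(Fun) when is_function(Fun, 0) ->
    garbage_collect(),
    Before = erlang:memory(processes),
    Fun(),
    garbage_collect(),
    erlang:memory(processes) - Before.
```

With D bound as in the transcript above, it could be called as
mem_delta:processes_delta(fun() -> io:fwrite("~P~n", [D, 24]) end).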

It appears to be a bug in io_lib_pretty, which unconditionally calls
binary_to_list(B) even when it is going to truncate the output. Because
it doesn't use binary_to_list/3 when size(B) exceeds the print depth,
it ends up creating an enormous list. I'm not sure why this leaks
memory, but that's where it appears to be going.
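
As a minimal sketch of the idea (this is not the actual io_lib_pretty
code, and the module and function names are hypothetical), converting
only the part of the binary that will be printed could look like this:

```erlang
-module(bin_prefix).
-export([prefix/2]).

%% Hypothetical helper: return at most Depth leading bytes of Bin as a
%% list. binary_to_list/3 takes 1-based Start and Stop positions, so
%% only the first Depth bytes are converted when the binary is larger.
prefix(Bin, Depth) when is_binary(Bin), is_integer(Depth), Depth > 0 ->
    case size(Bin) > Depth of
        true  -> binary_to_list(Bin, 1, Depth);
        false -> binary_to_list(Bin)
    end.
```

For a 25M binary and a depth of 24, this builds a 24-element list
instead of one with 25737850 elements.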

Doing results(0) and forgetting the binding for D with f(D) does not
make any difference. It will free the 25M binary from the binary
figure, but it doesn't affect the processes figure, nor does it get rid
of the 16M of binary memory that is unaccounted for...

I will try upgrading to R11B-1 (because I want to try the HiPE patches
for Mac OS X x86 anyway) and then put together a patch for the issue.

-bob


