[erlang-questions] binary, ets and memory...

Dmitry Kolesnikov dmkolesnikov@REDACTED
Fri Oct 12 21:42:25 CEST 2012


This makes sense to me, especially taking into account that binary:split returns references into the subject binary, but I always thought that ets copies data. Here it seems that the data is copied but the reference counter for the parts is not decreased...
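
A minimal sketch of how one could check this in a shell or a small module. The module name subbin_demo and the test data are made up for illustration; binary:referenced_byte_size/1 reports the size of the underlying binary that a (sub-)binary refers to, so it should show whether a 20-byte part still points at the whole buffer:

```erlang
-module(subbin_demo).
-export([run/0]).

%% Hypothetical demo: compare a part returned by binary:split/3
%% with an explicit copy of that part.
run() ->
    %% Build a large buffer of comma-terminated 20-byte fields.
    %% It is far larger than 64 bytes, so it is a reference-counted
    %% binary stored outside the process heap.
    Big = binary:copy(<<"0123456789abcdef0123,">>, 1000),
    %% The parts are sub-binaries that refer to Big.
    [Part | _] = binary:split(Big, [<<$,>>], [global]),
    %% binary:copy/1 creates a fresh binary holding only the part.
    Copy = binary:copy(Part),
    {byte_size(Part), binary:referenced_byte_size(Part),
     byte_size(Copy), binary:referenced_byte_size(Copy)}.
```

If the referenced size of Part is much larger than its byte size while the two are equal for Copy, that would be consistent with the part keeping the whole received buffer alive until it is copied.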

- Dmitry


On Oct 12, 2012, at 10:27 PM, Chris Hicks wrote:

> I could be wrong but I'm going to take a guess and say that in the first implementation the whole binary is being kept around and never destroyed. I think what's happening is you are getting a reference to part of a larger binary and passing that around, but the larger binary is sticking around since part of it is still being used. Copying the part you need, and thus creating an entirely new binary, is probably allowing all references to that large binary to disappear so that it can be GC'd.
> 
> That's a rather naive guess based on what I know about how binaries work. Can anyone else back that up or tell me I'm wrong?
> 
> Chris
> 
> > From: dmkolesnikov@REDACTED
> > Date: Fri, 12 Oct 2012 22:15:21 +0300
> > To: erlang-questions@REDACTED
> > Subject: [erlang-questions] binary, ets and memory...
> > 
> > Hello,
> > 
> > Recently, my system started to swap. Investigation showed that memory consumption was almost twice what I expected and, to be honest, I am confused as to why...
> > 
> > I am talking about Erlang R15B (erts-5.9) [source] [64-bit] [smp:8:8] [async-threads:0] [hipe] [kernel-poll:false]
> > 
> > So, I have two processes:
> > * The first process handles TCP/IP socket I/O. Received data is pushed to the second process.
> > * The second process splits binaries with binary:split(Buf, [<<$,>>], [global]), parses the data and builds a list of tuples. When the list of tuples is ready, it folds the tuples into an ets table:
> > ets:new(cache, [named_table, ordered_set, public]),
> > lists:foldl(
> > fun({A, B}, _Acc) -> ets:insert(cache, {A, B}) end,
> > true,
> > List
> > ).
> > 
> > I have about 6M tuples, where the first element is a SHA1 signature and the second element is an integer. The receiver process pushes 100 tuples at a time. I hope you get the rough idea.
> > 
> > When the cache is populated, I see the following memory usage, which looks suspicious to me:
> > {total,1179235080},
> > {processes,2373638},
> > {processes_used,2373570},
> > {system,1176861442},
> > {atom,264505},
> > {atom_used,253241},
> > {binary,434761000}, <-- this looks strange to me. Why are binaries left both on the heap and in ets?
> > {code,6521469},
> > {ets,732409416}
> > 
> > If I change my implementation to
> > lists:foldl(
> > fun({A, B}, _Acc) -> ets:insert(cache, {binary:copy(A), B}) end,
> > true,
> > List
> > ).
> > then memory utilization is on par with my estimates:
> > {total,701448856},
> > {processes,2405251},
> > {processes_used,2405170},
> > {system,699043605},
> > {atom,264505},
> > {atom_used,253241},
> > {binary,2686280},
> > {code,6521493},
> > {ets,686663080}
> > 
> > Best Regards, 
> > Dmitry