[erlang-questions] Binary memory usage problems. Ref holding, how to debug?

Vans S vans_163@REDACTED
Sun Dec 4 17:37:50 CET 2016


So I have an app that decodes very large JSON documents into maps using jiffy.  A single JSON document ends up being 8 MB or more.

Using the 'instrument' module, we can inspect allocations with instrument:memory_data().

AFAIK TypeNo 187 is binary? That is based on the index of
binary in the term() part of MemoryData = {term(), AllocList}.

[{163,140323985428560,102675368,{0,328,0}},
{162,140324609712208,16582664,{0,328,0}},
{187,140322805108816,8857012,{0,19531,1}},
{187,140324408574032,8853330,{0,31903,1}},
{187,140324348485712,8841158,{0,18560,2}},
{187,140324318306384,8833780,{0,25812,2}},
{187,140324439670864,8828792,{0,25113,2}},
{187,140322376695888,8809693,{0,31250,1}},
{187,140323227144272,8785428,{0,20346,0}},
{187,140324357328976,8778769,{0,17234,2}},
{187,140324374511696,8751684,{0,10006,2}},
{187,140323087188048,8747085,{0,24112,1}},
{187,140324678754384,8728978,{0,14962,0}},
{187,140321762197584,8726561,{0,13901,2}},
{187,140322517651536,8724998,{0,23468,1}},
{187,140321205051472,8711957,{0,23748,2}},

Each of these pids is a temporary decode process that dies after decoding.  As soon as the process dies, its binary gets cleaned up, life is great!


BUT if we use a part of this binary, say we do:

BinaryOver64Bytes = maps:get(description, JsonMap),
mnesia:write({just_example, BinaryOver64Bytes, 5}).

We have a huge problem now. This 8 MB binary will NEVER get cleaned up, even with a forced GC.

My hypothesis is that jiffy optimizes decoding by not copying the terms (binary:copy/1) when it creates the output map, so the strings in the map are sub-binaries that still reference the original JSON binary.

This means that when we store a part of this binary permanently, say in mnesia, the whole binary stays referenced and never gets cleaned up.

The solution is obvious: binary:copy/1 the terms from the 8 MB binary that are stored permanently.
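
A minimal sketch of that workaround, assuming the stored value really is a sub-binary of the big JSON binary. The module name subbin_demo and the helper compact/1 are made up for illustration; binary:referenced_byte_size/1 and binary:copy/1 are the standard library calls. The idea is to copy only when the binary references more memory than it actually needs.

```erlang
-module(subbin_demo).
-export([run/0, compact/1]).

%% Hypothetical helper: return a fresh copy of Bin when it references
%% a larger underlying binary, otherwise return Bin unchanged.
compact(Bin) when is_binary(Bin) ->
    case binary:referenced_byte_size(Bin) > byte_size(Bin) of
        true  -> binary:copy(Bin);   % new binary, parent no longer referenced
        false -> Bin
    end.

%% Stand-in for the decode scenario: a large binary and a small slice of it.
run() ->
    Big = binary:copy(<<"x">>, 8 * 1024 * 1024),
    %% Matching out a slice yields a sub-binary that points into Big.
    <<Sub:100/binary, _Rest/binary>> = Big,
    Copy = compact(Sub),
    %% {size of the slice, bytes kept alive by Sub, bytes kept alive by Copy}
    {byte_size(Sub),
     binary:referenced_byte_size(Sub),
     binary:referenced_byte_size(Copy)}.
```

With this, the write would become something like mnesia:write({just_example, subbin_demo:compact(BinaryOver64Bytes), 5}). The second element of the tuple returned by run() should show how much memory the small slice was pinning before the copy.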

My question is: does Erlang provide anything to see what is holding a ref to which binary?
Another question is whether Erlang should make it so easy to shoot yourself in the foot here.
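
For the first question, one partial option is erlang:process_info(Pid, binary), which lists the reference-counted binaries a process currently refers to. The sketch below assumes the entries are {Id, Size, RefCount} tuples, which is what the current implementation returns but is documented as subject to change. The module name bin_refs and the function holders/1 are made up for illustration. It only covers process heaps, so binaries held by ETS or mnesia tables will not show up here.

```erlang
-module(bin_refs).
-export([holders/1]).

%% List {Pid, Binaries} for every process that refers to at least one
%% reference-counted binary of MinSize bytes or more.
%% Assumes entries of the form {Id, Size, RefCount}; entries in any
%% other shape are skipped rather than crashing.
holders(MinSize) when is_integer(MinSize) ->
    [{Pid, Large} ||
        Pid <- erlang:processes(),
        %% process_info/2 returns undefined for dead processes; the
        %% pattern filters those out.
        {binary, Bins} <- [erlang:process_info(Pid, binary)],
        Large <- [[B || {_Id, Size, _RefCount} = B <- Bins,
                        Size >= MinSize]],
        Large =/= []].
```

Calling bin_refs:holders(8 * 1024 * 1024) would then list the processes holding on to binaries of 8 MB or more, with the reference count of each binary as the last element of its tuple.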


Would it work to optimize the garbage collection of shared binaries so that it considers which chunks of a binary are still in use, and discards the rest that is unused?

This is a common question/problem, and I have already lost count of how many Erlang users start chats on this topic.

IMO it should not be this easy to screw up, especially with a language like Erlang.


