[erlang-questions] how to debug memory leaks?

Attila Rajmund Nohl attila.r.nohl@REDACTED
Sat Dec 4 16:40:24 CET 2010


Ah, I have a GREAT story about this. We received TRs saying that our
program was crashing. It turned out that the Linux OOM-killer was
killing some of our processes, because the OS ran out of memory. We
saw that the heap of some processes was quite big (tens of MBs), but
calling erlang:garbage_collect(Pid) on them decreased it to the KB
range - which was really strange. I added code to run
erlang:garbage_collect on the biggest processes from time to time, but
in the long run it didn't solve the problem, so I started to check
other memory users, including ets. The giveaway was that one row in
one table occupied 25% of the whole table size - and that was
suspicious.
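
A minimal sketch of that first step (listing the biggest processes and
forcing a collection on them) could look like the module below. The
module and function names are made up for illustration, this is not
the code we actually ran; it only uses erlang:processes/0,
erlang:process_info/2 and erlang:garbage_collect/1.

```erlang
-module(mem_top).
-export([top_processes/1, gc_top/1]).

%% Return up to N {Pid, Bytes} pairs, largest memory footprint first.
%% Processes that exit while we iterate yield 'undefined' from
%% process_info/2 and are silently skipped by the pattern in the generator.
top_processes(N) ->
    Sizes = [{Pid, Bytes} || Pid <- erlang:processes(),
                             {memory, Bytes} <- [erlang:process_info(Pid, memory)]],
    lists:sublist(lists:reverse(lists:keysort(2, Sizes)), N).

%% Force a garbage collection on the N largest processes and report
%% {Pid, BytesBefore, BytesAfter} for each of them.
gc_top(N) ->
    [begin
         erlang:garbage_collect(Pid),
         {Pid, Before, bytes_now(Pid)}
     end || {Pid, Before} <- top_processes(N)].

bytes_now(Pid) ->
    case erlang:process_info(Pid, memory) of
        {memory, Bytes} -> Bytes;
        undefined -> undefined      % the process exited in the meantime
    end.
```

If the "after" figure collapses from MBs to KBs, as it did for us, the
process is probably not leaking by itself; it is more likely handling
some big temporary term, and the interesting question is where that
term comes from.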

Some background: the Erlang project I'm working on is older than the
support for records in Erlang (in fact, it's older than Erlang itself
and was rewritten from C++ many moons ago), so we don't use records in
ets tables; simple key-value pairs are stored, where the values are
almost always proplists. For a particular piece of data the software
developer needed an index by a key in the proplist, so he created
another ets table for the same data, but in this case with a different
key. He wrote all the necessary code that updated both ets tables when
necessary. For architectural reasons only the main table is saved when
the software is stopped, so the index table itself was stored as a row
in the main table (using ets:tab2list).

Unfortunately the developer made a mistake (in the very first CVS
commit of his index-related code): he deleted the element from the
index table with code like this:
Data = ets:lookup(Tab, ...),
ets:delete(Tab, Data)

Because ets:delete expects the _key_ of the data, not the whole data,
the data wasn't removed from the index table, so the index table kept
growing. Because this table was stored in the main table, the size of
that table also grew (and the size of one row in particular grew -
eventually providing the decisive clue). There was an Erlang process
that read this huge index table from the main table, then wrote it
back every time the index was updated, temporarily allocating a big
chunk of memory. This bug was in our code for 3 years before we
noticed - a strict type-checking compiler probably would have caught
it at the first compilation attempt.
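
To make the mistake concrete, here is a small self-contained sketch.
The table name, the key and the module name are invented for the
example. It shows that ets:delete/2 returns true even when nothing
matches, which is why the bug went unnoticed, and that passing the key
does the job.

```erlang
-module(index_delete).
-export([demo/0]).

demo() ->
    Tab = ets:new(index_demo, [set]),
    true = ets:insert(Tab, {user42, [{name, "a"}]}),

    %% ets:lookup/2 returns a list of whole objects, not a key.
    Data = ets:lookup(Tab, user42),

    %% The bug: the lookup result is passed as the key. No row has that
    %% key, but ets:delete/2 still returns true, so nothing fails.
    true = ets:delete(Tab, Data),
    1 = ets:info(Tab, size),        % the row is still there

    %% The fix: take the key out of the object and delete by key.
    [{Key, _Value}] = Data,
    true = ets:delete(Tab, Key),
    0 = ets:info(Tab, size),        % now the row is gone

    true = ets:delete(Tab),         % drop the demo table
    ok.
```

If the whole object is what you have at hand, ets:delete_object/2 is
the call that takes an object instead of a key.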

So my advice is to check everything for suspicious behaviour: files
that are too big, tables that are too big, too many processes, etc.
If a process looks suspicious, check what it does: does it handle big
lists or long strings? Use erlang:memory() and ets:info(Tab, memory)
to find out what uses the most memory.
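
A rough sketch of such a check for ets follows. The module and function
names are hypothetical. ets:info(Tab, memory) reports words, so the
result is multiplied by the word size, and erlang:external_size/1 is
used only as an approximation of how big a row is; it is not the exact
amount of memory the row occupies inside the table.

```erlang
-module(ets_top).
-export([tables_by_memory/0, biggest_row/1]).

%% All ets tables as {Tab, Bytes}, largest first.
%% ets:info/2 returns 'undefined' if a table disappears while we
%% iterate, so such entries are filtered out.
tables_by_memory() ->
    WordSize = erlang:system_info(wordsize),
    Sizes = [{Tab, Words * WordSize} || Tab <- ets:all(),
                                         Words <- [ets:info(Tab, memory)],
                                         is_integer(Words)],
    lists:reverse(lists:keysort(2, Sizes)).

%% {Key, ApproxBytes} of the row with the largest serialized size,
%% or {undefined, 0} for an empty table. The calling process needs
%% read access to the table (so this fails on foreign private tables).
biggest_row(Tab) ->
    KeyPos = ets:info(Tab, keypos),
    ets:foldl(fun(Obj, {_, Max} = Acc) ->
                      Size = erlang:external_size(Obj),
                      case Size > Max of
                          true  -> {element(KeyPos, Obj), Size};
                          false -> Acc
                      end
              end, {undefined, 0}, Tab).
```

Comparing the result of biggest_row/1 with the total for the table is
the kind of check that would have exposed our 25% row much earlier.
Note that ets:foldl/3 walks the whole table, so on a very large table
this is not something to run often on a live system.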

2010/12/4, Alexey Romanov <alexey.v.romanov@REDACTED>:
> I am also seeing growth in heap size and memory consumption on a
> couple of processes, but calling garbage_collect(Pid) manually frees
> most of it. So it may just be that there is enough memory available
> that it isn't garbage-collecting...
>
> On Sat, Dec 4, 2010 at 2:54 PM, mabrek <mabrek@REDACTED> wrote:
>> Great thanks, it works for me.
>>
>> On Sat, Dec 4, 2010 at 2:28 PM, Dan Gudmundsson <dgud@REDACTED> wrote:
>>> Maybe sys:get_status/1 returns the state for gen processes.
>>>
>>> /Dan
>>>
>>> On Sat, Dec 4, 2010 at 12:20 PM, mabrek <mabrek@REDACTED> wrote:
>>>>
>>>> It requires runtime code modification in suspected process to be
>>>> implemented properly. io:format ~p shortens long lists.
>>>>
>>>
>>
>> ________________________________________________________________
>> erlang-questions (at) erlang.org mailing list.
>> See http://www.erlang.org/faq.html
>> To unsubscribe; mailto:erlang-questions-unsubscribe@REDACTED
>>
>>
>
>
>
> --
> Yours, Alexey Romanov