[erlang-questions] zlib memory leak

Dmitry Kolesnikov dmkolesnikov@REDACTED
Mon Mar 14 21:25:27 CET 2016


Hello,

I’ve run into an interesting issue with zlib on OTP 18.2.1; I have not checked other releases yet.
I have a file with about 30GB of compressed data, which expands to about 300GB of UTF-8 text.
The producer of the file claims that standard gzip is used. The first bytes of the file (see [1] for the header format) are:
1F 8B 08 04 00 00 00 00 00 00 24 03

My decompression program is very simple [2]: it reads 64K binary chunks from the file and inflates them using zlib:inflate(…).
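
For readers who do not want to open [2], the loop has roughly the following shape. This is only a minimal sketch of the approach described above, not a copy of gz.erl: the module name gz_sketch and the function inflate_file/1 are hypothetical, and it assumes the stream is opened with WindowBits 16 + 15 so that zlib parses the gzip wrapper itself.

```erlang
-module(gz_sketch).
-export([inflate_file/1]).

-define(CHUNK, 65536).

%% Inflate a gzip file chunk by chunk and return the number of inflated bytes.
inflate_file(Path) ->
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    Z = zlib:open(),
    %% WindowBits 16 + 15 asks zlib to expect a gzip header and trailer.
    ok = zlib:inflateInit(Z, 16 + 15),
    try
        loop(Fd, Z, 0)
    after
        zlib:close(Z),
        file:close(Fd)
    end.

loop(Fd, Z, Acc) ->
    case file:read(Fd, ?CHUNK) of
        {ok, Chunk} ->
            %% inflate/2 returns everything it could decode from this chunk.
            Out = zlib:inflate(Z, Chunk),
            loop(Fd, Z, Acc + iolist_size(Out));
        eof ->
            ok = zlib:inflateEnd(Z),
            Acc
    end.
```

The sketch only counts the inflated bytes instead of keeping them, so its own memory use stays bounded by what a single zlib:inflate call returns.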

At some point, inflate does not return and the VM's binary memory grows without bound until the VM crashes.
The crash is reproducible every time with my file. The file is not corrupted: command-line gzip is able to check its integrity and inflate the data. The file becomes readable by my program if it is inflated and then deflated again using command-line gzip. The header of the readable file is:
1F 8B 08 00 6E EC E6 56 00 03 D4 BD
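
To compare the two headers, here is a minimal sketch that decodes the fixed gzip header fields as laid out in [1]. The module name gz_header and the function parse/1 are made up for illustration and are not part of [2].

```erlang
-module(gz_header).
-export([parse/1]).

%% Decode the fixed 10-byte gzip header (ID1, ID2, CM, FLG, MTIME, XFL, OS).
%% Returns a map of the fixed fields plus the names of the flag bits set;
%% when FEXTRA is set and two more bytes are available, XLEN is added too.
parse(<<16#1F, 16#8B, CM, FLG, MTime:32/little, XFL, OS, Rest/binary>>) ->
    Flags = [Name || {Bit, Name} <- [{16#01, ftext}, {16#02, fhcrc},
                                     {16#04, fextra}, {16#08, fname},
                                     {16#10, fcomment}],
                     FLG band Bit =/= 0],
    Base = #{cm => CM, flags => Flags, mtime => MTime, xfl => XFL, os => OS},
    case {lists:member(fextra, Flags), Rest} of
        {true, <<XLen:16/little, _/binary>>} -> Base#{xlen => XLen};
        _ -> Base
    end;
parse(_) ->
    %% Too short, or the magic bytes 1F 8B are missing.
    {error, not_gzip}.
```

Applied to the first 12 bytes of the original file, FLG is 04, so only the FEXTRA bit is set, and the two bytes that follow, 24 03, are XLEN = 804 in little-endian order. In the re-compressed file FLG is 00, so no optional fields are present and the deflate data starts right after the 10-byte header. I have not confirmed whether this difference is related to the problem.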

I am finding it hard to debug this issue further down into zlib and to understand the root cause.
Do you have any suggestions?

Best Regards, 
Dmitry

P.S. The file I am talking about contains confidential data and cannot be disclosed to the community.

Reference:
[1] http://www.zlib.org/rfc-gzip.html
[2] https://github.com/fogfish/feta/blob/master/src/gz.erl

