[erlang-questions] zlib design flaw?

Park, Sungjin jinni.park@REDACTED
Wed Sep 24 10:01:55 CEST 2014


Chris, you can call me Sungjin as it's my first name.

The problem is that the server crashes with out-of-memory as soon as it
reads all the data.  I get no chance to decide whether to drop the data
before the zlib library hands me those big binaries.  So what I want is
for the zlib library to emit moderately sized fragments of the inflated
data, so that I can stop processing early and avoid crashing the whole VM.
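
As a sketch of the behaviour I'm after (using Python's binding to the same
zlib C library, not the Erlang API, with a helper name of my own invention):

```python
import zlib

def inflate_limited(compressed, max_out):
    """Inflate at most max_out bytes; raise if the stream would exceed it."""
    d = zlib.decompressobj()
    out = d.decompress(compressed, max_out)  # zlib stops once max_out bytes are produced
    if d.unconsumed_tail:                    # leftover input means the limit was hit
        raise ValueError("inflated data exceeds %d bytes" % max_out)
    return out
```

A well-behaved payload inflates normally, while a zero bomb trips the limit
instead of allocating gigabytes.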

And I also think it is a rather serious problem that I can kill any web
server written in Erlang this easily.  Choosing an appropriate compression
scheme according to content type is, sorry, not an answer to the topic I
originally raised.  I was raising a security problem: malicious attackers
can send a megabyte of deflated zeros pretending to be an image, and each
of those fake images will consume gigabytes of system memory as an
inflated zero binary, as Guilherme noted.
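
To put rough numbers on the attack (a sketch with Python's binding to the
same zlib C library; the sizes are what deflate produces for long runs of
zeros, not measurements from our servers):

```python
import zlib

raw = b"\x00" * 10_000_000     # ten megabytes of zeros
bomb = zlib.compress(raw)      # what the attacker actually sends: roughly 10 KiB
ratio = len(raw) / len(bomb)   # close to zlib's documented ceiling of 1032:1
# At that ratio, a one-megabyte "image" inflates to roughly a gigabyte,
# and a handful of concurrent uploads exhausts system memory.
```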


On Wed, Sep 24, 2014 at 4:22 PM, Christopher Vance <cjsvance@REDACTED>
wrote:

> I'm not sure whether to call you Park, or Sungjin, or something else, but
> Hello in any case.
>
> The simplest way I can think of to do (1) is to read all the data, doing
> the decompression as you go, but throwing away the actual decompressed data
> and everything else not needed in calculating the decompressed size. This
> is only useful if you can then decide to go back to the beginning of the
> data and read it all over again if you decide to keep the result of the
> decompression after all. (This is fine for reading a file which will still
> be there after you work out the decompressed size, but may require writing
> all data to a temporary file if you're reading data from a network stream.)
>
> The simplest way I can think of to do (2) is to read all the data, keeping
> what you decompress until you discover the next part won't fit. You then
> have to stop, return the data you have so far (probably less than the full
> size permitted) and you have to be able to resume at the beginning of the
> part that didn't fit, keeping any compression/decompression state necessary
> to continue.
>
> The simplest way I can think of to do (3) is to start doing (2) and abort
> the procedure altogether if the next part won't fit.
>
> My mention of 1:1 compression was intended to be facetious - a joke. My
> apologies if this wasn't clear. The easiest method I can think of is to
> return the input stream unaltered, so the "compressed" data is the same
> size as the original. There are other methods which are also trivially
> reversed with no risk of buffer overflow. If you haven't saved any space
> it's not actually useful, and it's not really compression, but you may want
> to be able to specify such an algorithm when chaining functions together.
>
> The proportion of compression you'll get from an algorithm will depend on
> the nature and variability of the content, and on the way the algorithm
> adapts to that variability. If you know the data is all zero values, a good
> compression algorithm is merely to report the length of the data; on the
> other hand, a stream of random (or pseudo-random) bits is theoretically
> incompressible.
>
> So if mixing compression and encryption, you can usually save space by
> compressing before encrypting, and decrypting before decompressing, but the
> other way around will almost certainly increase your size because the
> compression algorithm will give a larger output than the input. If you know
> the data is sufficiently random, you may choose not to attempt compression
> at all.
>
> On Wed, Sep 24, 2014 at 3:03 PM, Park, Sungjin <jinni.park@REDACTED>
> wrote:
>
>> Christopher,
>>
>> I don't have protocol-level knowledge of zlib.  But in my opinion - from
>> a programmer's perspective - one of the following should be possible.
>>
>> 1) an API to get inflated data size before actually inflating the data
>> 2) an API to get part of the inflated data by given size and let the
>> aggregation be done by the user
>> 3) an API option to set limit in the size of the inflated data
>>
>> Currently, we are doing 3) with a custom NIF.  It would of course be
>> better if I could do any of the above with the standard library.
>>
>> I don't understand your 1:1 compression scheme suggestion.  AFAIK,
>> compression is used to reduce the number of bytes in flight, isn't it?
>>
>>
>> On Wed, Sep 24, 2014 at 12:41 PM, Christopher Vance <cjsvance@REDACTED>
>> wrote:
>>
>>> The zlib protocol is fully specified, and it appears the code is working
>>> correctly, so the only issue you might have is deciding what to do when you
>>> see input which could be seen as malicious (but which is still properly
>>> defined). Do you want to abort handling the input stream, or do you have an
>>> alternate suggestion?
>>>
>>> If you don't like the way zlib does it, or prefer a compression scheme
>>> which is more predictable, why not try a different compression algorithm,
>>> and see if it does any better in these pathological cases. I can describe
>>> several 1:1 "compression" schemes where you only ever get out the same
>>> number of bytes you put in, but you'd probably think these safe methods
>>> are too boring.
>>>
>>> On Wed, Sep 24, 2014 at 1:15 PM, Guilherme Andrade <g@REDACTED>
>>> wrote:
>>>
>>>>  Hi Sungjin,
>>>>
>>>> I've recently dealt with this very same issue, albeit only as a
>>>> security-hardening and prevention measure; and, like you, I've looked
>>>> into the zlib C code that's bundled with erts and arrived at the same
>>>> conclusion.
>>>>
>>>> I was only able to limit it on a theoretical basis: the zlib authors
>>>> themselves state[1] that a maximum compression ratio of 1032:1 is
>>>> achievable (with big blobs of zeroes).  Therefore, if I want to limit the
>>>> uncompressed content to less than, let's say, 5 MiB, I'll only accept
>>>> compressed content of up to ~5 KiB. This reasoning might be missing
>>>> something, though.
>>>>
>>>> If there's a better/cleaner way to deal with this, I would love to know.
>>>>
>>>> Cheers,
>>>>
>>>>
>>>> [1]: http://www.zlib.net/zlib_tech.html
>>>>
>>>>
>>>>
>>>> On 24-09-2014 03:55, Park, Sungjin wrote:
>>>>
>>>> Hi, I'm about to report a problem with Erlang's zlib library interface
>>>> which I think is, at this point in time, a design flaw.
>>>>
>>>>  We recently received some malicious packets which were not very big in
>>>> the first place but inflated to really big ones - hundreds of megabytes
>>>> each.  As a result, the server crashed with out-of-memory in the processes
>>>> calling zlib:inflate/2.  Urgency forced us to write a custom NIF library
>>>> with an inflation size limit.  We also studied the Erlang reference manual
>>>> but couldn't find anything useful.  The zlib library source code shows
>>>> that even zlib:setBufSize/2 does not prevent producing very big binaries.
>>>>
>>>>  Since there is no way to know how big the data will become after
>>>> inflation, this should be a quite common problem.  So I'm curious whether
>>>> I missed something very simple and nice.  Is there anything like that?
>>>>
>>>>  --
>>>> Park, Sungjin
>>>>
>>>> -------------------------------------------------------------------------------------------------------------------
>>>> Peculiar travel suggestions are dancing lessons from god.
>>>>   -- The Books of Bokonon
>>>>
>>>> -------------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>> _______________________________________________
>>>> erlang-questions mailing list
>>>> erlang-questions@REDACTED
>>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>>
>>>>
>>>> --
>>>> Guilherme
>>>> https://www.gandrade.net/
>>>> PGP: 0x35CB8191 / 1968 5252 3901 B40F ED8A  D67A 9330 79B1 35CB 8191
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Christopher Vance
>>>
>>
>>
>>
>> --
>> Park, Sungjin
>>
>>
>>
>>
>
>
> --
> Christopher Vance
>



-- 
Park, Sungjin