[erlang-questions] zlib design flaw?

Christopher Vance cjsvance@REDACTED
Wed Sep 24 09:22:47 CEST 2014


I'm not sure whether to call you Park, or Sungjin, or something else, but
hello in any case.

The simplest way I can think of to do (1) is to read all the data, doing
the decompression as you go, but throwing away the actual decompressed data
and everything else not needed to calculate the decompressed size. This is
only useful if you can then go back to the beginning of the data and read it
all over again, should you decide to keep the result of the decompression
after all. (This is fine when reading a file, which will still be there after
you work out the decompressed size, but it may require writing all the data
to a temporary file if you're reading from a network stream.)

The simplest way I can think of to do (2) is to read all the data, keeping
what you decompress until you discover the next part won't fit. You then have
to stop, return the data you have so far (probably less than the full size
permitted), and be able to resume at the beginning of the part that didn't
fit, keeping whatever decompression state is necessary to continue.

The simplest way I can think of to do (3) is to start doing (2) and abort
the procedure altogether if the next part won't fit.
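
To make the shape of (2) and (3) concrete, here is a rough Erlang sketch of
the accumulation loop. The names and the ChunkFun contract are my own
invention for illustration; as discussed below, the standard zlib module
inflates everything in one call, so the chunk source would have to come from
somewhere else (your NIF, for example):

    %% ChunkFun(State) is assumed to return {data, Bin, NextState} or eof.
    %% This stops with {error, limit_exceeded} as soon as the next chunk
    %% would push the total past Limit (approach 3); returning the partial
    %% data together with NextState instead would give you approach (2).
    inflate_limited(ChunkFun, State, Limit) ->
        inflate_limited(ChunkFun, State, Limit, 0, []).

    inflate_limited(ChunkFun, State, Limit, Size, Acc) ->
        case ChunkFun(State) of
            eof ->
                {ok, lists:reverse(Acc)};
            {data, Bin, Next} when Size + byte_size(Bin) =< Limit ->
                inflate_limited(ChunkFun, Next, Limit,
                                Size + byte_size(Bin), [Bin | Acc]);
            {data, _Bin, _Next} ->
                {error, limit_exceeded}
        end.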

My mention of 1:1 compression was intended to be facetious - a joke. My
apologies if this wasn't clear. The easiest method I can think of is to
return the input stream unaltered, so the "compressed" data is the same
size as the original. There are other methods which are also trivially
reversed with no risk of buffer overflow. If you haven't saved any space
it's not actually useful, and it's not really compression, but you may want
to be able to specify such an algorithm when chaining functions together.

The proportion of compression you'll get from an algorithm will depend on
the nature and variability of the content, and on the way the algorithm
adapts to that variability. If you know the data is all zero values, a good
compression scheme need only record the length of the data; on the
other hand, a stream of random (or pseudo-random) bits is theoretically
incompressible.
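
You can see that difference from the Erlang shell with zlib:compress/1. The
figures in the comments are rough and will vary with the zlib version and
settings in use:

    %% A megabyte of zeroes collapses to roughly a kilobyte.
    byte_size(zlib:compress(binary:copy(<<0>>, 1000000))).
    %% A megabyte of random bytes comes out slightly larger than it went in.
    byte_size(zlib:compress(crypto:strong_rand_bytes(1000000))).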

So if you're mixing compression and encryption, you can usually save space by
compressing before encrypting (and decrypting before decompressing). The
other way around will almost certainly increase your size: properly encrypted
data looks random, so the compression algorithm will give a larger output
than its input. If you know the data is sufficiently random, you may choose
not to attempt compression at all.
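
For what it's worth, the ratio-based guard Guilherme describes below can be
expressed with the standard zlib module alone. This is only a sketch relying
on the ~1032:1 worst-case ratio documented by the zlib authors, not a
guarantee from the library:

    %% Refuse to inflate anything whose worst-case expansion (about
    %% 1032:1) could exceed MaxUncompressed bytes.
    inflate_bounded(Compressed, MaxUncompressed) when is_binary(Compressed) ->
        case byte_size(Compressed) =< MaxUncompressed div 1032 of
            true ->
                Z = zlib:open(),
                ok = zlib:inflateInit(Z),
                Data = zlib:inflate(Z, Compressed),
                ok = zlib:inflateEnd(Z),
                ok = zlib:close(Z),
                {ok, iolist_to_binary(Data)};
            false ->
                {error, too_large}
        end.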

On Wed, Sep 24, 2014 at 3:03 PM, Park, Sungjin <jinni.park@REDACTED> wrote:

> Christopher,
>
> I don't have protocol-level knowledge about zlib.  But in my opinion - from
> a programmer's perspective - one of the following should be possible.
>
> 1) an API to get the inflated data size before actually inflating the data
> 2) an API to get part of the inflated data by a given size and let the
> aggregation be done by the user
> 3) an API option to set a limit on the size of the inflated data
>
> Currently, we are doing 3) with a custom NIF.  It would of course be better
> if I could do any of the above with the standard library.
>
> I don't understand your 1:1 compression scheme suggestion.  AFAIK,
> compression is used to reduce the number of bytes in flight after all, isn't it?
>
>
> On Wed, Sep 24, 2014 at 12:41 PM, Christopher Vance <cjsvance@REDACTED>
> wrote:
>
>> The zlib protocol is fully specified, and it appears the code is working
>> correctly, so the only issue you might have is deciding what to do when you
>> see input which could be seen as malicious (but which is still properly
>> defined). Do you want to abort handling the input stream, or do you have an
>> alternate suggestion?
>>
>> If you don't like the way zlib does it, or prefer a compression scheme
>> which is more predictable, why not try a different compression algorithm
>> and see if it does any better in these pathological cases? I can describe
>> several 1:1 "compression" schemes where you only ever get out the same
>> number of bytes you put in, but you'd probably think these safe methods are
>> too boring.
>>
>> On Wed, Sep 24, 2014 at 1:15 PM, Guilherme Andrade <g@REDACTED>
>> wrote:
>>
>>>  Hi Sungjin,
>>>
>>> I've recently dealt with this very same issue, albeit only as a security
>>> hardening and prevention measure; and, like you, I've looked into the
>>> zlib C code that's bundled with erts and arrived at that same conclusion.
>>>
>>> I was only able to limit it on a theoretical basis: the zlib guys
>>> themselves state[1] that a maximum compression ratio of 1032:1 is
>>> achievable (with big blobs of zeroes). Therefore, if I want to limit the
>>> uncompressed content to less than, let's say, 5 MiB, I'll only accept
>>> compressed content of up to ~5 KiB. This thinking might be missing
>>> something, though.
>>>
>>> If there's a better/cleaner way to deal with this, I would love to know.
>>>
>>> Cheers,
>>>
>>>
>>> [1]: http://www.zlib.net/zlib_tech.html
>>>
>>>
>>>
>>> On 24-09-2014 03:55, Park, Sungjin wrote:
>>>
>>> Hi, I'm about to report a problem with Erlang's zlib library interface
>>> which I think is a design flaw at this point in time.
>>>
>>>  We recently had some malicious packets which were not very big in the
>>> first place but inflated to really big ones - hundreds of megabytes each.
>>> As a result, the server crashed out of memory because of the processes
>>> calling zlib:inflate/2.  Urgency forced us to make a custom NIF library
>>> with an inflation size limit.  We also studied the Erlang reference manual
>>> but couldn't find anything useful.  The zlib library source code shows that
>>> even zlib:setBufSize/2 does not prevent producing very big binaries.
>>>
>>>  Since there is no way to know how big the data will become after
>>> inflation, this should be a quite common problem.  So I'm curious whether
>>> I missed something very simple and nice.  Is there anything like that?
>>>
>>>  --
>>> Park, Sungjin
>>>
>>> -------------------------------------------------------------------------------------------------------------------
>>> Peculiar travel suggestions are dancing lessons from god.
>>>   -- The Books of Bokonon
>>>
>>> -------------------------------------------------------------------------------------------------------------------
>>>
>>>
>>>
>>>
>>> --
>>> Guilherme
>>> https://www.gandrade.net/
>>> PGP: 0x35CB8191 / 1968 5252 3901 B40F ED8A  D67A 9330 79B1 35CB 8191
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Christopher Vance
>>
>
>
>
> --
> Park, Sungjin
>
> -------------------------------------------------------------------------------------------------------------------
> Peculiar travel suggestions are dancing lessons from god.
>   -- The Books of Bokonon
>
> -------------------------------------------------------------------------------------------------------------------
>
>
>


-- 
Christopher Vance