<div dir="ltr">I'm not sure whether to call you Park, or Sungjin, or something else, but Hello in any case.<div><br></div><div>The simplest way I can think of to do (1) is to read all the data, doing the decompression as you go, but throwing away the actual decompressed data and everything else not needed in calculating the decompressed size. This is only useful if you can then decide to go back to the beginning of the data and read it all over again if you decide to keep the result of the decompression after all. (This is fine for reading a file which will still be there after you work out the decompressed size, but may require writing all data to a temporary file if you're reading data from a network stream.)</div><div><br></div><div>The simplest way I can think of to do (2) is to read all the data, keeping what you decompress until you discover the next part won't fit. You then have to stop, return the data you have so far (probably less than the full size permitted) and you have to be able to resume at the beginning of the part that didn't fit, keeping any compression/decompression state necessary to continue.</div><div><br></div><div>The simplest way I can think of to do (3) is to start doing (2) and abort the procedure altogether if the next part won't fit.</div><div><br></div><div>My mention of 1:1 compression was intended to be facetious - a joke. My apologies if this wasn't clear. The easiest method I can think of is to return the input stream unaltered, so the "compressed" data is the same size as the original. There are other methods which are also trivially reversed with no risk of buffer overflow. If you haven't saved any space it's not actually useful, and it's not really compression, but you may want to be able to specify such an algorithm when chaining functions together.</div><div><br></div><div>The proportion of compression you'll get from an algorithm will depend on the nature and variability of the content, and on the way the algorithm adapts to that variability. If you know the data is all zero values, a good compression algorithm is merely to report the length of the data; on the other hand, a stream of random (or pseudo-random) bits is theoretically uncompressable.</div><div><br></div><div>So if mixing compression and encryption, you can usually save space by compressing before encrypting, and decrypting before decompressing, but the other way around will almost certainly increase your size because the compression algorithm will give a larger output than the input. If you know the data is sufficiently random, you may choose not to attempt compression at all.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Sep 24, 2014 at 3:03 PM, Park, Sungjin <span dir="ltr"><<a href="mailto:jinni.park@gmail.com" target="_blank">jinni.park@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Christopher,<div><br><div>I don't have protocol level knowledge about zlib. But in my opinion - in programmer's perspective, one of the following should be possible.</div><div><br></div><div>1) an API to get inflated data size before actually inflating the data</div><div>2) an API to get part of the inflated data by given size and let the aggregation be done by the user</div><div>3) an API option to set limit in the size of the inflated data</div><div><br></div><div>Currently, we are doing 3) with a custom NIF. 
<div class="gmail_extra"><br><div class="gmail_quote">On Wed, Sep 24, 2014 at 3:03 PM, Park, Sungjin <span dir="ltr"><<a href="mailto:jinni.park@gmail.com" target="_blank">jinni.park@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Christopher,<div><br><div>I don't have protocol-level knowledge about zlib. But in my opinion - from a programmer's perspective - one of the following should be possible.</div><div><br></div><div>1) an API to get the inflated data size before actually inflating the data</div><div>2) an API to get the inflated data piecewise, by a given size, letting the user do the aggregation</div><div>3) an API option to set a limit on the size of the inflated data</div><div><br></div><div>Currently, we are doing 3) with a custom NIF. It would of course be better if I could do any of the above with the standard library.</div><div><br></div><div>I don't understand your 1:1 compression scheme suggestion. AFAIK, compression is used to reduce the number of bytes in flight, after all, isn't it?</div><div><br></div></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Sep 24, 2014 at 12:41 PM, Christopher Vance <span dir="ltr"><<a href="mailto:cjsvance@gmail.com" target="_blank">cjsvance@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">The zlib protocol is fully specified, and it appears the code is working correctly, so the only issue you might have is deciding what to do when you see input which could be considered malicious (but which is still properly defined). Do you want to abort handling the input stream, or do you have an alternative suggestion?<div><br></div><div>If you don't like the way zlib does it, or would prefer a compression scheme which is more predictable, why not try a different compression algorithm and see whether it does any better in these pathological cases? I can specify several 1:1 "compression" schemes where you only ever get out the same number of bytes you put in, but you'd probably think these safe methods are too boring.</div></div><div class="gmail_extra"><div><div><br><div class="gmail_quote">On Wed, Sep 24, 2014 at 1:15 PM, Guilherme Andrade <span dir="ltr"><<a href="mailto:g@gandrade.net" target="_blank">g@gandrade.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Hi Sungjin,<br>
<br>
I've recently dealt with this very same issue, albeit only as a
security hardening and prevention measure; and, like you, I've
looked into the zlib C code that's bundled with erts and arrived at
the same conclusion.<br>
<br>
I was only able to limit it on a theoretical basis: the zlib authors
themselves state[1] that a maximum compression ratio of 1032:1 is
achievable (with big blobs of zeroes). Therefore, if I want to limit
the uncompressed content to less than, let's say, 5 MiB, I'll only
accept compressed content of up to ~5 KiB. This reasoning might be
missing something, though.<br>
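<br>
In code, the same arithmetic becomes a guard one could put in
front of zlib:uncompress/1; the function name and the 5 MiB figure
are only illustrative:<br>
<br>
<pre>%% Worst-case zlib ratio is ~1032:1, so bound the *input* size,
%% since the output size cannot be known in advance. Illustrative only.
-define(MAX_UNCOMPRESSED, 5 * 1024 * 1024).          % 5 MiB
-define(MAX_COMPRESSED, ?MAX_UNCOMPRESSED div 1032). % 5080 bytes, ~5 KiB

maybe_uncompress(Bin) when byte_size(Bin) =< ?MAX_COMPRESSED ->
    {ok, zlib:uncompress(Bin)};
maybe_uncompress(_Bin) ->
    {error, too_big}.</pre>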
<br>
If there's a better/cleaner way to deal with this, I would love to
know.<br>
<br>
Cheers,<br>
<br>
<br>
[1]: <a href="http://www.zlib.net/zlib_tech.html" target="_blank">http://www.zlib.net/zlib_tech.html</a><div><div><br>
<br>
<br>
<div>On 24-09-2014 03:55, Park, Sungjin
wrote:<br>
</div>
</div></div><blockquote type="cite"><div><div>
<div dir="ltr">Hi, I'm about to report a problem with erlang's
zlib library interface which I think is a design flaw at this
point of time.
<div><br>
</div>
<div>We recently had some malicious packets which were not very
big in the first place but inflated to really big ones -
hundreds of megabytes each. As a result, the server crashed
out of memory in the processes calling zlib:inflate/2.
Urgency forced us to make a custom NIF library with an inflation
size limit. We also studied the Erlang reference manual but
couldn't find anything useful. The zlib library source code
shows that even zlib:setBufSize/2 does not prevent producing very
big binaries.</div>
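<div><br>
</div>
<div>(For illustration: such a packet is trivial to construct - a
few hundred kilobytes of compressed zeroes inflate to hundreds of
megabytes, and zlib:inflate/2 will happily allocate all of it. The
sizes here are only indicative.)</div>
<pre>Bomb = zlib:compress(binary:copy(<<0>>, 300 * 1024 * 1024)),
byte_size(Bomb),                   %% on the order of 300 KiB
byte_size(zlib:uncompress(Bomb)).  %% 314572800 bytes - ~300 MiB at once</pre>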
<div><br>
</div>
<div>Since there is no way to know how big the data will become
after inflation, this should be quite a common problem. So I'm
curious whether I missed something simple and nice. Is there
anything like that?</div>
<div>
<div><br>
</div>
-- <br>
Park, Sungjin
<div>-------------------------------------------------------------------------------------------------------------------</div>
<div>Peculiar travel suggestions are dancing lessons from god.</div>
<div> -- The Books of Bokonon</div>
<div>-------------------------------------------------------------------------------------------------------------------</div>
</div>
</div>
</div></div></blockquote><span><font color="#888888">
<br>
<pre cols="72">--
Guilherme
<a href="https://www.gandrade.net/" target="_blank">https://www.gandrade.net/</a>
PGP: 0x35CB8191 / 1968 5252 3901 B40F ED8A D67A 9330 79B1 35CB 8191
</pre>
</font></span></div>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br></div></div><span><font color="#888888">Christopher Vance
</font></span></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br>Park, Sungjin<div>-------------------------------------------------------------------------------------------------------------------</div><div>Peculiar travel suggestions are dancing lessons from god.</div><div> -- The Books of Bokonon</div><div>-------------------------------------------------------------------------------------------------------------------</div>
</div>
</div></div><br>_______________________________________________<br>
erlang-questions mailing list<br>
<a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>
<a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a><br>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br>Christopher Vance
</div>