[erlang-questions] zlib: how many bytes were used during the uncompression (one call of zlib:inflate/2)

Zabrane Mickael zabrane3@REDACTED
Fri Aug 3 14:45:31 CEST 2012


Hi JD,

> Is your question how much of GzipData was used to produce
> InflatedData? ie if GzipData is 1kb, how much of that 1kb was actually
> needed?

That's exactly what I'm after.


> If so, isn't the idea of compression that every single byte in
> GzipData was needed to get the original?

Yes. But the zlib decompressor only needs one buffer of input (compressed data) at a time to do its job.
You don't need to feed it all the compressed data at once. Feeding it chunk by chunk is fine.

Let me state my problem differently. 

One of the lesser-known features of GZIP is this one:
GZIP file + GZIP file + ... + GZIP file = GZIP file.

The concatenation of multiple GZIPped files is a valid GZIP file that can be uncompressed with
the unix "gzip" command or by simply calling:
{ok, H} = file:open(ConcatenatedGZIPs, [read, raw, compressed]),
{ok, Data} = file:read(H, 1024).
[...]

The problem when reading this kind of concatenated GZIP is that you can no longer distinguish between
the individual files. You can only read them as one big stream of data (i.e. one big file).
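
To make the concatenation property concrete, here is a minimal sketch. It is written in Python rather than Erlang, only because Python's standard gzip module is documented to accept multi-member data; the payloads and variable names are made up for illustration.

```python
import gzip

# Two independent gzip files (hypothetical sample payloads).
part_a = gzip.compress(b"first file\n")
part_b = gzip.compress(b"second file\n")

# Plain byte concatenation, no re-compression involved.
combined = part_a + part_b

# gzip.decompress accepts multi-member input and returns one continuous
# stream, so the boundary between the two files is no longer visible.
merged = gzip.decompress(combined)
print(merged)
```

The result is a single byte string, which is exactly why the member boundaries have to be recovered some other way.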

Let's say I have 3 GZIP files concatenated in one:

[begin offset GZIP1 = 0]
GZIP1 (compressed size = 10 bytes)

[begin offset GZIP2 = 10]
GZIP2 (compressed size = 5 bytes)

[begin offset GZIP3 = 15]
GZIP3 (compressed size = 25 bytes)
[end offset of GZIP3 = 40]

I'm interested in the GZIP2 file and need a reliable way to find
where it starts and where it ends (the compressed offsets [10,15]).
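
One way to recover such offsets, sketched here in Python and not in Erlang: Python's zlib binding exposes the input left over after the end of a stream as `unused_data`, so the number of bytes a member consumed is the input length minus the leftover length. Whether Erlang's zlib module offers an equivalent is not established in this thread. The function name `gzip_member_offsets` and the sample payloads are hypothetical.

```python
import gzip
import zlib


def gzip_member_offsets(blob):
    """Return (start, end) compressed offsets for each gzip member in blob."""
    offsets = []
    start = 0
    while start < len(blob):
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        d.decompress(blob[start:])
        if not d.eof:
            raise ValueError("truncated gzip member at offset %d" % start)
        # Everything after the end of this member is left in unused_data.
        consumed = len(blob) - start - len(d.unused_data)
        offsets.append((start, start + consumed))
        start += consumed
    return offsets


# Usage: three members concatenated, then the second one extracted alone.
members = [gzip.compress(p) for p in (b"one", b"two two", b"three three three")]
blob = b"".join(members)
offsets = gzip_member_offsets(blob)
second_start, second_end = offsets[1]
print(offsets)
print(gzip.decompress(blob[second_start:second_end]))
```

This sketch inflates every member fully in memory and assumes there is no padding or garbage after the last member; a chunked variant would feed the decompressor piece by piece and keep a running count of the bytes supplied.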

Hope my problem is clear now.

Help appreciated guys!!!

Regards,
Zabrane


> 
> JD
> 
> On 2 August 2012 17:19, Zabrane Mickael <zabrane3@REDACTED> wrote:
>> Hi guys,
>> 
>> I'm playing a bit with the zlib module today.
>> 
>> Let's say I want to uncompress a GzipData binary with zlib:inflate/2:
>> InflatedData = zlib:inflate(Z, GzipData).
>> 
>> In case of success, I want to know how many bytes from GzipData were used
>> internally to get the InflatedData?
>> 
>> Regards,
>> Zabrane