[erlang-questions] zlib problems in R14B03 when processing gzip encoded HTTP responses
Shaun Kruger
skruger@REDACTED
Thu Oct 27 16:40:03 CEST 2011
I found the solution to my problem and thought I would share.
It appears as if some web servers output gzip data that must be decoded one block at a time. Each block is conveniently transmitted as a chunk in the chunked transfer encoding.
In my code I was queuing up all the binary blocks and retrying zlib:gunzip() until it worked. With some web servers this would work once the last block was received. With others the data would never decode, even after the full stream binary had been collected.
I ended up changing my code to initiate a zlib instance using zlib:open() and zlib:inflateInit(). When I receive each chunk I call zlib:inflate() and each individual chunk decompresses properly. I also noticed that zlib appears to be stateful. If I create a new zlib port (lazy and inefficient, I know) for every block then the later blocks in the transmission fail to decompress.
The old method of collecting all the data and calling gunzip once worked with some servers; the new method of inflating one block at a time seems to work with all servers.
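For anyone who finds this later, here is a minimal sketch of the per-chunk approach, not my actual proxy code. The module name gzip_stream and the functions new/0, feed/2, close/1 and demo/0 are made up for illustration. It also assumes the stream is opened with zlib:inflateInit/2 and a window-bits value of 16 + 15 so that zlib expects the gzip header and trailer; that value is an assumption of the sketch.

```erlang
-module(gzip_stream).
-export([new/0, feed/2, close/1, demo/0]).

%% Open ONE zlib stream for the whole response and keep it around.
%% Assumption: 16 + 15 as window bits tells zlib to expect gzip framing.
new() ->
    Z = zlib:open(),
    ok = zlib:inflateInit(Z, 16 + 15),
    Z.

%% Inflate a single HTTP chunk. The stream Z carries the state left by
%% earlier chunks, which is why a fresh port per chunk does not work.
feed(Z, Chunk) when is_binary(Chunk) ->
    iolist_to_binary(zlib:inflate(Z, Chunk)).

%% Release the zlib stream once the last chunk has been handled.
close(Z) ->
    zlib:close(Z).

%% Self-check: compress a binary, split it at an arbitrary point to
%% mimic two chunks, and inflate the pieces through the same stream.
demo() ->
    Plain = <<"hello, chunked gzip world">>,
    Gz = zlib:gzip(Plain),
    <<C1:10/binary, C2/binary>> = Gz,
    Z = new(),
    Out = [feed(Z, C) || C <- [C1, C2]],
    ok = close(Z),
    Plain = iolist_to_binary(Out),
    ok.
```

In a proxy, new/0 would be called when the response headers announce gzip, feed/2 once per received chunk, and close/1 when the response ends. A chunk may legitimately inflate to an empty binary if it only holds header bytes.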
Shaun
----- Original Message -----
> From: "Shaun Kruger" <skruger@REDACTED>
> To: erlang-questions@REDACTED
> Sent: Wednesday, October 26, 2011 8:38:23 AM
> Subject: [erlang-questions] zlib problems in R14B03 when processing gzip encoded HTTP responses
>
> I have an HTTP proxy server that I have added gzip support to. I am
> sending the header "Accept-Encoding: gzip", but what comes back
> doesn't always unzip with zlib. I learned early on that I need to
> receive the whole gzip block before I can unzip using zlib:gunzip()
> so I always ask the server for more data each time I can't unzip the
> block. My problem at the moment is that there are some sites where
> I can never unzip the gzip encoded data while there are others where
> I can.
>
> I am not entirely sure from the documentation which of the zlib
> decompress functions I should be using. However, I have determined
> that when I have this problem none of them will work to decode the
> data. I am experiencing this problem only on certain sites. I have
> enabled logging so I am aware when gzip is working and when it is
> not. There are some sites where the main page gzip works, but other
> pages do not.
>
> I have called the three main unzip, uncompress, and gunzip functions
> in the zlib module and all of them fail the same way.
>
> I am wondering if anyone can suggest something else for me to try as
> I'm running out of leads. I have to admit that I don't know
> compression as well as I know other things like HTTP so I may just
> be missing something basic here.
>
> Read below to see my debugging output when I call the three zlib
> functions. I don't know for sure if it will help, but it should
> help identify if I'm making any basic mistakes.
>
> Shaun
>
> ?ERROR_MSG("unzip: ~p~n",[catch zlib:unzip(GZ)]),
> ?ERROR_MSG("uncompress: ~p~n",[catch zlib:uncompress(GZ)]),
> ?ERROR_MSG("gunzip: ~p~n",[catch zlib:gunzip(GZ)]),
>
> =ERROR REPORT==== 26-Oct-2011::08:25:08 ===
> unzip: {'EXIT',{data_error,[{zlib,call,3},
> {zlib,inflate,2},
> {zlib,unzip,1},
> {proxy_pass,handle_info,3},
> {gen_fsm,handle_msg,8},
> {proc_lib,init_p_do_apply,3}]}}
>
> =ERROR REPORT==== 26-Oct-2011::08:25:08 ===
> uncompress: {'EXIT',{data_error,[{zlib,call,3},
> {zlib,inflate,2},
> {zlib,uncompress,1},
> {proxy_pass,handle_info,3},
> {gen_fsm,handle_msg,8},
> {proc_lib,init_p_do_apply,3}]}}
>
> =ERROR REPORT==== 26-Oct-2011::08:25:08 ===
> gunzip: {'EXIT',{data_error,[{zlib,call,3},
> {zlib,gunzip,1},
> {proxy_pass,handle_info,3},
> {gen_fsm,handle_msg,8},
> {proc_lib,init_p_do_apply,3}]}}