[erlang-patches] [bug & patch] xmerl_scan doesn't decode &#x refs properly

Raimo Niskanen
Tue Jun 8 09:58:51 CEST 2010


On Mon, Jun 07, 2010 at 06:17:47PM +0200, Paul Guyot wrote:
> Hello,
> 
> There is a bug in xmerl_scan. It doesn't decode &#x refs properly.
> 
> Consider the following code:
> 
> {UTF8Output, []} = xmerl_scan:string("<?xml version=\"1\" ?>\n<text>" ++ [229, 145, 156] ++ "</text>"),
> #xmlElement{content = [#xmlText{value = UTF8Text}]} = UTF8Output,
> {DecEntityOutput, []} = xmerl_scan:string("<?xml version=\"1\" ?>\n<text>&#21596;</text>"),
> #xmlElement{content = [#xmlText{value = DecEntityText}]} = DecEntityOutput,
> {HexEntityOutput, []} = xmerl_scan:string("<?xml version=\"1\" ?>\n<text>&#x545C;</text>"),
> #xmlElement{content = [#xmlText{value = HexEntityText}]} = HexEntityOutput,
> 
> UTF8Text and DecEntityText are equal and as expected ([16#545C]).
> HexEntityText is (incorrectly) a list of the three UTF-8 bytes [229, 145, 156], whereas it should be equal to [16#545C].
> 
> A patch with a test case can be found here:
> 
> git fetch git://github.com/pguyot/otp.git pg/xmerl_scan_hex_entities

Thank you! It will be included in 'pu' after I reformat the commit
message and cherry-pick it onto 'dev', since the branch was based not
on 'dev' but on a merge result containing 'dev'.
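
For readers of the archive, the expected result can be sketched as
follows. This is only an illustration, not xmerl's implementation and
not the patch itself: the module name char_ref_sketch and the helpers
decode_char_ref/1 and utf8_bytes/1 are hypothetical names chosen for
this example. It assumes a well-formed reference that ends in ";". It
shows that the decimal and the hexadecimal reference denote the same
code point, and that the three bytes reported in the bug are the UTF-8
encoding of that code point.

```erlang
-module(char_ref_sketch).
-export([decode_char_ref/1, utf8_bytes/1]).

%% Hypothetical helper, not part of xmerl: decode one XML character
%% reference given as a string and return the code point it denotes.
%% The "&#x" clause must come first, because "&#" is a prefix of it.
decode_char_ref("&#x" ++ Rest) ->
    list_to_integer(strip_semicolon(Rest), 16);
decode_char_ref("&#" ++ Rest) ->
    list_to_integer(strip_semicolon(Rest), 10).

%% Drop the trailing ";" that terminates a character reference.
%% Crashes on input without a trailing ";" (assumed well-formed).
strip_semicolon(Str) ->
    case lists:reverse(Str) of
        ";" ++ RevDigits -> lists:reverse(RevDigits)
    end.

%% UTF-8 encoding of a single code point, as a list of bytes.
utf8_bytes(CodePoint) ->
    binary_to_list(unicode:characters_to_binary([CodePoint], unicode, utf8)).
```

With this sketch, decode_char_ref("&#x545C;") and
decode_char_ref("&#21596;") both return 16#545C, while
utf8_bytes(16#545C) returns [229, 145, 156]. A scanner that returns
code points for literal text should therefore yield [16#545C] for both
forms of the reference, not the byte list.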

> 
> Regards,
> 
> Paul
> -- 
> Semiocast                    http://semiocast.com/
> +33.175000290 - 62 bis rue Gay-Lussac, 75005 Paris
> 
> 

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

