[bug & patch] xmerl_scan doesn't decode &#x refs properly
Paul Guyot
pguyot@REDACTED
Mon Jun 7 18:17:47 CEST 2010
Hello,
There is a bug in xmerl_scan: it does not decode hexadecimal character references (of the form &#xHHHH;) properly.
Consider the following code:
{UTF8Output, []} = xmerl_scan:string("<?xml version=\"1\" ?>\n<text>" ++ [229, 145, 156] ++ "</text>"),
#xmlElement{content = [#xmlText{value = UTF8Text}]} = UTF8Output,
{DecEntityOutput, []} = xmerl_scan:string("<?xml version=\"1\" ?>\n<text>&#21596;</text>"),
#xmlElement{content = [#xmlText{value = DecEntityText}]} = DecEntityOutput,
{HexEntityOutput, []} = xmerl_scan:string("<?xml version=\"1\" ?>\n<text>&#x545C;</text>"),
#xmlElement{content = [#xmlText{value = HexEntityText}]} = HexEntityOutput,
UTF8Text and DecEntityText are equal and as expected ([16#545C]).
HexEntityText, however, is incorrectly a list of the three UTF-8 bytes [229, 145, 156], whereas it should also be equal to [16#545C].
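To make the expected result concrete, here is a minimal sketch of how a numeric character reference maps to a single code point, and how that code point relates to the three bytes above. This is not the xmerl_scan code nor the patch: the module charref_sketch and the helpers decode_ref/1 and utf8_bytes/1 are hypothetical names used only for illustration.

```erlang
-module(charref_sketch).
-export([decode_ref/1, utf8_bytes/1]).

%% Hypothetical helper, not part of xmerl: decode the body of a numeric
%% character reference (the text between "&#" and ";") to one code point.
%% A leading "x" marks a hexadecimal reference, otherwise it is decimal.
decode_ref("x" ++ Hex) -> list_to_integer(Hex, 16);
decode_ref(Dec) -> list_to_integer(Dec).

%% Hypothetical helper: the UTF-8 encoding of a single code point,
%% returned as a list of bytes.
utf8_bytes(CodePoint) ->
    binary_to_list(unicode:characters_to_binary([CodePoint])).
```

With these helpers, decode_ref("x545C") and decode_ref("21596") both return 16#545C, while utf8_bytes(16#545C) returns [229, 145, 156]. The text value of the element should contain the former (one code point), not the latter (its UTF-8 encoding).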
A patch with a test case can be found here:
git fetch git://github.com/pguyot/otp.git pg/xmerl_scan_hex_entities
Regards,
Paul
--
Semiocast http://semiocast.com/
+33.175000290 - 62 bis rue Gay-Lussac, 75005 Paris