[erlang-bugs] Bug in xmerl

Mikkel Jensen mj@REDACTED
Fri Jun 27 14:57:26 CEST 2008


It seems there is a bug in xmerl when loading elements that contain numeric
character references followed by UTF-8 characters.

Example: é, then the character reference &#xD;, then é (each é given as the
UTF-8 byte pair \303\251)

1> element(1, xmerl_scan:string("<a>\303\251&#xD;\303\251</a>", [{encoding,
'utf-8'}])).
{xmlElement,a,a,[],
            {xmlNamespace,[],[]},
            [],1,[],
            [{xmlText,[{a,1}],1,[],"\303\251",text},
             {xmlText,[{a,1}],2,[],[10,195,131,194,169],text}],
            [],"/",undeclared}

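Xmerl splits the parsed value around the newline character (strange but ok).
However, the first part comes back as the original UTF-8 bytes (195,169),
while the second part is garbled: [195,131,194,169] is what you get when each
of those two bytes is treated as a Latin-1 character and encoded to UTF-8 a
second time.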
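
The garbled bytes can be reproduced outside xmerl. The following is only a
minimal sketch of the suspected double encoding, not xmerl's actual code. The
module name double_encode_demo and the functions garble/1 and ungarble/1 are
made up for illustration, and the sketch assumes an OTP release that ships the
standard unicode module.

```erlang
-module(double_encode_demo).
-export([garble/1, ungarble/1]).

%% Hypothetical helper: treat every byte of an already UTF-8 encoded
%% string as a Latin-1 code point and encode it to UTF-8 once more.
garble(Utf8Bytes) ->
    binary_to_list(unicode:characters_to_binary(Utf8Bytes, latin1, utf8)).

%% Hypothetical helper: undo one layer of the double encoding by decoding
%% the bytes as UTF-8 and writing the code points back out as Latin-1.
ungarble(Garbled) ->
    Bin = list_to_binary(Garbled),
    binary_to_list(unicode:characters_to_binary(Bin, utf8, latin1)).
```

Calling double_encode_demo:garble("\303\251") gives the same four bytes that
follow the 10 in the second xmlText above, and ungarble/1 turns them back into
195,169.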

It's worth noting that the same input in an attribute value is not garbled
(the character reference comes back as a space):

2> element(1, xmerl_scan:string("<a b=\"\303\251&#xD;\303\251\"/>",
[{encoding, 'utf-8'}])).
{xmlElement,a,a,[],
            {xmlNamespace,[],[]},
            [],1,
            [{xmlAttribute,b,[],[],[],[],1,[],"\303\251 \303\251",false}],
            [],[],"/",undeclared}

- Mikkel