[erlang-bugs] Bug in xmerl

Tue Jul 1 09:13:34 CEST 2008

Could someone from the OTP team confirm whether this is a bug?

If it is, I could really use a patch :-)

- Mikkel

On Fri, Jun 27, 2008 at 2:57 PM, Mikkel Jensen <mj@REDACTED> wrote:

> It seems there is a bug in xmerl when loading elements that contain numeric
> character references followed by UTF-8 characters.
>
> Example: é, &#xD; (carriage return), é
>
> 1> element(1, xmerl_scan:string("<a>\303\251&#xD;\303\251</a>", [{encoding,
> 'utf-8'}])).
> {xmlElement,a,a,[],
>             {xmlNamespace,[],[]},
>             [],1,[],
>             [{xmlText,[{a,1}],1,[],"\303\251",text},
>              {xmlText,[{a,1}],2,[],[10,195,131,194,169],text}],
>             [],"/",undeclared}
>
> Xmerl splits the parsed text at the newline character (strange, but
> OK). However, the first part is encoded correctly, while the second part is
> garbled!
>
> It's worth noting that attribute values are encoded correctly:
>
> 2> element(1, xmerl_scan:string("<a b=\"\303\251&#xD;\303\251\"/>",
> [{encoding, 'utf-8'}])).
> {xmlElement,a,a,[],
>             {xmlNamespace,[],[]},
>             [],1,
>             [{xmlAttribute,b,[],[],[],[],1,[],"\303\251 \303\251",false}],
>             [],[],"/",undeclared}
>
> - Mikkel
>
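
Note on the garbled bytes: [195,131,194,169] is what results when the two
UTF-8 bytes of é (195,169) are each treated as a Latin-1 character and
encoded to UTF-8 a second time. This suggests, but does not prove, that the
text following the character reference goes through an extra encoding pass.
A minimal check in the Erlang shell, assuming an OTP release that ships the
unicode module (which may postdate the release used in this thread):

1> %% Treat the UTF-8 bytes of é as Latin-1 characters and encode them again.
1> binary_to_list(unicode:characters_to_binary([195,169], latin1, utf8)).
[195,131,194,169]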