[erlang-bugs] Bug in xmerl
Mikkel Jensen
mj@REDACTED
Tue Jul 1 09:13:34 CEST 2008
Could someone from the OTP team confirm whether this is a bug or not?
If it is, I could really use a patch :-)
- Mikkel
On Fri, Jun 27, 2008 at 2:57 PM, Mikkel Jensen <mj@REDACTED> wrote:
> It seems there is a bug in xmerl when loading elements that contain numeric
> character references followed by UTF-8 characters.
>
> Example: é newline é
>
> 1> element(1, xmerl_scan:string("<a>\303\251
> \303\251</a>", [{encoding, 'utf-8'}])).
> {xmlElement,a,a,[],
> {xmlNamespace,[],[]},
> [],1,[],
> [{xmlText,[{a,1}],1,[],"\303\251",text},
> {xmlText,[{a,1}],2,[],[10,195,131,194,169],text}],
> [],"/",undeclared}
>
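> Xmerl splits the parsed text at the newline character (strange but
> ok). However, while the first part is encoded correctly, the second part
> is garbled: after the newline, [195,131,194,169] is what you get by
> UTF-8 encoding the bytes 195 and 169 (already the UTF-8 form of é) a
> second time.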
>
> It's worth noting that attribute values are encoded correctly:
>
> 2> element(1, xmerl_scan:string("<a b=\"\303\251
> \303\251\"/>", [{encoding, 'utf-8'}])).
> {xmlElement,a,a,[],
> {xmlNamespace,[],[]},
> [],1,
> [{xmlAttribute,b,[],[],[],[],1,[],"\303\251 \303\251",false}],
> [],[],"/",undeclared}
>
> - Mikkel
>
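The garbled second text node in the first shell session is consistent with the UTF-8 bytes of é being treated as two Latin-1 characters and encoded again. The sketch below reproduces that double encoding and shows one way to undo it on the returned text while waiting for a fix. It assumes the `unicode` module, which only arrived in OTP releases after this report; the module name `double_utf8` and the functions `double_encode/1` and `repair/1` are hypothetical helpers, not part of xmerl.

```erlang
%% Hypothetical helper module (save as double_utf8.erl).
%% Requires the unicode module from later OTP releases.
-module(double_utf8).
-export([double_encode/1, repair/1]).

%% Treat each byte of an already UTF-8 encoded list as a Latin-1
%% character and UTF-8 encode it again: the garbling seen in the
%% second xmlText node.
double_encode(Utf8Bytes) ->
    binary_to_list(unicode:characters_to_binary(Utf8Bytes, latin1, utf8)).

%% Undo one level of encoding: decode the list as UTF-8 and write each
%% resulting character back out as a single byte. Raises badarg if the
%% input is not valid UTF-8 or decodes to code points above 255.
repair(Garbled) ->
    Decoded = unicode:characters_to_binary(list_to_binary(Garbled), utf8, latin1),
    binary_to_list(Decoded).
```

In the shell, `double_utf8:repair([10,195,131,194,169])` turns the garbled text node into `[10,195,169]`, the same byte form as the first node. This only patches the output after parsing; it does not fix the scanner itself.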