[erlang-bugs] xmerl: unicode in attribute value generates extra character
Magnus Mueller
magnus.mueller@REDACTED
Thu Oct 31 12:38:30 CET 2013
Hello,
When parsing the following simple XML string, xmerl doesn't handle the Unicode attribute value correctly:
xmerl_scan:string("<?xml version=\"1.0\"?><hello attribute=\"°\"></hello>").
{{xmlElement,hello,hello,[],
{xmlNamespace,[],[]},
[],1,
[{xmlAttribute,attribute,[],[],[],
[{hello,1}],
1,[],"°Â",false}],
[],[],
"/home/mmueller/work/entelios/NOC-Config/Erlang/parsexml",
undeclared},
[]}
The resulting attribute value should be a single character. The xmerl User's Guide [1] says that xmerl can parse Unicode, provided that attribute names can be mapped to ASCII.
The result is the same when the XML encoding is specified explicitly:
xmerl_scan:string("<?xml version=\"1.0\" encoding=\"utf-8\" ?><hello attribute=\"°\"></hello>").
{{xmlElement,hello,hello,[],
{xmlNamespace,[],[]},
[],1,
[{xmlAttribute,attribute,[],[],[],
[{hello,1}],
1,[],"°Â",false}],
[],[],
"/home/mmueller/work/entelios/NOC-Config/Erlang/parsexml",
undeclared},
[]}
Apparently, "°Â" is the list of UTF-8 bytes of "°" (194, 176) in _reverse order_ (176, 194).
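The byte-order observation can be checked independently of xmerl. What follows is only a minimal sketch: the module name reversed_utf8 and both function names are hypothetical and not part of xmerl or OTP. It relies only on unicode:characters_to_binary/1 and lists:reverse/1.

```erlang
-module(reversed_utf8).
-export([utf8_bytes/1, is_reversed_utf8/2]).

%% Return the UTF-8 encoding of a single code point as a list of bytes.
%% For the degree sign (16#B0) this is [194,176].
utf8_bytes(CodePoint) ->
    binary_to_list(unicode:characters_to_binary([CodePoint])).

%% True if Value equals the UTF-8 bytes of CodePoint in reverse order,
%% which is the pattern seen in the attribute value above.
is_reversed_utf8(Value, CodePoint) ->
    Value =:= lists:reverse(utf8_bytes(CodePoint)).
```

Calling reversed_utf8:is_reversed_utf8([176,194], 16#B0) returns true, which matches the reported value when it is read as a list of Latin-1 code points.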
With kind regards,
Magnus Müller
[1] http://www.erlang.org/doc/apps/xmerl/xmerl_ug.html#id61231