[erlang-bugs] xmerl: unicode in attribute value generates extra character

Magnus Mueller magnus.mueller@REDACTED
Thu Oct 31 12:38:30 CET 2013


Hello,

When parsing the following simple XML string, xmerl doesn't handle the Unicode attribute value correctly:

xmerl_scan:string("<?xml version=\"1.0\"?><hello attribute=\"&#xb0;\"></hello>").
{{xmlElement,hello,hello,[],
             {xmlNamespace,[],[]},
             [],1,
             [{xmlAttribute,attribute,[],[],[],
                            [{hello,1}],
                            1,[],"°Â",false}],
             [],[],
             "/home/mmueller/work/entelios/NOC-Config/Erlang/parsexml",
             undeclared},
[]}

The resulting attribute value should be a single character, [176] (U+00B0, the degree sign), but two characters are returned. [1] mentions that xmerl can parse Unicode, provided that attribute names can be mapped to ASCII.

The result is the same when the XML encoding is specified explicitly:

xmerl_scan:string("<?xml version=\"1.0\" encoding=\"utf-8\" ?><hello attribute=\"&#xb0;\"></hello>").
{{xmlElement,hello,hello,[],
             {xmlNamespace,[],[]},
             [],1,
             [{xmlAttribute,attribute,[],[],[],
                            [{hello,1}],
                            1,[],"°Â",false}],
             [],[],
             "/home/mmueller/work/entelios/NOC-Config/Erlang/parsexml",
             undeclared},
[]}

Apparently, "°Â" (the list [176,194]) is the UTF-8 encoding of U+00B0 (bytes [194,176]) in _reversed order_.
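
The reversed-bytes observation can be checked independently of xmerl. The following is only a minimal sketch: the module name utf8_check and both function names are made up for illustration and are not part of xmerl or OTP. It relies only on unicode:characters_to_binary/3, binary_to_list/1 and lists:reverse/1 from the standard library.

```erlang
-module(utf8_check).
-export([utf8_bytes/1, is_reversed_utf8/2]).

%% Hypothetical helper: return the UTF-8 encoding of a single
%% code point as a list of byte values.
utf8_bytes(CodePoint) ->
    binary_to_list(unicode:characters_to_binary([CodePoint], unicode, utf8)).

%% Hypothetical helper: true if Value is exactly the UTF-8 encoding
%% of CodePoint with the byte order reversed.
is_reversed_utf8(Value, CodePoint) ->
    Value =:= lists:reverse(utf8_bytes(CodePoint)).
```

With this module compiled, utf8_check:utf8_bytes(16#B0) gives [194,176], and utf8_check:is_reversed_utf8([176,194], 16#B0) gives true, which matches the value seen in the xmlAttribute record above.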

With kind regards,
Magnus Müller

[1] http://www.erlang.org/doc/apps/xmerl/xmerl_ug.html#id61231
