It seems there is a bug in xmerl when loading elements that contain numeric character references followed by UTF-8 characters.

Example: é newline é

1> element(1, xmerl_scan:string("<a>\303\251
\303\251</a>", [{encoding, 'utf-8'}])).
{xmlElement,a,a,[],
 {xmlNamespace,[],[]},
 [],1,[],
 [{xmlText,[{a,1}],1,[],"\303\251",text},
  {xmlText,[{a,1}],2,[],[10,195,131,194,169],text}],
 [],"/",undeclared}
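
To see what the garbled list actually is: assuming the second text node went through an extra Latin-1-to-UTF-8 conversion (my guess, not verified against the xmerl source), encoding the UTF-8 bytes of é as UTF-8 a second time reproduces it exactly. The module name double_encoding is just for illustration.

```erlang
-module(double_encoding).
-export([demo/0]).

%% Illustrative only (hypothetical module): rebuilds the garbled text-node
%% value by encoding the UTF-8 bytes of é (U+00E9) as UTF-8 a second time.
demo() ->
    %% Correct UTF-8 encoding of é: [195,169], i.e. "\303\251"
    Utf8 = binary_to_list(unicode:characters_to_binary([16#E9])),
    %% Treat each byte as a Latin-1 character and encode to UTF-8 again
    Double = binary_to_list(unicode:characters_to_binary(Utf8, latin1, utf8)),
    %% Prepend the newline, as in the second xmlText node above
    {Utf8, [$\n | Double]}.
```

Calling double_encoding:demo() returns {[195,169],[10,195,131,194,169]}: the first element matches the correct first text node, the second matches the garbled one.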

xmerl splits the parsed text into two nodes at the newline character (strange, but OK). However, the first part is encoded correctly while the second part is garbled!

It's worth noting that attribute values are encoded correctly:

2> element(1, xmerl_scan:string("<a b=\"\303\251
\303\251\"/>", [{encoding, 'utf-8'}])).
{xmlElement,a,a,[],
 {xmlNamespace,[],[]},
 [],1,
 [{xmlAttribute,b,[],[],[],[],1,[],"\303\251 \303\251",false}],
 [],[],"/",undeclared}

Can someone confirm whether this is a bug?

- Mikkel