<div>The following short patch fixes a bug in xmerl that causes character references in attribute values to be normalized incorrectly:</div><div><br></div> git fetch <a href="https://github.com/tmoertel/otp.git">https://github.com/tmoertel/otp.git</a> xmerl_attr_charref_fix<div>
<br></div><div>Explanation:</div><div><br></div><div><div>Section 3.3.3 of the XML Recommendation gives the rules for</div><div>attribute-value normalization. One of those rules requires</div><div>that character references not be re-normalized after being</div>
<div>replaced with the referenced characters:</div><div><br></div><div> For a character reference, append the referenced</div><div> character to the normalized value.</div><div><br></div><div>And, in particular:</div>
<div><br></div><div> Note that if the unnormalized attribute value contains</div><div> a character reference to a white space character other</div><div> than space (#x20), the normalized value contains the</div><div>
referenced character itself (#xD, #xA or #x9).</div><div><br></div><div> Source: <a href="http://www.w3.org/TR/xml/#AVNormalize">http://www.w3.org/TR/xml/#AVNormalize</a></div><div><br></div><div>In xmerl_scan, however, character references in attributes are</div>
<div>normalized again after replacement. For example, the</div><div>character reference "&amp;#10;" in the following XML document gets</div><div>normalized (incorrectly) into a space when parsed:</div><div><br></div>
<div> 2&gt; xmerl_scan:string("&lt;root x='&amp;#10;'/&gt;").</div><div> {... [{xmlAttribute,x,[],[],[],[],1,[]," ",false}] ...}</div><div><br></div><div>This short patch restores the correct behavior:</div>
<div><br></div><div> 2&gt; xmerl_scan:string("&lt;root x='&amp;#10;'/&gt;").</div><div> {... [{xmlAttribute,x,[],[],[],[],1,[],"\n",false}] ...}</div><div><br></div><div>NOTE: This change does not include tests because I could not</div>
<div>find a test suite for xmerl.</div></div><div><br></div><div><br></div><div><br></div><div>Cheers,</div><div>Tom</div><div><br></div>
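<div>P.S. In case it helps whoever adds tests later, here is a rough sketch of a check for this behavior. The module name attr_charref_check and both function names are hypothetical (they are not part of xmerl or of this patch); the only xmerl pieces used are xmerl_scan:string/1 and the records from xmerl.hrl. It assumes the root element carries exactly one attribute.</div><div><br></div>
<pre>
%% Hypothetical helper module; not part of xmerl or of this patch.
-module(attr_charref_check).
-export([attr_value/1, check/0]).

%% Record definitions for #xmlElement{} and #xmlAttribute{}.
-include_lib("xmerl/include/xmerl.hrl").

%% Parse Xml and return the value of the root element's only attribute.
%% Crashes with badmatch if the root has zero or several attributes.
attr_value(Xml) -&gt;
    {#xmlElement{attributes = [#xmlAttribute{value = Value}]}, _Rest} =
        xmerl_scan:string(Xml),
    Value.

%% Per section 3.3.3 of the XML Recommendation, a character reference
%% to a newline must appear in the normalized value as the newline itself.
check() -&gt;
    "\n" = attr_value("&lt;root x='&amp;#10;'/&gt;"),
    ok.
</pre>
<div>With the patch applied, attr_charref_check:check() should return ok; without it, the match should fail with a badmatch error because the attribute value comes back as " ".</div><div><br></div>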