[erlang-patches] Patch to xmerl_scan to fix character-reference normalization in attribute values
Tom Moertel
tom@REDACTED
Thu Apr 28 23:24:46 CEST 2011
The following short patch fixes a bug in xmerl that causes character
references in attribute values to be normalized incorrectly:
git fetch https://github.com/tmoertel/otp.git xmerl_attr_charref_fix
Explanation:
Section 3.3.3 of the XML Recommendation gives the rules for
attribute-value normalization. One of those rules requires
that character references not be re-normalized after being
replaced with the referenced characters:
For a character reference, append the referenced
character to the normalized value.
And, in particular:
Note that if the unnormalized attribute value contains
a character reference to a white space character other
than space (#x20), the normalized value contains the
referenced character itself (#xD, #xA or #x9).
Source: http://www.w3.org/TR/xml/#AVNormalize
In xmerl_scan, however, character references in attributes are
normalized again after replacement. For example, the
character reference "
" in the following XML document gets
normalized (incorrectly) into a space when parsed:
2> xmerl_scan:string("<root x='
'/>").
{... [{xmlAttribute,x,[],[],[],[],1,[]," ",false}] ...}
This short patch restores the correct behavior:
2> xmerl_scan:string("<root x='
'/>").
{... [{xmlAttribute,x,[],[],[],[],1,[],"\n",false}] ...}
NOTE: This change does not include tests because I could not
find a test suite for xmerl.
Cheers,
Tom
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20110428/0fa1afa4/attachment.htm>
More information about the erlang-patches
mailing list