[erlang-bugs] xmerl problem

Xingdong Bian
Wed Feb 18 19:18:47 CET 2009


Hi all,

There is an issue with xmerl, or at least with the xmerl documentation.
The record definition in xmerl.hrl claims that the 'value' field of the
xmlText record is an IOlist() (defined as IOlist = [char() | binary() |
IOlist]). However, when a character reference [1] is parsed, the Unicode
code point itself is put in the list, even when it is greater than 255
and therefore not a valid char() for an IOlist(). Returning the Unicode
code point makes sense, since xmerl would otherwise have to guess which
character set the application wants it translated to, but in that case
the documentation should say so. Another approach would be to translate
it to the character set of the XML document, but I'm not sure that would
be very useful, and it would require new code in xmerl.

The attached XML document should cause an invalid IOlist() to be
returned in the 'value' field of the xmlText record.

[1] http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-references

Record definition from xmerl.hrl:
%% plain text
%% IOlist = [char() | binary () | IOlist]
-record(xmlText,{
      parents = [], % [{atom(),integer()}]
      pos,      % integer()
      language = [],% inherits the element's language
      value,    % IOlist()
      type = text   % atom() one of (text|cdata)
     }).
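
A minimal sketch of how the problem can be reproduced. The module name
xmltext_demo and the input string are made up for illustration (the
input is not necessarily the same as the attached test.xml), and the
code assumes that the document parses into a single text node and that
the unicode module is available in the OTP release being used.

```erlang
-module(xmltext_demo).
-export([run/0]).

%% Brings in the #xmlElement{} and #xmlText{} record definitions.
-include_lib("xmerl/include/xmerl.hrl").

run() ->
    %% Hypothetical input: a character reference to U+20AC (decimal 8364),
    %% which is well above the byte range 0..255.
    Xml = "<doc>&#8364;</doc>",

    %% Assumes the element content is exactly one text node.
    {#xmlElement{content = [#xmlText{value = Value}]}, _Rest} =
        xmerl_scan:string(Xml),

    %% iolist_to_binary/1 accepts only bytes, binaries and nested lists
    %% of those, so it raises badarg if Value holds a larger code point.
    IoResult = try iolist_to_binary(Value) of
                   Bin -> {ok, Bin}
               catch
                   error:badarg -> not_an_iolist
               end,

    %% Treating Value as a list of Unicode code points works instead;
    %% the result here is UTF-8 encoded.
    Utf8 = unicode:characters_to_binary(Value),

    {Value, IoResult, Utf8}.
```

If xmerl returns the code point as described above, the second element
of the result should be not_an_iolist, while the conversion through
unicode:characters_to_binary/1 should succeed. This suggests the field
would be better documented as a list of Unicode code points than as an
IOlist().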

Thanks
Xingdong
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.xml
Type: text/xml
Size: 58 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20090218/adf7a905/attachment.xml>

