[erlang-bugs] xmerl problem

Thu Feb 19 10:00:05 CET 2009

Hi,

you are right about this. Xmerl converts everything it parses to Unicode 
code points. This should have been clearly stated in the documentation, 
and I will fix it in the next release.
For ASCII or Latin-1 input this is no problem, but you will get an 
invalid IOlist() if the document contains characters with Unicode code 
points greater than 255.
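
To make the 255 limit concrete, here is a minimal sketch, not part of 
xmerl itself. The module name xmltext_codepoints and its three functions 
are hypothetical names chosen for illustration. It assumes an OTP release 
that ships the unicode module, and it assumes that re-encoding the text 
as UTF-8 is what the application wants.

```erlang
-module(xmltext_codepoints).
-export([text_values/1, is_iolist/1, to_utf8/1]).

-include_lib("xmerl/include/xmerl.hrl").

%% Hypothetical helper: parse an XML string and collect the 'value'
%% field of every xmlText record directly under the root element.
text_values(XmlString) ->
    {Root, _Rest} = xmerl_scan:string(XmlString),
    [Value || #xmlText{value = Value} <- Root#xmlElement.content].

%% Hypothetical helper: true if Value is a valid iolist, i.e. every
%% integer in it fits in one byte. iolist_to_binary/1 raises badarg
%% for integers above 255, which is the case discussed here.
is_iolist(Value) ->
    try iolist_to_binary(Value) of
        _Binary -> true
    catch
        error:badarg -> false
    end.

%% Hypothetical helper: encode a list of Unicode code points as a
%% UTF-8 binary. Assumption: UTF-8 is the encoding the caller wants.
%% On bad input unicode:characters_to_binary/1 returns an
%% {error, ...} or {incomplete, ...} tuple instead of a binary.
to_utf8(Value) ->
    unicode:characters_to_binary(Value).
```

For example, is_iolist("abc") is true, while is_iolist([8364]) is false 
because 8364 (the euro sign) does not fit in a byte. to_utf8([8364]) 
gives <<226,130,172>>. Calling text_values("<doc>&#8364;</doc>") is one 
way to check what xmerl returns for a character reference.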

/Bertil

Xingdong Bian wrote:
> Hi all,
>
> There is an issue with xmerl, or at least the xmerl documentation. The
> record definition in xmerl.hrl claims that the 'value' element of the
> xmlText record is an IOlist() (which is defined as IOlist = [char() |
> binary() | IOlist]). However, when parsing a character reference [1] the
> unicode code point is included in the list, even if it is not a valid
> char(). It makes sense to return the unicode code point rather than
> guessing which character set the application wants it translated to, but
> in this case the documentation should state this. Another approach would
> be to translate it to the character set of the XML document, but I'm not
> sure that's very useful, and would require some code to be written for
> xmerl.
>
> The attached xml document should cause an invalid IOlist() to be
> returned in the xmlText record.
>
> [1] http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-references
>
> Record definition from xmerl.hrl:
> %% plain text
> %% IOlist = [char() | binary () | IOlist]
> -record(xmlText,{
>       parents = [], % [{atom(),integer()}]
>       pos,      % integer()
>       language = [],% inherits the element's language
>       value,    % IOlist()
>       type = text   % atom() one of (text|cdata)
>      }).
>
> Thanks
> Xingdong