[erlang-bugs] xmerl problem
Bertil Karlsson
bertil.karlsson@REDACTED
Thu Feb 19 10:00:05 CET 2009
Hi,
you are right about this. xmerl converts all parsed text to Unicode
code points. This should have been clearly stated in the documentation,
and I will fix it in the next release.
For ASCII or Latin-1 this is no problem, but you will get an invalid
IOlist() if you have a character set with Unicode code points bigger
than 255.
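
A minimal sketch of the effect, assuming a current OTP release in which
the unicode module is available. The module name xmerl_text_demo and its
three helper functions are hypothetical and exist only for illustration;
xmerl_scan:string/1, iolist_size/1 and unicode:characters_to_binary/1 are
the standard calls.

```erlang
-module(xmerl_text_demo).
-export([text_value/1, is_valid_iolist/1, to_utf8/1]).

-include_lib("xmerl/include/xmerl.hrl").

%% Parse an XML string and return the 'value' field of the first
%% text node directly under the root element.
%% Assumes the root element's first child is a text node.
text_value(XmlString) ->
    {#xmlElement{content = Content}, _Rest} = xmerl_scan:string(XmlString),
    [#xmlText{value = Value} | _] = Content,
    Value.

%% True if Term is accepted as an IOlist by the runtime, i.e. every
%% integer in it is in the range 0..255.
is_valid_iolist(Term) ->
    try iolist_size(Term) of
        _Size -> true
    catch
        error:badarg -> false
    end.

%% Encode a list of Unicode code points as a UTF-8 binary.
to_utf8(Chars) ->
    unicode:characters_to_binary(Chars).
```

With a character reference such as &#8364; (the euro sign) in the
document, text_value/1 should return the code point 8364 in the list.
That list is rejected by is_valid_iolist/1, whereas to_utf8/1 encodes
it as a three-byte UTF-8 binary.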
/Bertil
Xingdong Bian wrote:
> Hi all,
>
> There is an issue with xmerl, or at least the xmerl documentation. The
> record definition in xmerl.hrl claims that the 'value' element of the
> xmlText record is an IOlist() (which is defined as IOlist = [char() |
> binary() | IOlist]). However, when parsing a character reference [1] the
> unicode code point is included in the list, even if it is not a valid
> char(). It makes sense to return the unicode code point rather than
> guessing which character set the application wants it translated to, but
> in this case the documentation should state this. Another approach would
> be to translate it to the character set of the XML document, but I'm not
> sure that's very useful, and would require some code to be written for
> xmerl.
>
> The attached xml document should cause an invalid IOlist() to be
> returned in the xmlText record.
>
> [1] http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-references
>
> Record definition from xmerl.hrl:
> %% plain text
> %% IOlist = [char() | binary() | IOlist]
> -record(xmlText,{
>     parents = [],  % [{atom(),integer()}]
>     pos,           % integer()
>     language = [], % inherits the element's language
>     value,         % IOlist()
>     type = text    % atom() one of (text|cdata)
> }).
>
> Thanks
> Xingdong