[erlang-questions] Xmerl questions

Nicolas Favre-Félix <>
Sun Jul 29 01:28:17 CEST 2007


I have encountered a few problems with xmerl and have some questions:

* How can I find out which encoding a file is written in? As far as I
can see, the <?xml version="1.0" encoding="utf-8" ?> declaration is not
present in the xmlDocument record. So far I'm using a regular expression,
but this is a quick hack and I would prefer a cleaner solution.

* I am parsing RSS feeds, and some of them cause encoding problems.
For example, the current feeds at
http://linuxfr.org/backend/journaux/rss20.rss and
http://www.lewistrondheim.com/blog/rss/fil_rss.xml make
xmerl_scan:file/1 and xmerl_scan:string/1 exit with:
** exited: {bad_character_code,"<the xml data is copied here>"}
I checked both files with xmlproc_parse, which reported no XML errors.
What is the problem here?

* Why is text sometimes split into several xmlText records? For
example, one XML file was returned as an xmlDocument record containing
the following:
    {xmlText,[{content,12},{entry,24},{feed,1}],73,[],"> : ",text},
    {xmlText,[{content,12},{entry,24},{feed,1}],74,[],"<a hreflang=",text}
Since there is nothing between these xmlText records, why are they not
merged into a single xmlText?

* There seem to be some encoding/decoding functions in xmerl_ucs.erl,
but not for all encodings (only a few functions are exported). Can these
functions be used to safely convert strings from one encoding to
another? If so, what is the point of the "iconv" port in Jungerl? What
is the recommended library for converting between encodings in Erlang?

* Would xml_lt (http://www.erlang.org/user.html#xml_lt-2.0) be a
possible alternative? Adding it as a dependency is a bit of a problem
for me, since it is not currently packaged with the Erlang/OTP
libraries.
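Regarding the question about detecting a file's encoding, the regular-expression workaround mentioned there might look roughly like the sketch below. It assumes the `re` module from later OTP releases (the 2007-era `regexp` module had a different API), and the names `xml_encoding` and `declared_encoding/1` are hypothetical. It only inspects an XML declaration at the very start of the input, does not handle a byte-order mark, and falls back to "utf-8" (the XML 1.0 default) when no encoding is declared.

```erlang
-module(xml_encoding).
-export([declared_encoding/1]).

%% Hypothetical helper: return the encoding named in the XML declaration,
%% lowercased, or "utf-8" when no encoding pseudo-attribute is found.
%% Assumes the declaration is at the start of the input (no BOM handling).
declared_encoding(Bin) when is_binary(Bin) ->
    %% Work on the raw bytes as a list; every element is 0..255,
    %% so re:run/3 can be used without the unicode option.
    declared_encoding(binary_to_list(Bin));
declared_encoding(String) when is_list(String) ->
    Pattern = "^\\s*<\\?xml[^>]*encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']",
    case re:run(String, Pattern, [{capture, [1], list}]) of
        {match, [Enc]} -> string:to_lower(Enc);
        nomatch -> "utf-8"
    end.
```

Usage: `xml_encoding:declared_encoding("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><rss/>")` returns "iso-8859-1". This is still the quick hack described above, just packaged as a function.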
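Regarding the question about split xmlText records, one possible workaround is to post-process the parsed tree and concatenate neighbouring text siblings. The sketch below assumes the record definitions in xmerl's include/xmerl.hrl (the xmlText layout {xmlText, Parents, Pos, Language, Value, Type} matches the records quoted in the question); the names `xml_text_merge`, `merge_text/1` and `merge_siblings/1` are hypothetical. It only joins siblings with the same `type`, and the merged record keeps the first record's `parents` and `pos`.

```erlang
-module(xml_text_merge).
-export([merge_text/1, merge_siblings/1]).
-include_lib("xmerl/include/xmerl.hrl").

%% Hypothetical helper: recursively walk an #xmlElement{} tree and merge
%% runs of adjacent #xmlText{} children into a single #xmlText{}.
merge_text(#xmlElement{content = Content} = E) ->
    E#xmlElement{content = merge_siblings([merge_text(C) || C <- Content])};
merge_text(Other) ->
    %% Text, comments, PIs, etc. are returned unchanged.
    Other.

%% Join two adjacent text records of the same type (the repeated Ty
%% variable forces equality), then keep scanning from the merged one.
merge_siblings([#xmlText{value = V1, type = Ty} = T1,
                #xmlText{value = V2, type = Ty} | Rest]) ->
    merge_siblings([T1#xmlText{value = V1 ++ V2} | Rest]);
merge_siblings([H | T]) ->
    [H | merge_siblings(T)];
merge_siblings([]) ->
    [].
```

Applied to the two records quoted in the question, this would yield one xmlText whose value is "> : <a hreflang=". It does not explain why xmerl splits the text in the first place.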
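Regarding the question about converting between encodings, later OTP releases (R13 onwards, so not available when this was asked) include a `unicode` module in the standard library. A minimal sketch re-encoding ISO-8859-1 bytes as UTF-8 with `unicode:characters_to_binary/3` is shown below; the module name `enc_convert` is hypothetical. That module only handles latin1 and the UTF-8/16/32 family, so other encodings would still need something like an iconv binding.

```erlang
-module(enc_convert).
-export([latin1_to_utf8/1]).

%% Hypothetical helper: re-encode an ISO-8859-1 binary as UTF-8.
%% Every byte is a valid latin1 character, so the call cannot fail
%% on a binary input and always returns a binary.
latin1_to_utf8(Bin) when is_binary(Bin) ->
    unicode:characters_to_binary(Bin, latin1, utf8).
```

For example, the latin1 byte 233 ("é") becomes the two UTF-8 bytes 195 and 169, while plain ASCII passes through unchanged.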

Thanks in advance,

