[erlang-questions] Handling UTF-8 data when parsing XML using xmerl

Seth Falcon seth@REDACTED
Fri Sep 4 16:11:23 CEST 2009


Hi Silvester,

* On 2009-09-04 at 11:33 +0200 Roessner, Silvester wrote:
> I also had problems with Unicode support in xmerl.
> 
> My solution is to convert the list containing unicode code-points
> (which I get in my case from .NET)
> into a UTF-8 string xmerl can handle.
> 
> fix_unicode(XmlString) ->
> 	Binary = unicode:characters_to_binary(XmlString, unicode),
> 	binary_to_list(Binary).

That looks quite similar to the workaround included in my original
post.  IMO, the need for such a transformation lacks elegance,
especially in the context of the original post, where the string
comes from data that xmerl has already parsed once.

So my question is really whether this is how things are "supposed to
be", or whether there is a cleaner approach or a way to fix xmerl so
that it handles this better.

But I'm glad to know that someone else has seen a similar issue and
arrived at a similar workaround.
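For anyone adopting the quoted fix_unicode/1 workaround, here is a
slightly more defensive variant. This is a minimal sketch, not code
from either post: the module name fix_unicode_demo is hypothetical,
and it assumes, as the quoted workaround does, that xmerl should be
given UTF-8 bytes rather than raw code points. The one-liner passes
whatever unicode:characters_to_binary/2 returns straight to
binary_to_list/1. When the input contains invalid code points, that
return value is an error or incomplete tuple instead of a binary,
so binary_to_list/1 crashes with badarg. This version reports the
problem explicitly.

```erlang
-module(fix_unicode_demo).
-export([fix_unicode/1]).

%% Convert a list of Unicode code points (e.g. received from .NET)
%% into a list of UTF-8 bytes. Assumption: the caller then hands this
%% byte list to xmerl, as in the quoted workaround.
fix_unicode(XmlString) ->
    case unicode:characters_to_binary(XmlString, unicode) of
        Bin when is_binary(Bin) ->
            %% Success: one list element per UTF-8 byte.
            {ok, binary_to_list(Bin)};
        {error, _Converted, Rest} ->
            %% Rest starts at the first invalid code point.
            {error, {invalid_code_points, Rest}};
        {incomplete, _Converted, Rest} ->
            %% Truncated input that could not be fully converted.
            {error, {incomplete, Rest}}
    end.
```

For example, fix_unicode_demo:fix_unicode([233]) (the code point for
"é") returns {ok, [195,169]}, the two UTF-8 bytes for that character.
On success, the byte list could then be passed to
xmerl_scan:string/1, which returns {Document, Rest}.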

Thanks,

+ seth

-- 
Seth Falcon | @sfalcon | http://userprimary.net/user

