[erlang-questions] Fail to parse utf-8 encoded XML

Daniel Abrahamsson daniel.abrahamsson@REDACTED
Fri Mar 27 12:30:51 CET 2015


Thanks Anthony!

I see, the mistake was using unicode:characters_to_list in this case, as xmerl
wants the "raw" UTF-8 code units (bytes), not Unicode code points.
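
For the archive, a minimal sketch of the difference, assuming a hypothetical
module name utf8_units and a shortened version of the document from this
thread. The "ü" is written as 252/utf8 so the example does not depend on the
encoding of the source file. I would expect xmerl to hand the text back as
code points, but treat that part as an assumption rather than a guarantee.

```erlang
-module(utf8_units).
-export([demo/0]).

%% Record definitions (#xmlElement{}, #xmlText{}) shipped with xmerl.
-include_lib("xmerl/include/xmerl.hrl").

demo() ->
    %% U+00FC ("ü") is one code point, but two bytes (195,188) in UTF-8.
    Bin = <<"<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>",
            252/utf8, "mlaut</root>">>,

    %% Decodes the UTF-8: the list contains the single code point 252.
    Points = unicode:characters_to_list(Bin),

    %% Keeps the bytes as they are: the list contains 195 followed by 188.
    Units = binary_to_list(Bin),

    %% xmerl_scan:string/1 is given the code units, as suggested in the
    %% quoted reply, and does the UTF-8 decoding itself.
    {Root, _Rest} = xmerl_scan:string(Units),
    [#xmlText{value = Text}] = Root#xmlElement.content,

    {Points, Units, Text}.
```

Calling utf8_units:demo() returns both lists plus the text xmerl extracted, so
the two representations can be compared side by side.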

//Daniel

On Fri, Mar 27, 2015 at 12:04 PM, Anthony Ramine <n.oxyde@REDACTED> wrote:

> On 27 March 2015 at 10:52, Daniel Abrahamsson <daniel.abrahamsson@REDACTED>
> wrote:
>
> > Is this a bug in xmerl or am I missing something obvious?
>
> You are missing an inconspicuous thing.
>
> xmerl_scan:string/1 takes a list of code units. Try:
>
>         xmerl_scan:string(binary_to_list(<<"<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>ümlaut</root>"/utf8>>)).
>
> Regards.
>
>