[erlang-questions] xmerl utf-8 encoding

Mikkel Jensen mj@REDACTED
Fri Feb 13 12:17:17 CET 2009


I ran into the same problem in the latest release.
I found out that if you tell the parser what encoding to expect like this:

{Xml, _} = xmerl_scan:string(XmlString, [{encoding, "iso-10646-utf-1"}]),

it will handle UTF-8 correctly. Of course, this only works if you know the
encoding in advance. A better solution would be for the parser to honor the
encoding declared in the header and to default to UTF-8, as previous
versions did.
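
For anyone who wants to try the workaround in isolation, here is a minimal
sketch that wraps the call in a small module. The module name
xmerl_encoding_sketch and the functions parse/1 and root_name/1 are made up
for illustration; only xmerl_scan:string/2, the encoding option shown above,
and the #xmlElement{} record from xmerl.hrl come from xmerl itself. How
non-ASCII input is treated may differ between xmerl versions, so take this
as a starting point rather than a verified fix.

```erlang
-module(xmerl_encoding_sketch).
-export([parse/1, root_name/1]).

%% Record definitions (#xmlElement{} and friends) shipped with xmerl.
-include_lib("xmerl/include/xmerl.hrl").

%% Scan XmlString (a list of character codes), passing the encoding
%% option from the workaround. Anything left over after the root
%% element is discarded.
parse(XmlString) ->
    {Xml, _Rest} = xmerl_scan:string(XmlString,
                                     [{encoding, "iso-10646-utf-1"}]),
    Xml.

%% Return the tag name of the root element as an atom.
root_name(XmlString) ->
    #xmlElement{name = Name} = parse(XmlString),
    Name.
```

After compiling the module, calling
xmerl_encoding_sketch:root_name("<greeting>hello</greeting>") from the shell
is a quick check that the option is accepted by your xmerl version.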

- Mikkel

On Fri, Feb 13, 2009 at 11:38 AM, Michal Ptaszek <michal.ptaszek@REDACTED> wrote:

> Hi All,
>
> After migrating from R12B4 to R12B5 (xmerl version changed from 1.1.9
> to 1.1.10), I have noticed a change in behavior that is probably
> unintended.
>
> During the document processing phase, the wfc_Legal_Character fatal error
> is thrown even if I use the proper header
> (<?xml version="1.0" encoding="utf-8"?>).
>
> The previous version of xmerl dealt with UTF-8 encoded characters
> flawlessly; the newest one unfortunately does not want to cooperate.
>
> Is this an xmerl bug, an intended feature, or my misunderstanding of xmerl
> (and if the latter, how do I correctly parse a document containing UTF-8
> encoded characters)?
>
> Best regards,
> --
> Michal Ptaszek
> www.erlang-consulting.com