[erlang-questions] UTF8

Dustin Whitney dustin.whitney@REDACTED
Thu Feb 14 17:29:45 CET 2008


On Thu, Feb 14, 2008 at 10:40 AM, Kevin Scaldeferri <kevin@REDACTED>
wrote:

>
> On Feb 14, 2008, at 2:42 AM, Hasan Veldstra wrote:
>
> >
> >> And how do you suppose my data was turned into UTF-32?  I got the
> >> data originally from an HTTP GET request that returned a UTF-8
> >> encoded XML file.  The file itself says it's UTF-8, and the
> >> header tuple said the document was using the UTF-8 charset.  Does
> >> Erlang convert the data automatically?
> >
> > Actually, 332 (U+14C) is also the UTF-16 encoding for "Ō". If I
> > remember correctly, the XML standard requires that the documents be
> > in UTF-8 or UTF-16. So it's most likely that your XML file is encoded
> > in UTF-16, and the headers are wrong.
>
> The XML standard says that the default encoding is UTF-8, but you can
> specify any encoding you want.
>
> -kevin



Right, it makes sense to me now that 332 can't be converted to a single
byte, and that encoding it as UTF-8 yields the two bytes 197, 140 (it will
make more sense once I read further about how UTF-8 handles characters
outside the ASCII set).  But I'm still not sure why a document that says
it is encoded in UTF-8 ended up as UTF-16 (or UTF-32).
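
For anyone following along, the arithmetic behind 332 -> 197, 140 can be
sketched as below.  This is only an illustration in Python, not what Erlang
or any XML library does internally; the function name utf8_encode is made
up for the example, and it skips validation (surrogates and out-of-range
values are not rejected).

```python
def utf8_encode(code_point):
    """Encode one Unicode code point as a list of UTF-8 byte values.

    Hypothetical helper for illustration only: it does not reject
    surrogates (U+D800..U+DFFF) or values above U+10FFFF.
    """
    if code_point < 0x80:
        # ASCII range: one byte, value unchanged.
        return [code_point]
    if code_point < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return [0xC0 | (code_point >> 6),
                0x80 | (code_point & 0x3F)]
    if code_point < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return [0xE0 | (code_point >> 12),
                0x80 | ((code_point >> 6) & 0x3F),
                0x80 | (code_point & 0x3F)]
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return [0xF0 | (code_point >> 18),
            0x80 | ((code_point >> 12) & 0x3F),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F)]


code_point = 332                       # U+014C, too large for one byte (max 255)
utf8_bytes = utf8_encode(code_point)   # the two bytes 197 and 140

# Going the other way: decoding those two UTF-8 bytes gives back 332.
decoded_code_point = ord(bytes(utf8_bytes).decode("utf-8"))

# The same character as a single big-endian UTF-16 code unit is also 332.
utf16_unit = int.from_bytes(chr(code_point).encode("utf-16-be"), "big")

print(utf8_bytes, decoded_code_point, utf16_unit)
```

Note that decoding the UTF-8 bytes 197, 140 produces the number 332, the
same value as the UTF-16 code unit.  So seeing 332 in a list is at least
consistent with the document really having been UTF-8 and having been
decoded into code points somewhere along the way, though nothing in this
thread confirms that.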

Anyway, I really appreciate the help.  I've never really had to think about
what was happening behind the scenes with my strings, but now I've got a
pretty good understanding of how Erlang deals with them.

-Dustin