Thu Feb 14 17:29:45 CET 2008
On Thu, Feb 14, 2008 at 10:40 AM, Kevin Scaldeferri wrote:
> On Feb 14, 2008, at 2:42 AM, Hasan Veldstra wrote:
> >> And how do you suppose my data was turned into UTF-32? I got the
> >> data originally from an HTTP GET request that returned a UTF-8
> >> encoded XML file. The file itself says it's UTF-8, and the
> >> header tuple said the document was using the UTF-8 charset. Does
> >> Erlang convert the data automatically?
> > Actually, 332 (U+14C) is also the UTF-16 encoding for "Ō". If I
> > remember correctly, the XML standard requires that the documents be
> > in UTF-8 or UTF-16. So it's most likely that your XML file is encoded
> > in UTF-16, and the headers are wrong.
> The XML standard says that the default encoding is UTF-8, but you can
> specify any encoding you want.
Right, it makes sense to me now that 332 can't be converted to a single
byte, and I suppose its UTF-8 encoding coming out as 197, 140 makes sense
too (it will make more sense once I read further about what UTF-8 does with
characters outside the ASCII set). But yeah, I'm still not sure why the
document that says it was encoded in UTF-8 ended up being in UTF-16 (or
Anyway, I really appreciate the help. I've never really had to think about
what was happening behind the scenes with my strings, but now I've got a
pretty good understanding of how Erlang deals with them.
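
For anyone following along, the 332 to 197, 140 conversion discussed above
can be reproduced by hand. This is only a minimal sketch: the module name
utf8_demo and the helper encode/1 are made up for illustration, it covers
only code points below U+0800, and the last checks assume an OTP release
newer than this thread, one that ships the unicode module and the utf16
bit syntax type.

```erlang
-module(utf8_demo).
-export([encode/1, check/0]).

%% Encode a single code point below U+0800 as UTF-8 by hand.
%% (Hypothetical helper for illustration; not code from this thread.)
encode(Cp) when Cp >= 0, Cp < 16#80 ->
    %% ASCII range: one byte, value unchanged.
    <<Cp>>;
encode(Cp) when Cp < 16#800 ->
    %% Two bytes: 110xxxxx 10yyyyyy, where xxxxx holds the top 5 bits
    %% and yyyyyy the low 6 bits of the code point.
    <<2#110:3, (Cp bsr 6):5, 2#10:2, (Cp band 16#3F):6>>.

check() ->
    %% 332 is U+014C; by hand it becomes the two bytes 197, 140.
    <<197, 140>> = encode(332),
    %% Assumption: the unicode module is available in this release.
    <<197, 140>> = unicode:characters_to_binary([332]),
    %% As a big-endian UTF-16 code unit the same character is 1, 76.
    <<1, 76>> = <<332/utf16>>,
    %% 332 does not fit in one byte, so list_to_binary/1 rejects it.
    {'EXIT', {badarg, _}} = (catch list_to_binary([332])),
    ok.
```

After compiling the module, utf8_demo:check() should return ok if every
match above holds.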