On Thu, Feb 14, 2008 at 10:40 AM, Kevin Scaldeferri <<a href="mailto:kevin@scaldeferri.com">kevin@scaldeferri.com</a>> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div><div></div><div class="Wj3C7c"><br>

On Feb 14, 2008, at 2:42 AM, Hasan Veldstra wrote:<br>

<br>

><br>

>> And how do you suppose my data was turned into UTF-32.  I got the<br>

>> data originally from an HTTP GET request that returned a UTF-8<br>

>> encoded XML file.  The file itself says it's UTF-8, and the<br>

>> header tuple said the document was using the UTF-8 charset.  Does<br>

>> Erlang convert the data automatically?<br>

><br>

> Actually, 332 (U+14C) is also the UTF-16 encoding for "Ō". If I<br>

> remember correctly, the XML standard requires that the documents be<br>

> in UTF-8 or UTF-16. So it's most likely that your XML file is encoded<br>

> in UTF-16, and the headers are wrong.<br>

<br>

</div></div>The XML standard says that the default encoding is UTF-8, but you can<br>

specify any encoding you want.<br>

<font color="#888888"><br>

-kevin</font></blockquote></div><br><br>Right, it makes sense to me now that 332 can't be converted to a single byte, and that converting it to UTF-8 gives 197, 140 (it will make more sense once I read further about how UTF-8 handles characters outside the ASCII set). But I'm still not sure why a document that says it is encoded in UTF-8 ended up as UTF-16 (or UTF-32).<br>
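<br>For reference, the step from 332 to 197, 140 can be reproduced by hand. This is a minimal sketch in Python, used only for illustration and not taken from the thread. The helper name utf8_two_byte is made up for this example, and it only covers the two-byte UTF-8 range (U+0080 to U+07FF), which is where 332 (U+014C) falls.<br>

```python
# Code point 332 (U+014C) does not fit in one byte (max 255),
# so UTF-8 spreads its bits over two bytes.
CODE_POINT = 332


def utf8_two_byte(cp):
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes.

    Hypothetical helper for illustration; real code should use the
    language's built-in encoder.
    """
    if cp not in range(0x80, 0x800):
        raise ValueError("only the two-byte UTF-8 range is handled")
    lead = 0b11000000 | (cp >> 6)          # 110xxxxx: upper 5 bits
    cont = 0b10000000 | (cp & 0b00111111)  # 10xxxxxx: lower 6 bits
    return [lead, cont]


manual = utf8_two_byte(CODE_POINT)
builtin = list(chr(CODE_POINT).encode("utf-8"))
utf16_be = list(chr(CODE_POINT).encode("utf-16-be"))

print(manual)    # [197, 140]
print(builtin)   # [197, 140]
print(utf16_be)  # [1, 76], i.e. the single 16-bit unit 0x014C
```

<br>The UTF-16 form is just the number 332 written as one 16-bit unit (0x014C), which is why a list of code points can look like UTF-16 for characters in this range.<br>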

<br>Anyway, I really appreciate the help. I've never really had to think about what was happening behind the scenes with my strings, but now I've got a pretty good understanding of how Erlang deals with them.<br>

<br>-Dustin<br>