<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
On 22-11-2011 13:11, eurekafag wrote:
<blockquote
cite="mid:CALpRnidEUS82pweZmoQTypqL0AbZEE08MrKy59siBz1=BCkGEA@mail.gmail.com"
type="cite">Many thanks for this thorough research! However I have
two things to mention. Setting or getting encoding introduces
noticeable delay in launching without -noinput, but with it it
starts just as fast as usual. Pretty strange.</blockquote>
Yes, I noticed that too; the delay is so long that there is probably
a timeout somewhere.<br>
<br>
<blockquote
cite="mid:CALpRnidEUS82pweZmoQTypqL0AbZEE08MrKy59siBz1=BCkGEA@mail.gmail.com"
type="cite"> And another a bit illogical issue: to print UTF-8
strings one should NOT set binary type /utf8. This works fine with
encoding set: io:format("~ts~n", [<<"Тестовая
строка">>]).
<div>
This fails in both noinput-cases with encoding
set: io:format("~ts~n", [<<"Тестовая
строка"/utf8>>]).</div>
</blockquote>
Keep in mind that *source files are still always interpreted as
latin-1*.<br>
<br>
From <a class="moz-txt-link-freetext" href="http://www.erlang.org/doc/apps/stdlib/unicode_usage.html">http://www.erlang.org/doc/apps/stdlib/unicode_usage.html</a> :<br>
<blockquote>It is convenient to be able to write a list of Unicode
characters in the string syntax. However, the language specifies
strings as being in the ISO-latin-1 character set which the
compiler tool chain as well as many other tools expect.<br>
<br>
Also the source code is (for now) still expected to be written
using the ISO-latin-1 character set, why Unicode characters beyond
that range cannot be entered in string literals.<br>
</blockquote>
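This means that the compiler reads every byte of a string literal as
one latin-1 character, so the "/utf8" modifier always performs a
latin-1 to UTF-8 conversion. If the source file is really UTF-8, that
conversion encodes the text a second time.<br>
So, yes, if you ensure that your source files are UTF-8 encoded, you
can use the string literals as they are, without "/utf8", and expect
them to hold UTF-8 bytes.<br>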
<br>
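To make the double encoding concrete, here is a small sketch of what
happens to the bytes. It is written in Python rather than Erlang only
so that it can be run stand-alone. All variable names are made up for
the illustration, and the latin-1 decoding step is just a model of the
compiler's behaviour, not its actual code.<br>
<pre>
```python
# Model of how a latin-1 compiler treats a UTF-8 encoded string literal.
# All names are hypothetical and exist only for this illustration.

text = "Тестовая строка"

# The bytes an editor writes when the source file is saved as UTF-8.
source_bytes = text.encode("utf-8")

# The compiler assumes latin-1, so it sees one character per byte.
chars_seen_by_compiler = source_bytes.decode("latin-1")

# Plain binary literal: each character becomes one byte again,
# so the original UTF-8 bytes come through unchanged.
plain_binary = chars_seen_by_compiler.encode("latin-1")

# Literal with /utf8: each of those characters is encoded as UTF-8,
# which encodes the already encoded text a second time.
utf8_binary = chars_seen_by_compiler.encode("utf-8")

print(len(source_bytes), len(plain_binary), len(utf8_binary))
```
</pre>
In this model the plain literal keeps the 29 bytes of the source
unchanged, while the "/utf8" variant grows to 57 bytes because every
non-ASCII byte is expanded into two.<br>
<br>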
<blockquote
cite="mid:CALpRnidEUS82pweZmoQTypqL0AbZEE08MrKy59siBz1=BCkGEA@mail.gmail.com"
type="cite">
<div>I guess it's because of double encoding (by explicitly
defined encoding and that suffix) but I was confused at first.
It's better not to set encoding but declare it in binary strings
like they do in Python prepending strings with 'u' literal,
which doesn't work in Erlang for all cases.</div>
</blockquote>
Well, for the u"..." syntax, Python also needs to know the encoding
of the source file. Unlike Erlang, however, Python can be told what
the encoding is through a coding declaration at the top of the file
(and it also recognizes UTF-8 files which begin with a BOM).<br>
<br>
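As a rough illustration of the Python side, the sketch below compiles
the same bytes three times: with a UTF-8 coding declaration, with a
latin-1 declaration, and with only a UTF-8 BOM. It assumes Python 3,
where the u prefix is accepted but optional. The helper
literal_from_source and the module name "demo_module" are hypothetical
and exist only for this example.<br>
<pre>
```python
# Compile the same bytes under different declared source encodings.
# literal_from_source is a hypothetical helper for this illustration.

WORD = "строка"
BOM = b"\xef\xbb\xbf"  # UTF-8 byte order mark


def literal_from_source(prefix):
    """Compile a one-line module given as raw bytes and return its string."""
    body = b's = u"' + WORD.encode("utf-8") + b'"\n'
    namespace = {}
    exec(compile(prefix + body, "demo_module", "exec"), namespace)
    return namespace["s"]


declared_utf8 = literal_from_source(b"# -*- coding: utf-8 -*-\n")
declared_latin1 = literal_from_source(b"# -*- coding: latin-1 -*-\n")
with_bom = literal_from_source(BOM)

print(declared_utf8 == WORD, declared_latin1 == WORD, with_bom == WORD)
```
</pre>
With the latin-1 declaration the literal comes out as one character
per byte, which is the same misreading the Erlang compiler applies to
a UTF-8 source file.<br>
<br>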
/Erik<br>
</body>
</html>