<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#ffffff" text="#000000">

    On 22-11-2011 13:11, eurekafag wrote:

    <blockquote

cite="mid:CALpRnidEUS82pweZmoQTypqL0AbZEE08MrKy59siBz1=BCkGEA@mail.gmail.com"

      type="cite">Many thanks for this thorough research! However, I have
      two things to mention. Setting or getting the encoding introduces a
      noticeable delay at launch without -noinput, but with -noinput it
      starts just as fast as usual. Pretty strange.</blockquote>

    Yes, I noticed that too; the delay is so long that there is probably

    a timeout somewhere.<br>

    <br>

    <blockquote

cite="mid:CALpRnidEUS82pweZmoQTypqL0AbZEE08MrKy59siBz1=BCkGEA@mail.gmail.com"

      type="cite"> And another slightly illogical issue: to print UTF-8
      strings one should NOT set the binary type /utf8. This works fine with
      encoding set: io:format("~ts~n", [<<"Тестовая строка">>]).

      <div>

        This fails in both noinput cases with encoding
        set: io:format("~ts~n", [<<"Тестовая строка"/utf8>>]).</div>

    </blockquote>

    Keep in mind that *source files are still always interpreted as
    latin-1*.<br>

    <br>

    From <a class="moz-txt-link-freetext" href="http://www.erlang.org/doc/apps/stdlib/unicode_usage.html">http://www.erlang.org/doc/apps/stdlib/unicode_usage.html</a> :<br>

    <blockquote>It is convenient to be able to write a list of Unicode

      characters in the string syntax. However, the language specifies

      strings as being in the ISO-latin-1 character set which the

      compiler tool chain as well as many other tools expect.<br>

      <br>

      Also the source code is (for now) still expected to be written

      using the ISO-latin-1 character set, why Unicode characters beyond

      that range cannot be entered in string literals.<br>

    </blockquote>

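    This means that the "/utf8" modifier always performs a
    latin1-to-utf8 encoding: every byte of the literal is read as one
    latin-1 character and then encoded again, which is why the second
    example above fails.<br>
    So, yes, if you make sure that your source files are UTF-8 encoded, you
    can use plain string literals in binaries (without "/utf8") and
    expect the resulting bytes to be UTF-8.<br>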
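    <br>
    A minimal sketch of that double encoding, for pasting into the Erlang
    shell. It assumes a UTF-8 encoded source file and, to stay independent
    of the terminal's encoding, spells out the first four characters of the
    string above ("Тест") as raw bytes. The variable names are only
    illustrative.<br>
    <pre>
%% The UTF-8 bytes of "Тест", as they sit in a UTF-8 encoded source file.
Utf8 = &lt;&lt;208,162,208,181,209,129,209,130&gt;&gt;.

%% The compiler reads a string literal as latin-1: one character per byte.
Latin1Chars = binary_to_list(Utf8).

%% A "/utf8" literal encodes each of those code points as UTF-8 again.
Double = &lt;&lt; &lt;&lt;C/utf8&gt;&gt; || C &lt;- Latin1Chars &gt;&gt;.

%% Without "/utf8" the bytes are stored exactly as they were read.
Plain = list_to_binary(Latin1Chars).

%% Plain is still the original 8 bytes of valid UTF-8 ...
true = (Plain =:= Utf8).
%% ... while Double has grown to 16 bytes and no longer decodes to "Тест".
16 = byte_size(Double).
    </pre>
    Every byte here is above 127, so each one becomes two bytes when it is
    encoded a second time.<br>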

    <br>

    <blockquote

cite="mid:CALpRnidEUS82pweZmoQTypqL0AbZEE08MrKy59siBz1=BCkGEA@mail.gmail.com"

      type="cite">

      <div>I guess it's because of double encoding (from the explicitly
        set encoding plus that suffix), but I was confused at first.
        It would be better not to set the encoding but to declare it in
        binary strings, as Python does by prefixing strings with 'u',
        which doesn't work in Erlang in all cases.</div>

    </blockquote>

    Well, for the u"..." syntax, Python also needs to know the encoding

    of the source file. Unlike Erlang, however, Python can be told what

    the encoding is (and can recognize Unicode files which begin with a

    BOM character).<br>

    <br>

    /Erik<br>

  </body>

</html>