<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#ffffff" text="#000000">

    I thought it might have something to do with io:setopts() being

    called when -noinput is absent, and not when it is present; the

    evidence is mixed, but I think I may be on to something useful.<br>

    <br>

    Consider the following extension of your program:<br>

    <blockquote>-module(unicode_test).<br>

      -export([main/0]).<br>

      <br>

      main() -><br>

          print(),<br>

          ok = io:setopts(standard_io, [{encoding, unicode}]),<br>

          print().<br>

      <br>

      print() -><br>

          io:format("Encoding=~p~n",

      [lists:keyfind(encoding,1,io:getopts())]),<br>

          io:format("~ts~n",

[[1058,1077,1089,1090,1086,1074,1072,1103,32,1089,1090,1088,1086,1082,1072]]),<br>

          io:format("~ts~n", ["Тестовая строка"]).<br>

    </blockquote>

    <br>

    Without -noinput (and with LANG=da_DK.utf8), I get:<br>

    <blockquote>1> Encoding={encoding,latin1}<br>

      Тестовая строка<br>

      Ð¢ÐµÑÑÐ¾Ð²Ð°Ñ ÑÑÑÐ¾ÐºÐ°<br>

      Encoding={encoding,latin1}<br>

      Тестовая строка<br>

      Ð¢ÐµÑÑÐ¾Ð²Ð°Ñ ÑÑÑÐ¾ÐºÐ°<br>

    </blockquote>

    I.e. the list-of-integers version is OK both before and after the
    io:setopts() call.<br>

    <br>

    With -noinput, I get:<br>

    <blockquote>Encoding={encoding,latin1}<br>

      \x{422}\x{435}\x{441}\x{442}\x{43E}\x{432}\x{430}\x{44F}

      \x{441}\x{442}\x{440}\x{43E}\x{43A}\x{430}<br>

      Тестовая строка<br>

      Encoding={encoding,unicode}<br>

      Тестовая строка<br>

      Ð¢ÐµÑÑÐ¾Ð²Ð°Ñ ÑÑÑÐ¾ÐºÐ°<br>

    </blockquote>

    I.e. first the string-literal version is good, but after using

    io:setopts(), the list-of-integers version is the good one.<br>

    <br>

    So, if you explicitly select unicode encoding in your program, you

    have consistent behaviour.<br>
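    <br>
    To illustrate why the string-literal version comes out garbled, here
    is a minimal sketch. It assumes (as the links Paul posted suggest)
    that the compiler reads the source file as Latin-1, so the literal
    arrives as a list of UTF-8 bytes rather than a list of code points.
    The module name unicode_fix and the helper to_codepoints/1 are made
    up for this example; the bytes are written as integers so the
    example does not depend on the source file encoding:<br>
    <pre>
-module(unicode_fix).
-export([main/0, to_codepoints/1]).

%% Assumption: Bytes is a list with one element per UTF-8 byte, which is
%% what a UTF-8 string literal looks like when the source is read as Latin-1.
to_codepoints(Bytes) ->
    unicode:characters_to_list(list_to_binary(Bytes), utf8).

main() ->
    ok = io:setopts(standard_io, [{encoding, unicode}]),
    %% UTF-8 bytes of the first two letters of the test string
    %% (code points 1058 and 1077).
    Bytes = [208,162,208,181],
    %% After conversion we have the same kind of list as the
    %% list-of-integers version above.
    [1058,1077] = to_codepoints(Bytes),
    io:format("~ts~n", [to_codepoints(Bytes)]).
    </pre>
    If that assumption holds, converting the literal this way and
    selecting unicode encoding once at startup should make both versions
    go through the same code path.<br>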

    <br>

    The only thing that bothers me is that there appears to be something

    else going on - it's not just about the encoding.<br>

    I find that without -noinput, output is the same no matter what I
    set the encoding to. With -noinput, on the other hand, output differs
    depending on whether I select latin1 or unicode encoding.<br>

    <br>

    Hoping this helps.<br>

    /Erik<br>

    <br>

    On 21-11-2011 22:42, eurekafag wrote:

    <blockquote

cite="mid:CALpRnif7n=fDHFdym7usOA1YrE3MAkjqr5Ey8pLwsy3CDQnBqg@mail.gmail.com"

      type="cite">Thanks, I'm aware of it. The problem is the different
      behavior with and without -noinput. I'm just curious which case is
      right and why it makes a difference at all. I explicitly define that
      binary string as utf8-encoded, but it only works with -noinput and
      fails without it. On the other hand, a list without any unicode
      letters at all (only integers) is printed as hex values with -noinput
      and as text without it. It would be understandable if this were some
      kind of parser problem, with the parser wanting latin-1 letters in
      the source, but what's wrong with a plain list of integers, which it
      fails to output as a string? The problem is that those two cases are
      mutually exclusive: one of them works with -noinput and fails
      without it, and vice versa. So I'm curious which method I should use
      so that it works as expected.<br>

      <br>

      <div class="gmail_quote">On 22 November 2011 at 0:19, Paul
        Davis <span dir="ltr">&lt;<a moz-do-not-send="true"
            href="mailto:paul.joseph.davis@gmail.com">paul.joseph.davis@gmail.com</a>&gt;</span>
        wrote:<br>

        <blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt

          0.8ex; border-left: 1px solid rgb(204, 204, 204);

          padding-left: 1ex;">

          Oh, good call. I just pasted your code into the shell and it

          worked.<br>

          But then when compiling it into a file it breaks like you

          have.<br>

          Specifically, the UTF-8 literal in the source file is broken.

          This<br>

          suggests that the Erlang compiler doesn't like UTF-8 literals,

          and<br>

          sure enough, a quick google brought up a post:<br>

          <br>

          <a moz-do-not-send="true"

href="http://erlang.2086793.n4.nabble.com/utf8-in-source-files-td3031128.html"

            target="_blank">http://erlang.2086793.n4.nabble.com/utf8-in-source-files-td3031128.html</a><br>

          <br>

          Which references:<br>

          <br>

          <a moz-do-not-send="true"

            href="http://www.erlang.org/doc/apps/stdlib/unicode_usage.html"

            target="_blank">http://www.erlang.org/doc/apps/stdlib/unicode_usage.html</a><br>

          <br>

          HTH,<br>

          <span class="HOEnZb"><font color="#888888">Paul Davis<br>

            </font></span>

          <div class="HOEnZb">

            <div class="h5"><br>

              On Mon, Nov 21, 2011 at 2:06 PM, eurekafag <<a

                moz-do-not-send="true"

                href="mailto:eurekafag@eureka7.ru">eurekafag@eureka7.ru</a>>

              wrote:<br>

              > What exactly do you get? Please, provide the full

              output of both cases with<br>

              > and without -noinput. I tried export LANG=en_US.UTF-8

              (my system-wide locale<br>

              > is ru_RU.UTF-8) and I still get the same result.<br>

              ><br>

            </div>

          </div>


        </blockquote>

      </div>

      <br>

    </blockquote>

    <br>

  </body>

</html>