<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
I thought it might have something to do with io:setopts() being
called when -noinput is absent, and not when it is present; the
evidence is mixed, but I think I may be on to something useful.<br>
<br>
Consider the following extension of your program:<br>
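<blockquote>-module(unicode_test).<br>
-export([main/0]).<br>
<br>
% Print with the default encoding, switch standard_io to unicode, print again.<br>
main() -><br>
print(),<br>
ok = io:setopts(standard_io, [{encoding, unicode}]),<br>
print().<br>
<br>
print() -><br>
% Show the encoding currently in effect for standard_io.<br>
io:format("Encoding=~p~n",
[lists:keyfind(encoding,1,io:getopts())]),<br>
% The text as a list of Unicode code points.<br>
io:format("~ts~n",
[[1058,1077,1089,1090,1086,1074,1072,1103,32,1089,1090,1088,1086,1082,1072]]),<br>
% The same text as a string literal in the source file.<br>
io:format("~ts~n", ["Тестовая строка"]).<br>
</blockquote>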
<br>
Without -noinput (and with LANG=da_DK.utf8), I get:<br>
<blockquote>1> Encoding={encoding,latin1}<br>
Тестовая строка<br>
ТеÑÑÐ¾Ð²Ð°Ñ ÑÑÑока<br>
Encoding={encoding,latin1}<br>
Тестовая строка<br>
ТеÑÑÐ¾Ð²Ð°Ñ ÑÑÑока<br>
</blockquote>
That is, the list-of-integers version prints correctly both before and
after the io:setopts() call.<br>
<br>
With -noinput, I get:<br>
<blockquote>Encoding={encoding,latin1}<br>
\x{422}\x{435}\x{441}\x{442}\x{43E}\x{432}\x{430}\x{44F}
\x{441}\x{442}\x{440}\x{43E}\x{43A}\x{430}<br>
Тестовая строка<br>
Encoding={encoding,unicode}<br>
Тестовая строка<br>
ТеÑÑÐ¾Ð²Ð°Ñ ÑÑÑока<br>
</blockquote>
That is, at first the string-literal version prints correctly, but after
calling io:setopts(), the list-of-integers version is the correct one.<br>
<br>
So, if you explicitly select unicode encoding in your program and pass
the text as a list of code points, you get consistent behaviour with
and without -noinput.<br>
<br>
The only thing that bothers me is that something else appears to be
going on; it's not just about the encoding.<br>
I find that without -noinput, the output is the same no matter what I
set the encoding to. With -noinput, on the other hand, the output
differs depending on whether I select latin1 or unicode encoding.<br>
<br>
Hoping this helps.<br>
/Erik<br>
<br>
On 21-11-2011 22:42, eurekafag wrote:
<blockquote
cite="mid:CALpRnif7n=fDHFdym7usOA1YrE3MAkjqr5Ey8pLwsy3CDQnBqg@mail.gmail.com"
type="cite">Thanks, I'm aware of it. The problem is the different
behavior with and without -noinput. I'm just curious which case is
right and why it makes a difference at all. I explicitly define that
binary string as utf8-encoded, but it only works with -noinput and
fails without it. On the other hand, a list without any unicode
letters at all (only integers) is printed as hex values with -noinput
and as text without it. It might be understandable if this were some
kind of parser problem that wants latin-1 letters in the source, but
what's wrong with a plain list of integers that it fails to output
as a string? The problem is that those two cases are mutually
exclusive: one of them works with -noinput and fails without it, and
vice versa. So I'm curious which method I should use so that it works
as expected.<br>
<br>
<div class="gmail_quote">On 22 November 2011 at 0:19, Paul
Davis <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:paul.joseph.davis@gmail.com">paul.joseph.davis@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt
0.8ex; border-left: 1px solid rgb(204, 204, 204);
padding-left: 1ex;">
Oh, good call. I just pasted your code into the shell and it
worked. But when I compile it from a file, it breaks the way yours
does. Specifically, the UTF-8 literal in the source file is broken.
This suggests that the Erlang compiler doesn't like UTF-8 literals,
and sure enough, a quick Google search brought up a post:<br>
<br>
<a moz-do-not-send="true"
href="http://erlang.2086793.n4.nabble.com/utf8-in-source-files-td3031128.html"
target="_blank">http://erlang.2086793.n4.nabble.com/utf8-in-source-files-td3031128.html</a><br>
<br>
Which references:<br>
<br>
<a moz-do-not-send="true"
href="http://www.erlang.org/doc/apps/stdlib/unicode_usage.html"
target="_blank">http://www.erlang.org/doc/apps/stdlib/unicode_usage.html</a><br>
<br>
HTH,<br>
<span class="HOEnZb"><font color="#888888">Paul Davis<br>
</font></span>
<div class="HOEnZb">
<div class="h5"><br>
On Mon, Nov 21, 2011 at 2:06 PM, eurekafag <<a
moz-do-not-send="true"
href="mailto:eurekafag@eureka7.ru">eurekafag@eureka7.ru</a>>
wrote:<br>
> What exactly do you get? Please, provide the full
output of both cases with<br>
> and without -noinput. I tried export LANG=en_US.UTF-8
(my system-wide locale<br>
> is ru_RU.UTF-8) and I still get the same result.<br>
><br>
</div>
</div>
<div class="HOEnZb">
<div class="h5">>
_______________________________________________<br>
> erlang-bugs mailing list<br>
> <a moz-do-not-send="true"
href="mailto:erlang-bugs@erlang.org">erlang-bugs@erlang.org</a><br>
> <a moz-do-not-send="true"
href="http://erlang.org/mailman/listinfo/erlang-bugs"
target="_blank">http://erlang.org/mailman/listinfo/erlang-bugs</a><br>
><br>
><br>
</div>
</div>
</blockquote>
</div>
<br>
</blockquote>
<br>
</body>
</html>