[erlang-bugs] Unicode bug in io:format
Erik Søe Sørensen
ess@REDACTED
Tue Nov 22 12:16:28 CET 2011
Trying to track down the difference between -noinput and lack thereof, I
find this:
$ erl -eval 'io:format("~p / ~p\n", [process_info(whereis(user), A)
|| A <- [initial_call, current_function]]), init:stop().'
Erlang R14B03 (erts-5.8.4) [source] [64-bit] [smp:8:8] [rq:8]
[async-threads:0] [kernel-poll:false]
Eshell V5.8.4 (abort with ^G)
1> {initial_call,{group,server,3}} /
{current_function,{group,server_loop,3}}
$ erl -eval 'io:format("~p / ~p\n", [process_info(whereis(user), A)
|| A <- [initial_call, current_function]]), init:stop().' -noinput
{initial_call,{erlang,apply,2}} /
{current_function,{user,server_loop,2}}
I.e. in one case, 'group' is handling I/O, while in the other, it is 'user'.
In fact, in both cases, only one of the modules is loaded at all:
$ erl -eval 'io:format("~p\n", [{code:is_loaded(user),
code:is_loaded(group)}]), init:stop().'
Erlang R14B03 (erts-5.8.4) [source] [64-bit] [smp:8:8] [rq:8]
[async-threads:0] [kernel-poll:false]
Eshell V5.8.4 (abort with ^G)
1>
{false,{file,"/usr/local/lib/erlang/lib/kernel-2.14.4/ebin/group.beam"}}
$ erl -eval 'io:format("~p\n", [{code:is_loaded(user),
code:is_loaded(group)}]), init:stop().' -noinput
{{file,"/usr/local/lib/erlang/lib/kernel-2.14.4/ebin/user.beam"},false}
The differences in behaviour are caused by differences between these two
modules.
The character encoding translation is done in group:io_request/3 and
user:wrap_characters_to_binary/3, respectively (it is the latter which
produces the "\x" escapes).
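
To make the difference concrete, here is a minimal sketch of the two translation strategies. It is not the code of either module; the module name and both helper functions are made up for illustration. The assumption is that a latin1 device cannot represent code points above 255 and so writes them as \x{HEX} escapes, whereas a unicode device encodes them as UTF-8.

```erlang
-module(encoding_sketch).
-export([to_latin1_escaped/1, to_utf8/1]).

%% Hypothetical helper: render a list of code points for a latin1 device.
%% Code points above 255 are written as \x{HEX} escapes, which mimics the
%% escaped output seen with -noinput. Illustration only.
to_latin1_escaped(CodePoints) ->
    list_to_binary([escape(C) || C <- CodePoints]).

escape(C) when C =< 255 -> C;
escape(C) -> ["\\x{", integer_to_list(C, 16), "}"].

%% Encode the same code points as UTF-8 with the standard unicode module.
to_utf8(CodePoints) ->
    unicode:characters_to_binary(CodePoints, unicode, utf8).
```

For example, encoding_sketch:to_latin1_escaped([1058,1077]) gives the binary containing \x{422}\x{435}, while encoding_sketch:to_utf8([1058]) gives <<208,162>>.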
/Erik
On 22-11-2011 11:02, Erik Søe Sørensen wrote:
> I thought it might have something to do with io:setopts() being called
> when -noinput is absent, and not when it is present; the evidence is
> mixed, but I think I may be on to something useful.
>
> Consider the following extension of your program:
>
> -module(unicode_test).
> -export([main/0]).
>
> main() ->
>     print(),
>     ok = io:setopts(standard_io, [{encoding, unicode}]),
>     print().
>
> print() ->
>     io:format("Encoding=~p~n",
>               [lists:keyfind(encoding, 1, io:getopts())]),
>     io:format("~ts~n",
>               [[1058,1077,1089,1090,1086,1074,1072,1103,32,
>                 1089,1090,1088,1086,1082,1072]]),
>     io:format("~ts~n", ["Тестовая строка"]).
>
>
> Without -noinput (and with LANG=da_DK.utf8), I get:
>
> 1> Encoding={encoding,latin1}
> Тестовая строка
> ТеÑÑÐ¾Ð²Ð°Ñ ÑÑÑока
> Encoding={encoding,latin1}
> Тестовая строка
> ТеÑÑÐ¾Ð²Ð°Ñ ÑÑÑока
>
> i.e. the list-of-integers version is OK in both cases.
>
> With -noinput, I get:
>
> Encoding={encoding,latin1}
> \x{422}\x{435}\x{441}\x{442}\x{43E}\x{432}\x{430}\x{44F}
> \x{441}\x{442}\x{440}\x{43E}\x{43A}\x{430}
> Тестовая строка
> Encoding={encoding,unicode}
> Тестовая строка
> ТеÑÑÐ¾Ð²Ð°Ñ ÑÑÑока
>
> I.e. first the string-literal version is good, but after using
> io:setopts(), the list-of-integers version is the good one.
>
> So, if you explicitly select unicode encoding in your program, you
> have consistent behaviour.
>
> The only thing that bothers me is that there appears to be something
> else going on - it's not just about the encoding.
> I find that without -noinput, the output is the same no matter what I
> set the encoding to. With -noinput, on the other hand, the output
> differs depending on whether I select latin1 or unicode encoding.
>
> Hoping this helps.
> /Erik
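
A possible explanation for the garbled string-literal lines in the quoted output, offered as an assumption rather than something established in this thread: if the source file is saved as UTF-8 and the R14 compiler reads it as Latin-1, the literal becomes a list of UTF-8 bytes instead of a list of code points, and ~ts on a unicode device then encodes every byte as a character of its own. The sketch below reproduces that double encoding with the standard unicode module; the module and function names are hypothetical.

```erlang
-module(double_encoding_sketch).
-export([demo/0, recover/1]).

%% Compare encoding real code points once with encoding the
%% already-encoded bytes a second time.
demo() ->
    CodePoints = [1058,1077,1089,1090],   % first four characters of the test string
    Once = unicode:characters_to_binary(CodePoints),
    %% What a UTF-8 literal looks like if the file is read as Latin-1:
    %% one list element per byte.
    Bytes = binary_to_list(Once),
    %% Each byte above 127 is treated as a character and becomes two bytes.
    Twice = unicode:characters_to_binary(Bytes),
    {byte_size(Once), byte_size(Twice)}.

%% Turn such a byte list back into code points by decoding it as UTF-8.
recover(Bytes) ->
    unicode:characters_to_list(list_to_binary(Bytes), utf8).
```

Here demo() returns {8,16}: the doubly encoded text is twice as long. Under the same assumption, passing the string literal through recover/1 before printing it with ~ts would yield the same code points as the list-of-integers version.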