[erlang-bugs] Unicode bug in io:format

Tue Nov 22 12:16:28 CET 2011

Trying to track down the difference between running with and without 
-noinput, I found this:

    $ erl -eval 'io:format("~p / ~p\n", [process_info(whereis(user), A)
    || A <- [initial_call, current_function]]), init:stop().'
    Erlang R14B03 (erts-5.8.4) [source] [64-bit] [smp:8:8] [rq:8]
    [async-threads:0] [kernel-poll:false]

    Eshell V5.8.4  (abort with ^G)
    1> {initial_call,{group,server,3}} / {current_function,{group,server_loop,3}}

    $ erl -eval 'io:format("~p / ~p\n", [process_info(whereis(user), A)
    || A <- [initial_call, current_function]]), init:stop().'  -noinput
    {initial_call,{erlang,apply,2}} / {current_function,{user,server_loop,2}}

I.e. without -noinput, 'group' handles I/O, while with -noinput, 'user' does.
In fact, in both cases, only one of the modules is loaded at all:

    $ erl -eval 'io:format("~p\n", [{code:is_loaded(user),
    code:is_loaded(group)}]), init:stop().'
    Erlang R14B03 (erts-5.8.4) [source] [64-bit] [smp:8:8] [rq:8]
    [async-threads:0] [kernel-poll:false]
    Eshell V5.8.4  (abort with ^G)
    1> {false,{file,"/usr/local/lib/erlang/lib/kernel-2.14.4/ebin/group.beam"}}

    $ erl -eval 'io:format("~p\n", [{code:is_loaded(user),
    code:is_loaded(group)}]), init:stop().'  -noinput
    {{file,"/usr/local/lib/erlang/lib/kernel-2.14.4/ebin/user.beam"},false}

The differences in behaviour are caused by differences between these two 
modules. The character encoding translation is done in group:io_request/3 
and user:wrap_characters_to_binary/3, respectively (it is the latter that 
produces the "\x" escapes).
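As an aside, the garbled "Ð¢ÐµÑ..." lines are what you get when UTF-8 
bytes are displayed as if each byte were a latin1 character. A minimal 
sketch, assuming only the standard unicode module (the module name 
mojibake_demo and its functions are hypothetical, for illustration only): 
"Т" (U+0422, 1058) encodes as the bytes 208 and 162, and in latin1 those 
are "Ð" and "¢".

```erlang
%% Hypothetical illustration module, not part of OTP or the original report.
-module(mojibake_demo).
-export([utf8_bytes/1, decode_utf8/1, main/0]).

%% Encode a list of Unicode code points as UTF-8 and return the bytes.
utf8_bytes(Chars) ->
    binary_to_list(unicode:characters_to_binary(Chars, unicode, utf8)).

%% Decode UTF-8 bytes back into code points (the correct interpretation).
decode_utf8(Bytes) ->
    unicode:characters_to_list(list_to_binary(Bytes), utf8).

main() ->
    Bytes = utf8_bytes([1058, 1077]),   % Cyrillic "Te" (U+0422, U+0435)
    %% Print with ~w so the result does not depend on the I/O encoding.
    %% Latin-1 code points equal byte values, so reading these bytes as
    %% latin1 gives ETH, CENT SIGN, ETH, MICRO SIGN: the start of the
    %% garbled line in the output quoted below.
    io:format("~w~n", [Bytes]),
    io:format("~w~n", [decode_utf8(Bytes)]),
    ok.
```

Calling mojibake_demo:main() prints [208,162,208,181] followed by 
[1058,1077].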

/Erik

On 22-11-2011 11:02, Erik Søe Sørensen wrote:
> I thought it might have something to do with io:setopts() being called 
> when -noinput is absent, and not when it is present; the evidence is 
> mixed, but I think I may be on to something useful.
>
> Consider the following extension of your program:
>
>     -module(unicode_test).
>     -export([main/0]).
>
>     main() ->
>         print(),
>         ok = io:setopts(standard_io, [{encoding, unicode}]),
>         print().
>
>     print() ->
>         io:format("Encoding=~p~n",
>                   [lists:keyfind(encoding, 1, io:getopts())]),
>         io:format("~ts~n",
>                   [[1058,1077,1089,1090,1086,1074,1072,1103,32,
>                     1089,1090,1088,1086,1082,1072]]),
>         io:format("~ts~n", ["Тестовая строка"]).
>
>
> Without -noinput (and with LANG=da_DK.utf8), I get:
>
>     1> Encoding={encoding,latin1}
>     Тестовая строка
>     Ð¢ÐµÑÑ‚Ð¾Ð²Ð°Ñ ÑÑ‚Ñ€Ð¾ÐºÐ°
>     Encoding={encoding,latin1}
>     Тестовая строка
>     Ð¢ÐµÑÑ‚Ð¾Ð²Ð°Ñ ÑÑ‚Ñ€Ð¾ÐºÐ°
>
> i.e. the list-of-integers version is OK in both cases.
>
> With -noinput, I get:
>
>     Encoding={encoding,latin1}
>     \x{422}\x{435}\x{441}\x{442}\x{43E}\x{432}\x{430}\x{44F} \x{441}\x{442}\x{440}\x{43E}\x{43A}\x{430}
>     Тестовая строка
>     Encoding={encoding,unicode}
>     Тестовая строка
>     Ð¢ÐµÑÑ‚Ð¾Ð²Ð°Ñ ÑÑ‚Ñ€Ð¾ÐºÐ°
>
> I.e. first the string-literal version is good, but after using 
> io:setopts(), the list-of-integers version is the good one.
>
> So, if you explicitly select unicode encoding in your program, you 
> have consistent behaviour.
>
> The only thing that bothers me is that there appears to be something 
> else going on - it's not just about the encoding.
> I find that without -noinput, output is consistent no matter what I 
> set encoding to. With -noinput, on the other hand, output differs 
> depending on whether I select latin1 or unicode encoding.
>
> Hoping this helps.
> /Erik
