[erlang-bugs] Unicode bug in io:format
Erik Søe Sørensen
ess@REDACTED
Tue Nov 22 11:02:10 CET 2011
I thought it might have something to do with io:setopts() being called
when -noinput is absent, and not when it is present; the evidence is
mixed, but I think I may be on to something useful.
Consider the following extension of your program:
    -module(unicode_test).
    -export([main/0]).

    main() ->
        print(),
        ok = io:setopts(standard_io, [{encoding, unicode}]),
        print().

    print() ->
        io:format("Encoding=~p~n",
                  [lists:keyfind(encoding,1,io:getopts())]),
        io:format("~ts~n",
                  [[1058,1077,1089,1090,1086,1074,1072,1103,32,1089,1090,1088,1086,1082,1072]]),
        io:format("~ts~n", ["Тестовая строка"]).
Without -noinput (and with LANG=da_DK.utf8), I get:
1> Encoding={encoding,latin1}
Тестовая строка
ТеÑÑÐ¾Ð²Ð°Ñ ÑÑÑока
Encoding={encoding,latin1}
Тестовая строка
ТеÑÑÐ¾Ð²Ð°Ñ ÑÑÑока
I.e. the list-of-integers version is OK in both cases, before and after
the io:setopts() call, while the string-literal version is garbled in both.
With -noinput, I get:
Encoding={encoding,latin1}
\x{422}\x{435}\x{441}\x{442}\x{43E}\x{432}\x{430}\x{44F} \x{441}\x{442}\x{440}\x{43E}\x{43A}\x{430}
Тестовая строка
Encoding={encoding,unicode}
Тестовая строка
ТеÑÑÐ¾Ð²Ð°Ñ ÑÑÑока
I.e. first the string-literal version is good, but after using
io:setopts(), the list-of-integers version is the good one.
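So, if you explicitly select unicode encoding in your program, you get
consistent behaviour with and without -noinput: the list-of-integers
version prints correctly and the string-literal version does not.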
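A minimal sketch of that approach, for illustration only: set the encoding once at startup and hand io:format/2 nothing but lists of code points. The module name unicode_out and the helpers put_line/1 and from_utf8/1 are hypothetical, not part of the program above; from_utf8/1 assumes the input binary really is UTF-8.

```erlang
-module(unicode_out).
-export([main/0, put_line/1, from_utf8/1]).

%% Select unicode encoding on standard_io explicitly, then print a
%% list of code points (the first four characters of the test string).
main() ->
    ok = io:setopts(standard_io, [{encoding, unicode}]),
    put_line([1058,1077,1089,1090]).

%% Chars is expected to be a list of Unicode code points, not UTF-8 bytes.
put_line(Chars) ->
    io:format("~ts~n", [Chars]).

%% Turn a UTF-8 encoded binary into a list of code points, so that data
%% arriving as raw bytes can be printed with put_line/1.
from_utf8(Bin) when is_binary(Bin) ->
    unicode:characters_to_list(Bin, utf8).
```

For example, unicode_out:from_utf8(<<208,162,208,181>>) gives [1058,1077], which put_line/1 can then print.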
The only thing that bothers me is that there appears to be something
else going on - it's not just about the encoding.
I find that without -noinput, output is the same no matter what I set
encoding to. With -noinput, on the other hand, output differs depending
on whether I select latin1 or unicode encoding.
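One possible reading of the garbled lines, offered as an assumption rather than something the runs above prove: if the compiler reads the source file as latin1 (in line with Paul Davis's remark quoted below), the string literal becomes a list of UTF-8 bytes, and printing that list on a unicode device encodes each byte a second time. The hypothetical module mojibake_demo below sketches that double encoding using only unicode:characters_to_binary/3.

```erlang
-module(mojibake_demo).
-export([utf8_bytes/1, double_encode/1]).

%% Encode a list of code points as UTF-8 and return the bytes as a list.
utf8_bytes(CodePoints) ->
    binary_to_list(unicode:characters_to_binary(CodePoints, unicode, utf8)).

%% Treat every UTF-8 byte as if it were a Latin-1 character (a code point
%% in 0..255) and encode the result as UTF-8 once more.
double_encode(CodePoints) ->
    utf8_bytes(utf8_bytes(CodePoints)).
```

For the first letter, mojibake_demo:utf8_bytes([1058]) gives [208,162], and mojibake_demo:double_encode([1058]) gives [195,144,194,162], i.e. two characters on a UTF-8 terminal where one was intended.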
Hoping this helps.
/Erik
On 21-11-2011 22:42, eurekafag wrote:
> Thanks, I'm aware of it. The problem is different behavior with and
> without -noinput. I'm just curious which case is right and why it
> makes difference at all. I explicitly define that binary string as
> utf8-encoded but it only works with -noinput and fails without it. On
> the other hand, a list without any unicode letters at all (only
> integers) is printed as hex values with -noinput and as text without it.
> It may be understandable if this is some kind of parser problem which
> wants latin-1 letters in source but what's wrong with plain list of
> integers which it fails to output as a string? The problem is that
> those two cases are mutually exclusive so one of them works with
> -noinput and fails without and vice versa. So I'm curious which method
> I should use so it works like expected.
>
> On 22 November 2011 at 0:19, Paul Davis
> <paul.joseph.davis@REDACTED> wrote:
>
> Oh, good call. I just pasted your code into the shell and it worked.
> But then when compiling it into a file it breaks like you have.
> Specifically, the UTF-8 literal in the source file is broken. This
> suggests that the Erlang compiler doesn't like UTF-8 literals, and
> sure enough, a quick google brought up a post:
>
> http://erlang.2086793.n4.nabble.com/utf8-in-source-files-td3031128.html
>
> Which references:
>
> http://www.erlang.org/doc/apps/stdlib/unicode_usage.html
>
> HTH,
> Paul Davis
>
> On Mon, Nov 21, 2011 at 2:06 PM, eurekafag <eurekafag@REDACTED> wrote:
> > What exactly do you get? Please, provide the full output of both
> > cases with and without -noinput. I tried export LANG=en_US.UTF-8
> > (my system-wide locale is ru_RU.UTF-8) and I still get the same
> > result.