[erlang-bugs] Unicode bug in io:format
Erik Søe Sørensen
Tue Nov 22 14:02:55 CET 2011
On 22-11-2011 13:11, eurekafag wrote:
> Many thanks for this thorough research! However I have two things to
> mention. Setting or getting encoding introduces noticeable delay in
> launching without -noinput, but with it it starts just as fast as
> usual. Pretty strange.
Yes, I noticed that too; the delay is so long that there is probably a
> And another slightly illogical issue: to print UTF-8 strings one should
> NOT add the /utf8 type modifier to the binary. This works fine with encoding
> set: io:format("~ts~n", [<<"Тестовая строка">>]).
> This fails in both noinput-cases with encoding set: io:format("~ts~n",
> [<<"Тестовая строка"/utf8>>]).
Remember that *source files are still always interpreted as latin-1*.
From http://www.erlang.org/doc/apps/stdlib/unicode_usage.html :
It is convenient to be able to write a list of Unicode characters in
the string syntax. However, the language specifies strings as being
in the ISO-latin-1 character set which the compiler tool chain as
well as many other tools expect.
Also the source code is (for now) still expected to be written using
the ISO-latin-1 character set, why Unicode characters beyond that
range cannot be entered in string literals.
Which means that the "/utf8" modifier will always do a latin1->utf8 conversion.
So, yes, if you ensure that your source files are UTF-8 encoded, you can
use the string literals as they are, and expect them to be UTF-8.
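To make the double encoding concrete, here is a minimal sketch that simulates it by hand. It assumes the behaviour described above (the compiler reading one latin-1 character per source byte); the module name double_encoding_demo and the function demo/0 are made up for illustration, and the code points are written numerically so the example does not depend on how this file itself is encoded.

```erlang
-module(double_encoding_demo).
-export([demo/0]).

%% Returns {Utf8, Plain, Double}:
%%   Utf8   - the correct UTF-8 bytes of the test word
%%   Plain  - what a literal without /utf8 would contain
%%   Double - what a literal with /utf8 would contain (encoded twice)
demo() ->
    %% Code points of the Cyrillic word "Test" (U+0422 U+0435 U+0441 U+0442).
    CodePoints = [16#0422, 16#0435, 16#0441, 16#0442],

    %% The bytes a UTF-8 encoded source file holds between the quotes.
    Utf8 = unicode:characters_to_binary(CodePoints, unicode, utf8),

    %% Read as latin-1, every byte is taken to be one character.
    AsLatin1Chars = binary_to_list(Utf8),

    %% No modifier: each "character" is stored as one byte, so the
    %% original UTF-8 bytes pass through unchanged.
    Plain = list_to_binary(AsLatin1Chars),

    %% /utf8 modifier: each "character" is encoded as UTF-8 again.
    Double = unicode:characters_to_binary(AsLatin1Chars, latin1, utf8),

    {Utf8, Plain, Double}.
```

In this sketch Plain is identical to Utf8, which is why the unmodified literal prints correctly with "~ts", while Double is twice as long because every byte above 127 has been expanded into two bytes.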
> I guess it's because of double encoding (by explicitly defined
> encoding and that suffix) but I was confused at first. It's better not
> to set encoding but declare it in binary strings like they do in
> Python prepending strings with 'u' literal, which doesn't work in
> Erlang for all cases.
Well, for the u"..." syntax, Python also needs to know the encoding of
the source file. Unlike Erlang, however, Python can be told what the
encoding is (and can recognize Unicode files which begin with a BOM).