[erlang-questions] source file encoding
Richard Carlsson
carlsson.richard@REDACTED
Tue Dec 20 13:00:01 CET 2011
On 12/19/2011 11:18 AM, Justus wrote:
> Hi all,
>
> Strings must be in the ISO Latin-1 character set. I remember that
> errors used to be reported when compiling a .erl file that contained
> other characters.
>
> But when trying R15B, it looks like values beyond ISO Latin-1 are also
> accepted. So now we can use UTF-8 encoding (without BOM), and with the
> help of ct_expand I managed to say "hello world" in Chinese
> literally.
>
> Is there any plan to add Unicode support to string and
> character literals?
>
> -compile({parse_transform, ct_expand}).
>
> -define(STR(S), ct_expand:term(unicode:characters_to_list(list_to_binary(S)))).
>
> hello_world() ->
>     S = ?STR("你好, 世界"),
>     io:format("~ts~n", [S]).
>
The code that you wrote is actually the following:
S = ?STR("ä½ å¥½, ä¸ç"),
Even if your editor shows you Chinese characters and saves the file as
UTF-8, Erlang still treats the input as Latin-1. (Every byte sequence is
valid Latin-1, so there is no foolproof way of telling UTF-8 files apart
from Latin-1 files automatically.)
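A minimal sketch of that Latin-1 reading, using a hypothetical module latin1_view (not part of the original thread) that relies only on the standard unicode module. as_seen_by_compiler/1 turns Unicode code points into the UTF-8 bytes that a Latin-1 reader takes as one character each, and recover/1 reverses it. recover/1 is the same conversion that Justus's ?STR macro performs, only at run time instead of at compile time via ct_expand. Code points are used instead of literal Chinese text so the sketch does not depend on the encoding of its own source file.

```erlang
%% Hypothetical illustration module, not from the original thread.
-module(latin1_view).
-export([as_seen_by_compiler/1, recover/1]).

%% Encode code points as UTF-8, then treat every byte as one Latin-1
%% character, which is what a Latin-1-only compiler does with a UTF-8 file.
as_seen_by_compiler(CodePoints) ->
    binary_to_list(unicode:characters_to_binary(CodePoints)).

%% Pack the Latin-1 "characters" (all below 256) back into bytes and
%% decode those bytes as UTF-8, recovering the original code points.
recover(Latin1Chars) ->
    unicode:characters_to_list(list_to_binary(Latin1Chars)).
```

For "你好" (code points 20320 and 22909), as_seen_by_compiler/1 gives the six values 228,189,160,229,165,189, which display in Latin-1 as ä, ½, a non-breaking space, å, ¥ and ½, and recover/1 maps them back to [20320,22909].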

To understand where things can go wrong if you start saving source files
as UTF-8, consider the following two modules:

-module(m1).
...
    Pid ! "Mickaël",
...

-module(m2).
...
    receive
        "Mickaël" -> ok
    end
...

Assume that the first is saved as Latin-1 and the second as UTF-8.
Even though they may look the same to your eyes (because your editor
hides the difference), the code in the second file is really waiting for
the following string, so it will never match the message sent by m1:

    receive
        "Micka\303\253l" -> ok
    end

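A small sketch of the mismatch, using a hypothetical module mickael_demo (not from the original thread). It rebuilds the string m2 waits for from the string m1 sends, assuming only the standard unicode module, and writes ë with octal escapes so the result does not depend on how this file itself is saved.

```erlang
%% Hypothetical illustration module, not from the original thread.
-module(mickael_demo).
-export([sent_by_m1/0, expected_by_m2/0, matches/0]).

%% "Mickaël" as compiled from a Latin-1 file: ë is a single
%% character with code 235 (octal \353).
sent_by_m1() -> "Micka\353l".

%% "Mickaël" as compiled from a UTF-8 file that is read as Latin-1:
%% the UTF-8 bytes of the same text, one character per byte, so ë
%% becomes the two characters 195 and 171 (octal \303\253).
expected_by_m2() ->
    binary_to_list(unicode:characters_to_binary(sent_by_m1())).

%% m2's receive clause only matches if the two strings are identical.
matches() -> sent_by_m1() =:= expected_by_m2().
```

Here expected_by_m2() equals "Micka\303\253l", one character longer than what m1 sends, and matches() returns false.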
/Richard