[erlang-questions] unicode in string literals

Tue Jul 31 09:32:51 CEST 2012

On Tue, Jul 31, 2012 at 9:09 AM, Michael Truog <mjtruog@REDACTED> wrote:
>-----8<----------
>>> Given the way things currently are, the solution is just to use modelines (within the first 3 lines of the file), which are supported by your favorite editor, vi or Emacs:
>>> % -*- coding: utf-8; Mode: erlang; tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*-
>>> % ex: set softtabstop=4 tabstop=4 shiftwidth=4 expandtab fileencoding=utf-8:
>>>
>> Shouldn't that modeline read:
>> % -*- coding: latin-1; mode: erlang; tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*-
>>
>> since the compiler assumes source files are in Latin-1?
>
> I think the point was to use UTF-8 in the source file, hence utf-8 in the modeline. The encoding() would be necessary for Erlang names (functions, variables, etc.) to be in UTF-8, but the modeline could help keep list data as UTF-8.

IMO this doesn't solve the problem; it only confuses the issue.
Consider the following:

test() ->
    io:format("~w~n", ["Just my €0.02"]),
    io:format("~w~n", [lists:reverse("Just my €0.02")]).

> test().
[74,117,115,116,32,109,121,32,226,130,172,48,46,48,50]
[50,48,46,48,172,130,226,32,121,109,32,116,115,117,74]

If the list data were really kept as UTF-8, then the output of the
second statement should be:
[50,48,46,48,226,130,172,32,121,109,32,116,115,117,74]

The above, of course, depends on whether you view strings as lists of
bytes or as lists of characters.
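To make the bytes-vs-characters distinction concrete, here is a minimal sketch that is not from the original post. The module utf8_reverse and its functions are hypothetical names; it assumes the list holds UTF-8 bytes, which is what you get when a UTF-8 file is read as Latin-1. It decodes with the stdlib functions unicode:characters_to_list/2 and unicode:characters_to_binary/1. The euro sign is written as its three bytes so the module source stays pure ASCII and compiles the same under either source encoding.

```erlang
%% utf8_reverse: hypothetical helper module for illustration, not part of OTP.
-module(utf8_reverse).
-export([reverse_bytes/1, reverse_chars/1, char_count/1, demo/0]).

%% Reverse element by element, treating each integer as a byte.
%% This is what lists:reverse/1 does in test() above.
reverse_bytes(Bytes) ->
    lists:reverse(Bytes).

%% Interpret the list as UTF-8 bytes, reverse the characters (code
%% points), then re-encode the result as a list of UTF-8 bytes.
reverse_chars(Bytes) ->
    Chars = decode_utf8(Bytes),
    binary_to_list(unicode:characters_to_binary(lists:reverse(Chars))).

%% Number of characters (code points), as opposed to length/1,
%% which counts list elements, i.e. bytes here.
char_count(Bytes) ->
    length(decode_utf8(Bytes)).

%% The unicode module treats integers inside a list as code points,
%% so pack the bytes into a binary before decoding it as UTF-8.
decode_utf8(Bytes) ->
    case unicode:characters_to_list(list_to_binary(Bytes), utf8) of
        Chars when is_list(Chars) -> Chars;
        Error -> erlang:error({invalid_utf8, Error})
    end.

demo() ->
    %% "Just my " ++ euro sign (UTF-8 bytes 226,130,172) ++ "0.02"
    S = "Just my " ++ [226,130,172] ++ "0.02",
    io:format("~w~n", [reverse_bytes(S)]),
    io:format("~w~n", [reverse_chars(S)]),
    io:format("~w bytes, ~w characters~n", [length(S), char_count(S)]).
```

Calling utf8_reverse:demo() prints the byte-wise reversal shown in the test() output, then the character-wise reversal that keeps 226,130,172 together, then "15 bytes, 13 characters". Decoding at the boundary and working on code points is one way to get the "lists of characters" view; it does not change how the compiler reads the source file.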

-- 
My other car is a cdr.