[erlang-questions] correct terminology for referring to strings

Eric Moritz eric@REDACTED
Thu Aug 2 05:32:51 CEST 2012


So currently, if you encode your source code as UTF-8, string literals
become lists of the raw UTF-8 bytes. This is different from the shell,
where string literals are automatically turned into their Unicode
codepoints.

It appears that the solution is a compiler flag that tells the compiler
that string literals should be decoded as UTF-8. So when the compiler reads
the byte sequence 16#C3 16#A9, it knows that it should become a single 233
in the list, because 16#C3 16#A9 is the UTF-8 encoding of the codepoint 233.
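
To make the difference concrete, here is a minimal sketch of what such a
flag would have to do for each literal. The module and function names are
made up for illustration; only unicode:characters_to_list/2 and
list_to_binary/1 are real standard library calls.

```erlang
-module(utf8_literal_demo).  %% hypothetical module name
-export([as_compiled/0, as_decoded/0]).

%% What a compiler that assumes Latin-1 produces for the literal "é"
%% when the source file is saved as UTF-8: one list element per byte.
as_compiled() ->
    [16#C3, 16#A9].

%% What the proposed flag would produce instead: the same bytes
%% interpreted as UTF-8 and turned into Unicode codepoints.
as_decoded() ->
    unicode:characters_to_list(list_to_binary(as_compiled()), utf8).
```

Here as_compiled() is a two-element list, [195,169], while as_decoded()
is the one-element list [233], which is what the shell already gives you.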

I don't know how well chardata() and charlist() are supported across the
standard lib, so doing this may cause many headaches when someone tries
to stuff a charlist() where an iolist() goes, or chardata() where a string()
goes. This may introduce subtle bugs that only occur when non-Latin-1
characters are used.
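
A small sketch of the kind of bug I mean, assuming a charlist() ends up in
code that expects an iolist(). The module and function names are
hypothetical; iolist_to_binary/1 and unicode:characters_to_binary/1 are
the real calls being compared.

```erlang
-module(charlist_vs_iolist_demo).  %% hypothetical module name
-export([as_iolist/1, as_chardata/1]).

%% Treat the list as an iolist(): every integer must fit in one byte.
%% Returns the atom badarg instead of crashing so the cases are easy
%% to compare side by side.
as_iolist(Chars) ->
    try iolist_to_binary(Chars)
    catch error:badarg -> badarg
    end.

%% Treat the list as chardata(): integers are Unicode codepoints and
%% are encoded as UTF-8.
as_chardata(Chars) ->
    unicode:characters_to_binary(Chars).
```

With [233] (é, still inside Latin-1), as_iolist/1 quietly returns the
single byte <<233>> while as_chardata/1 returns <<195,169>>, so nothing
crashes but the two disagree. With [1081] (й, outside Latin-1),
as_iolist/1 gives badarg and as_chardata/1 gives <<208,185>>.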

Eric.
On Aug 1, 2012 4:39 AM, "Richard Carlsson" <carlsson.richard@REDACTED>
wrote:

> On 08/01/2012 12:52 AM, CGS wrote:
>
>> Actually, try this:
>>
>> 1. set your environment to UTF-8 (in my case, whatever Linux terminal
>> with BASH environment, export LANG="en_US.utf8", use locale to find your
>> environment language definition - "en_US.latin1" for LATIN-1)
>> 2. in a module:
>>
>> test_reverse(String) -> lists:reverse(String).
>>
>> 3. Give as parameter the example given by yourself.
>> 4. Check the output.
>>
>
> Ah, but when you say "give as parameter" you mean "pass it a string
> literal from the shell", right? I never said anything about strings in the
> shell - that's a different environment from source files, and as you
> described, the shell nowadays detects your locale and translates UTF-8
> console input into a string literal containing Unicode code points. This is
> exactly how it would happen in source code as well, if the compiler only
> knew how to detect that a source file is in a different encoding from
> Latin1. So the compiler is really the main thing that needs to be fixed,
> and then there should be no surprises on the encoding level anymore.
>
>     /Richard
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>