[erlang-questions] unicode in string literals
Mon Jul 30 16:42:08 CEST 2012
On Mon, Jul 30, 2012 at 3:02 PM, Richard Carlsson wrote:
> On 07/30/2012 02:35 PM, Joe Armstrong wrote:
>> What is a literal string in Erlang? Originally it was a list of
>> integers, each integer
>> being a single character code - this made strings very easy to work with.
>> The code
>> test() -> "a∞b".
>> Compiles to code which returns the list
>> of integers [97,226,136,158,98].
>> This is very inconvenient. I had expected it to return
>> [97, 8734, 98]. The length of the list should be 3 not 5
>> since it contains three unicode characters not five.
>> Is this a bug or a horrible misfeature?
> You saved your source file as UTF-8, so between the two double-quotes, the
> source file contains exactly those bytes. But the Erlang compiler assumes
> your source code is Latin-1, so it thinks that you wrote a Latin-1 string of
> 5 characters (some of which are non-printing). There's as yet no support for
> telling the compiler that the input is anything else than Latin-1, so you
> can't save your source files as UTF-8. (One thing you can do is put the
> UTF-8 strings in another file and read them at runtime.)
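
A minimal sketch of the relationship described above, assuming a hypothetical module name `utf8_demo`. The five integers the compiler produced are the UTF-8 bytes of the three intended characters, so decoding them as UTF-8 at runtime recovers the expected list. The source below is pure ASCII, so it does not depend on how the compiler reads the file.

```erlang
-module(utf8_demo).
-export([bytes/0, decode/0]).

%% The five values the compiler returns when "a∞b" is saved as UTF-8
%% but read as Latin-1 (hypothetical helper, for illustration only).
bytes() -> [97, 226, 136, 158, 98].

%% Reinterpret those values as UTF-8 bytes to recover the code points.
decode() ->
    unicode:characters_to_list(list_to_binary(bytes()), utf8).
    %% returns [97,8734,98]
```

The same call works on a binary read from a separate UTF-8 file at runtime, for example one returned by `file:read_file/1`.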
Oh dear - you're right of course.
This means that the only portable and 100% correct way to get 'a'
'INFINITY' 'b' into a string literal
"a∞b" in any form won't work if the compiler is not explicitly
told "this file is utf8".
Should the pre-processor make a rude noise and only accept Latin-1?
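
One way to sidestep the source-encoding question, sketched here as an assumption rather than as what was proposed in the thread, is to keep the source pure ASCII and name the code point numerically. The module name `inf_literal` is hypothetical.

```erlang
-module(inf_literal).
-export([as_list/0, as_escape/0]).

%% Spell out the list of code points directly; 8734 is U+221E INFINITY.
as_list() -> [$a, 8734, $b].

%% Use a hexadecimal escape inside an ordinary string literal.
as_escape() -> "a\x{221E}b".
```

Both functions return a three-element list regardless of the encoding the file is saved in, because no byte in the source is above 127.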
>> test() -> <<"a∞b"/utf8>> seems to be a bug
> Try <<"åäö"/utf8>>. It works, but like your first example, the source string
> is limited to Latin-1. Strings entered in the shell may be interpreted
> differently though, depending on your locale settings.
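
A short sketch of what the `/utf8` binary type does, assuming a hypothetical module name `utf8_bin` and using ASCII-only escapes so the result does not depend on the source encoding. Each character of the string is treated as a code point and encoded as UTF-8. If the literal has already been misread as five Latin-1 characters, each of those gets encoded again, which is why the result looks like a bug.

```erlang
-module(utf8_bin).
-export([latin1_chars/0, infinity/0, double_encoded/0]).

%% Code points 229, 228, 246 ("åäö"), each encoded as two UTF-8 bytes.
latin1_chars() -> <<"\xe5\xe4\xf6"/utf8>>.
%% returns <<195,165,195,164,195,182>>

%% The intended three code points, encoded once.
infinity() -> <<"a\x{221E}b"/utf8>>.
%% returns <<97,226,136,158,98>>

%% What happens when the five misread values are encoded again.
double_encoded() ->
    << <<C/utf8>> || C <- [97, 226, 136, 158, 98] >>.
%% returns <<97,195,162,194,136,194,158,98>>
```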