[erlang-questions] unicode in string literals

Joe Armstrong
Mon Jul 30 16:25:54 CEST 2012


On Mon, Jul 30, 2012 at 3:06 PM, CGS wrote:
> Hi Joe,
>
> You may try unicode module:
>
> test() -> unicode:characters_to_list("a∞b",utf8).
>
> which will return the desired list [97,8734,98]. As Richard said, the
> default is Latin-1 (0-255 integers).

Very strange. I tried that earlier, and this is what happens:

$ Eshell V5.9  (abort with ^G)
1> unicode:characters_to_list([97,226,136,158,98], utf8).
[97,226,136,158,98]

The manual says the first argument is a utf8 string.
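
A minimal sketch of what seems to be going on, assuming the documented
behaviour of the unicode module: the encoding argument describes only the
binaries in the input, while integers in a list are already taken to be
code points, so a list of bytes comes back unchanged. Packing the bytes
into a binary first gives the expected result. The module name utf8_demo
and both function names are made up for illustration. The second function
rests on the further assumption that the source file is read as Latin-1,
which would also account for the odd binary quoted below.

```erlang
-module(utf8_demo).
-export([decode_bytes/1, double_encode/1]).

%% Hypothetical helper: decode a list of UTF-8 bytes into code points.
%% The bytes are packed into a binary first, because the utf8 argument
%% of unicode:characters_to_list/2 applies only to binaries.
decode_bytes(Bytes) when is_list(Bytes) ->
    unicode:characters_to_list(list_to_binary(Bytes), utf8).

%% Hypothetical helper: treat each byte as a Latin-1 character and
%% re-encode the result as UTF-8 (assumed to mirror what the compiler
%% does with a /utf8 literal in a Latin-1 source file).
double_encode(Bytes) when is_list(Bytes) ->
    unicode:characters_to_binary(list_to_binary(Bytes), latin1, utf8).
```

With the bytes from the shell session above,
utf8_demo:decode_bytes([97,226,136,158,98]) should give [97,8734,98], and
utf8_demo:double_encode([97,226,136,158,98]) should give
<<97,195,162,194,136,194,158,98>>.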

/Joe
>
> As for binaries, the same problem (assuming Latin-1).
>
> CGS
>
>
>
>
> On Mon, Jul 30, 2012 at 2:35 PM, Joe Armstrong wrote:
>>
>> What is a literal string in Erlang? Originally it was a list of
>> integers, each integer being a single character code. This made
>> strings very easy to work with.
>>
>> The code
>>
>>     test() -> "a∞b".
>>
>> Compiles to code which returns the list
>> of integers [97,226,136,158,98].
>>
>> This is very inconvenient. I had expected it to return
>> [97, 8734, 98]. The length of the list should be 3 not 5
>> since it contains three unicode characters not five.
>>
>> Is this a bug or a horrible misfeature?
>>
>> So how can I make a string with the three characters 'a' 'infinity' 'b'?
>>
>> test() -> "a\x{221e}b"        is ugly
>>
>> test() -> <<"a∞b"/utf8>>   seems to be a bug:
>>     it gives an error in the shell but is ok in compiled code and
>>     returns <<97,195,162,194,136,194,158,98>>, which is very strange
>>
>> test() -> [$a,8734,$b]       is ugly
>>
>> /Joe
>> _______________________________________________
>> erlang-questions mailing list
>> 
>> http://erlang.org/mailman/listinfo/erlang-questions
>
>