[erlang-questions] unicode:characters_to_list

Thu Mar 22 15:57:43 CET 2012

On 2012-03-22, at 15:32 , Henning Diedrich wrote:

> Hi Masklinn,
> 
> no question I am getting something wrong here.
> 
> On 3/22/12 2:53 PM, Masklinn wrote:
>> I don't understand why you're looking for UTF8 as the output of
>> characters_to_list.
> 
> That's what the doc leads me to believe I guess.
> 
> "a list of integers representing unicode characters."

"Unicode characters" is an alias for unicode code points: http://en.wikipedia.org/wiki/Code_point

> 248 is not a valid unicode Bytecode.

248 is in fact a valid code point: it's U+00F8, which you can find in this
table:
http://en.wikibooks.org/wiki/Unicode/Character_reference/0000-0FFF (row
00F0, column 8). What it is not is a valid *UTF-8* byte.

> Not sure if 50104 is the right one for the same character, but it would have to be two bytes and I would thus expect an integer higher 256.

You're talking about UTF-8 encoded code points. "Unicode characters" are not a UTF; they are the code points themselves, before any encoding is applied.
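
To make the difference concrete, here is a small sketch (the module name codepoint_demo is made up for illustration, and it assumes the input binary is UTF-8, which is what characters_to_list/1 expects by default). It also suggests where 50104 comes from: it is the two UTF-8 bytes of U+00F8 read as a single 16-bit integer.

```erlang
-module(codepoint_demo).
-export([demo/0]).

%% Hypothetical helper module, for illustration only.
demo() ->
    %% <<195,184>> is the UTF-8 encoding of U+00F8.
    Utf8 = <<195,184>>,
    %% Decoding yields a list of code points, not a list of bytes.
    [248] = unicode:characters_to_list(Utf8),
    %% Reading the same two bytes as one big-endian 16-bit integer
    %% gives 195 * 256 + 184 = 50104.
    <<N:16>> = Utf8,
    50104 = N,
    ok.
```

Each pattern match doubles as an assertion, so codepoint_demo:demo() returns ok only if every claim above holds.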

> If 248 is meant as actual number of the letter, which it is no matter the bitcode, then what is the right function to make a Unicode binary again from the list entries, unicode:character_to_binary/1 ?

1. There is no such thing as "a unicode binary".
2. unicode:characters_to_binary can be used to encode unicode strings to
   UTF-8 (the default) or another unicode transformation format.
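
As a rough sketch of point 2 (the module name encode_demo is invented for this example; the calls are the standard unicode:characters_to_binary/1,3 and unicode:characters_to_list/2), encoding the code point list [248] and decoding it again could look like this:

```erlang
-module(encode_demo).
-export([demo/0]).

%% Hypothetical helper module, for illustration only.
demo() ->
    %% A "unicode string" as a list of code points: just U+00F8.
    Chars = [248],
    %% characters_to_binary/1 encodes to UTF-8 by default.
    <<195,184>> = unicode:characters_to_binary(Chars),
    %% The three-argument form names the input and output encodings.
    <<195,184>> = unicode:characters_to_binary(Chars, unicode, utf8),
    <<0,248>> = unicode:characters_to_binary(Chars, unicode, {utf16, big}),
    %% Decoding the UTF-8 binary gives back the original code points.
    [248] = unicode:characters_to_list(<<195,184>>, utf8),
    ok.
```

The same code point comes out as different byte sequences depending on the output encoding, which is why the binary has to be described by its encoding rather than as "unicode".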