[erlang-questions] unicode:characters_to_list

Thu Mar 22 15:46:48 CET 2012

unicode:characters_to_list returns a list of Unicode code points, not
UTF-8 bytes. You can get the latter, as a binary, with
unicode:characters_to_binary/1,2,3.

io:format("~w", [unicode:characters_to_list("ø", utf8)]).   =>  [248]

vs

io:format("~w", [unicode:characters_to_binary("ø")]).   =>  <<195,184>>
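
To go back and forth between the two forms, something like the following
should work. It is only a minimal sketch: the module name unicode_demo and
the function demo/0 are made up for illustration, and the character is
written as the integer 248 so the result does not depend on how the source
file or shell is encoded.

```erlang
-module(unicode_demo).
-export([demo/0]).

%% Round-trips the single character U+00F8 ("ø") between a list of
%% code points and a UTF-8 encoded binary. Each pattern match acts as
%% an assertion and crashes with badmatch if the value differs.
demo() ->
    CodePoints = [248],
    %% Code points -> UTF-8 binary (two bytes for U+00F8).
    Utf8 = unicode:characters_to_binary(CodePoints),
    <<195,184>> = Utf8,
    %% UTF-8 binary -> list of code points again.
    CodePoints = unicode:characters_to_list(Utf8, utf8),
    %% The raw UTF-8 bytes as a list, if that is what is wanted.
    [195,184] = binary_to_list(Utf8),
    ok.
```

So 248 is the code point, and 195,184 are the bytes that encode it in UTF-8.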

Hope this makes it a little clearer.

/Daniel

On 22 March 2012 15:32, Henning Diedrich <hd2010@REDACTED> wrote:
> Hi Masklinn,
>
> no question I am getting something wrong here.
>
>
> On 3/22/12 2:53 PM, Masklinn wrote:
>
> > I don't understand why you're looking for UTF8 as the output of
> > characters_to_list.
>
>
> That's what the doc leads me to believe I guess.
>
>
> "a list of integers representing unicode characters."
>
> should that be understood as "formerly unicode now latin-1 characters"?
>
> 248 is not a valid Unicode byte code. Not sure if 50104 is the right one for
> the same character, but it would have to be two bytes and I would thus
> expect an integer higher than 256.
>
> Or "Character code points not encoded as UTF-8"?
>
> If 248 is meant as the actual number of the letter, which it is no matter the
> encoding, then what is the right function to make a Unicode binary again from
> the list entries, unicode:characters_to_binary/1?
>
> Thanks,
> Henning