[erlang-questions] unicode:characters_to_list

Thu Mar 22 14:53:29 CET 2012

On 2012-03-22, at 14:40 , Henning Diedrich wrote:

> Hi,
> 
> I am perplexed about this result:
> 
> >         io:format("       ~s~nLatin: ~w~nUTF-8: ~w~nUTF-8 list: ~s~nUTF-8 list: ~w~n", [
> > <<"ø">>,
> > <<"ø">>,
> > <<"ø"/utf8>>,
> >         unicode:characters_to_list(<<"ø"/utf8>>,utf8),
> >         unicode:characters_to_list(<<"ø"/utf8>>,utf8)
> >     ]).
>       ø
> Latin: <<248>>
> UTF-8: <<195,184>>
> UTF-8 list: ø
> UTF-8 list: [248]
> ok
> 
> Should not unicode:characters_to_list return a list with Unicode code points?

That's what it does?

> The docs say: "This function converts a possibly deep list of integers and binaries into a list of integers representing unicode characters."
> 
> http://www.erlang.org/doc/man/unicode.html#characters_to_list-2
> 
> In other words, I'd expect as results:
> 
>       ø
> Latin: <<248>>
> UTF-8: <<195,184>>
> UTF-8 list: bad argument
> UTF-8 list: [50104]
> ok

Why would you expect that? The code point for ø is 248, so where would
50104 come from? And why would the third version yield a bad argument
when it's a valid string?

As far as I can see, everything is working correctly: the UTF-8 binary
<<195,184>> is decoded to the list of Unicode code points [248], i.e. the
string "ø". I don't understand why you expect UTF-8 bytes in the output
of characters_to_list.
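
To make the distinction concrete, here is a minimal sketch (the module
name utf8_demo and the function demo/0 are made up for illustration; the
bytes are written out numerically so the result does not depend on the
source file encoding). It suggests that 50104 is simply the two UTF-8
bytes 195 and 184 read as a single big-endian integer, which is not a
code point, and that getting UTF-8 bytes back is the job of
characters_to_binary rather than characters_to_list:

```erlang
-module(utf8_demo).
-export([demo/0]).

%% Each match below crashes with badmatch if the assumption is wrong.
demo() ->
    Utf8 = <<195,184>>,                          % UTF-8 encoding of "ø"

    %% Decoding UTF-8 yields code points, not bytes.
    [248] = unicode:characters_to_list(Utf8, utf8),

    %% Treating the same bytes as Latin-1 yields one code point per byte.
    [195,184] = unicode:characters_to_list(Utf8, latin1),

    %% Encoding the code point list goes the other way.
    Utf8    = unicode:characters_to_binary([248], unicode, utf8),
    <<248>> = unicode:characters_to_binary([248], unicode, latin1),

    %% 50104 is the raw byte pair read as an unsigned big-endian integer.
    50104 = binary:decode_unsigned(Utf8),
    ok.
```

Compiling the module and calling utf8_demo:demo() in the shell should
return ok if all of the matches hold.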