[erlang-questions] unicode:characters_to_list
Masklinn
masklinn@REDACTED
Thu Mar 22 14:53:29 CET 2012
On 2012-03-22, at 14:40 , Henning Diedrich wrote:
> Hi,
>
> I am perplexed about this result:
>
> > io:format(" ~s~nLatin: ~w~nUTF-8: ~w~nUTF-8 list: ~s~nUTF-8 list: ~w~n", [
> > <<"ø">>,
> > <<"ø">>,
> > <<"ø"/utf8>>,
> > unicode:characters_to_list(<<"ø"/utf8>>,utf8),
> > unicode:characters_to_list(<<"ø"/utf8>>,utf8)
> > ]).
> ø
> Latin: <<248>>
> UTF-8: <<195,184>>
> UTF-8 list: ø
> UTF-8 list: [248]
> ok
>
> Should not unicode:characters_to_list return a list with Unicode code points?
That's what it does?
> The docs say: "This function converts a possibly deep list of integers and binaries into a list of integers representing unicode characters."
>
> http://www.erlang.org/doc/man/unicode.html#characters_to_list-2
>
> In other words, I'd expect as results:
>
> ø
> Latin: <<248>>
> UTF-8: <<195,184>>
> UTF-8 list: bad argument
> UTF-8 list: [50104]
> ok
Why would you expect that? The code point for ø is 248; where would
50104 come from? And why would the third version yield a bad argument
when it's a valid string?
As far as I can see, everything is working correctly:
<<195,184>> is decoded to the Unicode code point list [248], i.e. the
string "ø". I don't understand why you expect UTF-8 in the output of
characters_to_list.
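
To make the distinction concrete, here is a minimal sketch (the module
and function names are made up for illustration). It shows the decoding
step described above, the reverse encoding step, and one possible origin
of 50104: it is the value you get if the two UTF-8 bytes are read as a
single big-endian integer (195 * 256 + 184), which is presumably what
was expected, but it is not a code point.

```erlang
-module(utf8_demo).
-export([demo/0]).

%% Each line is a pattern match, so demo/0 crashes with badmatch
%% if any of the claims below does not hold.
demo() ->
    Utf8 = <<195,184>>,                            % UTF-8 encoding of U+00F8
    %% Decoding yields a list of code points, not of UTF-8 bytes.
    [248] = unicode:characters_to_list(Utf8, utf8),
    %% Encoding the code point list gives the two UTF-8 bytes back.
    Utf8 = unicode:characters_to_binary([248], unicode, utf8),
    %% The raw bytes of the binary, for comparison.
    [195,184] = binary_to_list(Utf8),
    %% Reading both bytes as one big-endian unsigned integer gives 50104.
    50104 = binary:decode_unsigned(Utf8),
    ok.
```

Calling utf8_demo:demo() should return ok if all of the matches succeed.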