[erlang-questions] unicode in string literals
Richard Carlsson
carlsson.richard@REDACTED
Mon Jul 30 17:07:16 CEST 2012
On 07/30/2012 04:25 PM, Joe Armstrong wrote:
> Very strange I tried that earlier, this is what happens:
>
> $ Eshell V5.9 (abort with ^G)
> 1> unicode:characters_to_list([97,226,136,158,98], utf8).
> [97,226,136,158,98]
>
> The manual says the first argument is a utf8 string
The unicode:characters_to_list() function has tripped me up more than
once, and the documentation isn't very clear. The key to understanding
it seems to be to look at the possible types for the input:

    Data = latin1_chardata() | chardata() | external_chardata()

These are all variants of chardata(), i.e., possibly deep lists mixing
integers and binaries, and they differ only in how the binary segments
are interpreted. Integers in the list are always taken to be full
Unicode code points and are passed through without conversion; the
encoding argument applies only to binaries. So if your input is a list
of UTF-8 bytes (or any IO-list of bytes), convert it to a binary first,
and the following should work:

    unicode:characters_to_list(iolist_to_binary([97,226,136,158,98]), utf8).

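To make the difference concrete, here is a minimal sketch contrasting the two calls. The module name unicode_demo and the helper from_utf8_bytes/1 are made up for illustration; only unicode:characters_to_list/2 and iolist_to_binary/1 are real library functions. The byte sequence 226,136,158 is the UTF-8 encoding of U+221E (code point 8734).

```erlang
-module(unicode_demo).
-export([from_utf8_bytes/1, demo/0]).

%% Hypothetical helper: treat a (possibly deep) list of byte values as
%% UTF-8 encoded data and decode it into a list of Unicode code points.
from_utf8_bytes(Bytes) ->
    unicode:characters_to_list(iolist_to_binary(Bytes), utf8).

demo() ->
    Bytes = [97,226,136,158,98],
    %% Integers in a list are already taken to be code points, so the
    %% list comes back unchanged (this is what Joe observed).
    Bytes = unicode:characters_to_list(Bytes, utf8),
    %% As a binary, the bytes 226,136,158 are decoded as UTF-8 and
    %% collapse into the single code point 8734.
    [97,8734,98] = from_utf8_bytes(Bytes),
    %% Mixed chardata: only the binary segment is decoded; the
    %% surrounding integers pass through as they are.
    [97,8734,98] = unicode:characters_to_list([97,<<226,136,158>>,98], utf8),
    ok.
```

Calling unicode_demo:demo() returns ok if every match succeeds. Note that iolist_to_binary/1 only accepts integers in the range 0..255, so the helper is only suitable for lists that really are raw bytes.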
/Richard