[erlang-questions] unicode in string literals

Richard Carlsson carlsson.richard@REDACTED
Mon Jul 30 17:07:16 CEST 2012


On 07/30/2012 04:25 PM, Joe Armstrong wrote:
> Very strange I tried that earlier, this is what happens:
>
> $ Eshell V5.9  (abort with ^G)
> 1> unicode:characters_to_list([97,226,136,158,98], utf8).
> [97,226,136,158,98]
>
> The manual says the first argument is a utf8 string

The unicode:characters_to_list() function has tripped me up more than 
once, and the documentation isn't very clear. The key to understanding 
it seems to be to look at the possible types for the input:

   Data = latin1_chardata() | chardata() | external_chardata()

These are all variants of chardata(), i.e., possibly deep lists mixing 
integers and binaries, and they differ only in how the binary segments 
are interpreted. The encoding argument therefore applies only to 
binaries: any integers in the list are always taken as complete Unicode 
code points and passed through unconverted, which is why your call 
returned the list unchanged. So if your input is a list of UTF-8 bytes 
(or any IO-list of such bytes), flatten it to a binary first and the 
following should work:

unicode:characters_to_list(iolist_to_binary([97,226,136,158,98]), utf8).
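
To make the difference concrete, here is a small sketch comparing the 
variants. The module name utf8_demo and the function run/0 are made up 
for illustration; only the unicode and erlang calls are real. Each line 
is a pattern match, so run/0 returns ok only if every result is as 
stated (the bytes 226,136,158 are the UTF-8 encoding of U+221E, i.e. 
8734):

```erlang
-module(utf8_demo).   %% hypothetical module name, for illustration only
-export([run/0]).

run() ->
    Bytes = [97,226,136,158,98],
    Bin   = iolist_to_binary(Bytes),

    %% Integers are taken as code points already: nothing is decoded.
    Bytes = unicode:characters_to_list(Bytes, utf8),

    %% The same bytes as a binary are decoded as UTF-8.
    [97,8734,98] = unicode:characters_to_list(Bin, utf8),

    %% Mixed input: only the binary segment is decoded.
    [97,8734,98] = unicode:characters_to_list([97, <<226,136,158>>, 98], utf8),

    %% With latin1, each byte of the binary becomes one code point.
    Bytes = unicode:characters_to_list(Bin, latin1),
    ok.
```

After compiling with c(utf8_demo), calling utf8_demo:run() should 
return ok.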

     /Richard



