[erlang-questions] unicode in string literals

Richard Carlsson carlsson.richard@REDACTED
Mon Jul 30 15:23:09 CEST 2012


On 07/30/2012 03:06 PM, CGS wrote:
> Hi Joe,
>
> You may try unicode module:
>
> test() -> unicode:characters_to_list("a∞b",utf8).
>
> which will return the desired list [97,8734,98]. As Richard said, the
> default is Latin-1 (0-255 integers).

No! Don't save a source file as UTF-8, at least not without some way of
marking such files as special. The problem is that if you use the trick
above, you have to make sure that you convert _all_ string literals
explicitly this way (at least those that may contain characters outside
ASCII). If you have a character such as ö or é in a string and you
forget to convert it explicitly from UTF-8 to single code points, then
that "é" will in fact be 2 bytes, while in another module saved in
Latin-1, the string "é" that looks the same in your editor will be a
single byte, and the two won't compare equal. Having modules saved with
different encodings is a recipe for disaster (in particular when it
comes to future maintenance). Erlang currently only supports Latin-1 in
source files; until that is fixed, you should keep your UTF-8 data in
separate files.
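
To make the mismatch concrete, here is a minimal sketch. The module name
encoding_demo and its function names are made up for illustration. The
byte values are written as plain integers so that the example itself
does not depend on how the file is saved. It assumes that a UTF-8 "é"
read as Latin-1 shows up as the two bytes 195 and 169, and it passes a
binary to unicode:characters_to_list/2, since the encoding argument
describes how binaries in the input are decoded.

```erlang
%% Hypothetical module, for illustration only.
-module(encoding_demo).
-export([latin1_e_acute/0, utf8_e_acute/0, decoded/0, mismatch/0]).

%% "é" as read from a source file saved in Latin-1: one byte, 233.
latin1_e_acute() -> [233].

%% The same literal as it appears when the file was saved as UTF-8
%% but is read as Latin-1: two bytes, 195 and 169.
utf8_e_acute() -> [195, 169].

%% Explicit conversion: decode the UTF-8 bytes into a single code point.
decoded() ->
    unicode:characters_to_list(list_to_binary(utf8_e_acute()), utf8).

%% The two unconverted literals look alike in an editor but differ.
mismatch() -> latin1_e_acute() =/= utf8_e_acute().
```

Here decoded() compares equal to latin1_e_acute(), while the two raw
literals do not; forgetting the conversion for even one literal gives
the second case.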

    /Richard



