[erlang-questions] unicode in string literals
Richard Carlsson
carlsson.richard@REDACTED
Mon Jul 30 15:23:09 CEST 2012
On 07/30/2012 03:06 PM, CGS wrote:
> Hi Joe,
>
> You may try unicode module:
>
> test() -> unicode:characters_to_list("a∞b",utf8).
>
> which will return the desired list [97,8734,98]. As Richard said, the
> default is Latin-1 (0-255 integers).
No! Don't save a source file as UTF-8, at least not without some way of
marking such files as special. The problem is that if you use the trick
above, you have to make sure that you convert _all_ string literals
explicitly this way (at least those that may contain characters outside
ASCII). If you have a character such as ö or é in a string and forget to
convert it explicitly from UTF-8 to single code points, that "é" will in
fact be 2 bytes, while in another module saved in Latin-1, the string
"é" that looks the same in your editor will be a single byte, and the
two won't compare equal. Having modules saved with different encodings
is a recipe for disaster (in particular when it comes to future
maintenance). Erlang currently only supports Latin-1 in source files;
until that is fixed, you should keep your UTF-8 data in separate files.
/Richard
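
The mismatch described above can be reproduced without depending on how
any particular file is saved. The following is only a minimal sketch:
the module name encoding_demo and the function demo/0 are made up for
illustration, and the byte values are written as plain integers so the
result is the same whatever encoding the source file uses. It assumes
that "é" is code point 233 in Latin-1 and the byte pair 195,169 in
UTF-8.

```erlang
-module(encoding_demo).
-export([demo/0]).

%% Hypothetical demo module, not part of the original thread.
demo() ->
    %% "é" as two bytes: what a literal from a UTF-8 file becomes when
    %% the file is read as Latin-1.
    Utf8Bytes = [195, 169],
    %% "é" as one code point: what a literal from a Latin-1 file becomes.
    Latin1 = [233],
    %% The two strings look alike in an editor but are not equal.
    false = (Utf8Bytes =:= Latin1),
    %% Explicit decoding. The bytes are passed as a binary, because
    %% integers in a plain list are taken to be code points already.
    Decoded = unicode:characters_to_list(list_to_binary(Utf8Bytes), utf8),
    true = (Decoded =:= Latin1),
    {Utf8Bytes, Latin1, Decoded}.
```

Calling encoding_demo:demo() returns the tuple
{[195,169],[233],[233]}: the undecoded literal differs from the Latin-1
one, and only the explicitly decoded list matches it.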