[erlang-questions] Strings as Lists

Christian S chsu79@REDACTED
Wed Feb 13 20:54:49 CET 2008


2008/2/13 tsuraan <tsuraan@REDACTED>:
> So in erlang, if you type a string literal that is (e.g.) japanese, does it
> create a list of the utf-32 codepoints of the string you wrote?  I'd try it,
> but I don't trust my computer's i18n support enough to trust the results of
> any test I could do.

The tokenizer expects latin1 input:

http://erlang.org/doc/reference_manual/introduction.html#1.6

Of course, if you put UTF-8 encoded data into your strings, it will
happily interpret your "Ä" as the two-element list [$Ã, 132], that is,
whatever byte sequence the UTF-8 encoded Ä looks like when viewed as
Latin-1. The list [195,132] is what shows up if I enter it using my
UTF-8 xterm.
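
A minimal sketch of where those two integers come from, assuming an OTP
release that ships the unicode module (it was added to OTP after this
message was written). The module name utf8_as_latin1 and the function
bytes_of/1 are made up for illustration; code points are written as
integers so the example does not depend on the source file's encoding.

```erlang
-module(utf8_as_latin1).
-export([bytes_of/1]).

%% Encode a list of Unicode code points as UTF-8 and return the raw
%% bytes as a list of integers. This is the list a Latin-1 tokenizer
%% would see if the source text had been saved as UTF-8.
bytes_of(CodePoints) ->
    Utf8 = unicode:characters_to_binary(CodePoints, unicode, utf8),
    binary_to_list(Utf8).
```

Calling utf8_as_latin1:bytes_of([196]) (196 is the code point of Ä)
returns [195,132], the same list as above. In Latin-1, 195 is Ã and 132
is a control character.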

Didn't the list have a long thread about I/O character encodings a
couple of years ago? Or am I mixing it up with the character
encoding issues in Common Lisp's I/O system?

There are some issues that show up if you write an alternative lexer
that decodes UTF-8 into lists of Unicode code points, such as
list_to_binary() expecting a string() type and choking if there is an
integer above 255. This must be what the list had a thread about: the
need for unicode_list_to_utf8_binary/1 and a dozen others for other
target encodings.
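
To make the problem concrete, here is a small sketch, again assuming a
later OTP release with the unicode module. The module name
unicode_list_demo and both function names are hypothetical;
unicode:characters_to_binary/1 is used here in the role of the
unicode_list_to_utf8_binary/1 wished for above, which is not an actual
library function.

```erlang
-module(unicode_list_demo).
-export([to_binary_checked/1, to_utf8/1]).

%% list_to_binary/1 only accepts integers in the range 0..255, so a
%% list holding a code point above 255 raises badarg.
to_binary_checked(CodePoints) ->
    try list_to_binary(CodePoints) of
        Bin -> {ok, Bin}
    catch
        error:badarg -> {error, badarg}
    end.

%% Encode a list of Unicode code points as a UTF-8 binary.
to_utf8(CodePoints) ->
    unicode:characters_to_binary(CodePoints).
```

With the euro sign (code point 8364), to_binary_checked([8364]) returns
{error, badarg}, while to_utf8([8364]) returns <<226,130,172>>.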


