[erlang-questions] Strings as Lists
Hasan Veldstra
hasan.veldstra@REDACTED
Fri Feb 15 18:35:05 CET 2008
> This is what you should have in your list:
> 1> Text = [16#442, 16#435, 16#43a, 16#441, 16#442].
> [1090,1077,1082,1089,1090]
> You can convert it to utf8 for output
>
> 2> xmerl_ucs:to_utf8(Text).
> [209,130,208,181,208,186,209,129,209,130]
>
> And you can reverse it and convert that to utf8.
>
> 3> xmerl_ucs:to_utf8(lists:reverse(Text)).
> [209,130,209,129,208,186,208,181,209,130]
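This would not work on a string with combining characters, e.g. ü
represented as u followed by a combining diaeresis (U+0308): reversing
the codepoints moves the mark onto the wrong base character. The same
goes for a CJKV ideograph followed by a variation selector.
A lot of characters, as the user sees them, *cannot* be represented by
a single Unicode codepoint.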
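To make the combining-character problem concrete, here is a minimal
sketch. The module name and both function names are made up for
illustration, and is_combining/1 only knows the Combining Diacritical
Marks block (U+0300 to U+036F), which is an assumption that falls far
short of real Unicode grapheme segmentation. It only shows why a
per-codepoint reverse goes wrong and what keeping a base character
together with its marks looks like.

```erlang
-module(combining_reverse).
-export([naive_reverse/1, cluster_reverse/1]).

%% Reverses codepoint by codepoint, as in the quoted example.
naive_reverse(Codepoints) ->
    lists:reverse(Codepoints).

%% Reverses the text while keeping every base character together with
%% the combining marks that directly follow it.
cluster_reverse(Codepoints) ->
    lists:append(clusters(Codepoints, [])).

%% Splits the input into [Base | Marks] clusters. Prepending each new
%% cluster to the accumulator leaves the clusters in reversed order.
clusters([], Acc) ->
    Acc;
clusters([Base | Rest], Acc) ->
    {Marks, Tail} = lists:splitwith(fun is_combining/1, Rest),
    clusters(Tail, [[Base | Marks] | Acc]).

%% ASSUMPTION: only U+0300..U+036F counts as combining. Real text needs
%% the full Unicode character database.
is_combining(C) ->
    C >= 16#0300 andalso C =< 16#036F.
```

With Text = [$u, 16#308, $b] (a decomposed ü followed by b),
combining_reverse:naive_reverse(Text) returns [98,776,117], so the
diaeresis now follows the b. combining_reverse:cluster_reverse(Text)
returns [98,117,776], which keeps it on the u.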
Plain lists or binaries are good enough in two cases:
1. You don't need to support anything other than ISO Latin-1 (i.e.
Western European languages).
2. You don't need to do much with the Unicode text apart from simply
storing it and spitting it back to the user as-is.
For any other case, what Erlang/OTP offers now is subpar compared to
other modern languages / platforms.
Implementing Unicode from scratch is nasty, and the DIY attitude is
unproductive and dangerous. There needs to be a standard library,
used and tested by everyone.
As I already mentioned in this thread, I'm working on such a library,
and will release an alpha version soon.
--
http://12monkeys.co.uk
http://hypernumbers.com