[erlang-questions] Strings as Lists

Hasan Veldstra hasan.veldstra@REDACTED
Fri Feb 15 18:35:05 CET 2008


> This is what you should have in your list:
> 1> Text = [16#442, 16#435, 16#43a, 16#441, 16#442].
> [1090,1077,1082,1089,1090]
> You can convert it to utf8 for output
>
> 2> xmerl_ucs:to_utf8(Text).
> [209,130,208,181,208,186,209,129,209,130]
>
> And you can reverse it and convert that to utf8.
>
> 3> xmerl_ucs:to_utf8(lists:reverse(Text)).
> [209,130,209,129,208,186,208,181,209,130]


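This would not work on a string with combining characters, e.g. ü
represented as u followed by a combining diaeresis (U+0308): reversing
the list of codepoints detaches the mark from its base letter. The same
goes for a CJKV ideograph followed by a variation selector.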

A lot of user-perceived characters *cannot* be represented by a single
Unicode codepoint.
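
To make the combining-character case concrete, here is a minimal sketch.
The module and function names (combining_demo, sample/0, naive_reverse/1)
are made up for illustration; only lists:reverse/1 comes from the quoted
example.

```erlang
%% Hypothetical demo module; the names are for illustration only.
-module(combining_demo).
-export([sample/0, naive_reverse/1]).

%% "für" with the ü decomposed: u (117) followed by U+0308 (776),
%% the combining diaeresis.
sample() ->
    [$f, $u, 16#308, $r].

%% Reverses codepoint by codepoint, exactly as lists:reverse/1 does in
%% the quoted example. It knows nothing about combining sequences.
naive_reverse(Codepoints) ->
    lists:reverse(Codepoints).
```

combining_demo:naive_reverse(combining_demo:sample()) returns
[114,776,117,102]. The diaeresis now follows r instead of u, so the
mark ends up on the wrong letter. A reversal that kept the combining
sequence together would give [114,117,776,102].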

Plain lists or binaries are good enough in two cases:
1. You don't need to support anything other than ISO Latin-1 (i.e.  
Western European languages).
2. You don't need to do much with the Unicode text apart from simply  
storing it and spitting it back to the user as-is.

For any other case, what Erlang/OTP offers now is subpar compared to  
other modern languages / platforms.

Implementing Unicode from scratch is nasty, and the DIY attitude is  
unproductive and dangerous. There needs to be a standard library,  
used and tested by everyone.

As I already mentioned in this thread, I'm working on such a library,  
and will release an alpha version soon.



--
http://12monkeys.co.uk
http://hypernumbers.com

