[erlang-questions] Strings as Lists

Hasan Veldstra hasan.veldstra@REDACTED
Thu Feb 14 10:37:28 CET 2008


Erlang currently sucks for working with Unicode, and as a  
consequence, sucks for working with strings.

This isn't a fault of the language itself, just a lack of libraries.

Pretending that lists with a bit of DIY are good enough doesn't help.

Yeah, you can load text in any Unicode encoding into an Erlang list  
with no problems... but there's much more to supporting Unicode than  
that.

For example, say you've got the string "привет" (which is  
Russian for "hi") encoded in UTF-8 in list L:

L = [208, 191, 209, 128, 208, 184, 208, 178, 208, 181, 209, 130]

Now say you want to convert it to uppercase. Well, you can't.
string:to_upper/1 won't work, because the only encoding it's aware
of is ISO Latin-1.
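
To make that concrete, here is a minimal sketch. The module name upper_demo and its function names are made up for illustration; the only library call is string:to_upper/1, and the sketch assumes it maps just the ASCII and Latin-1 lowercase ranges, as described above.

```erlang
-module(upper_demo).
-export([privet/0, unchanged/0]).

%% "привет" as UTF-8 bytes, copied from the list L above.
privet() ->
    [208, 191, 209, 128, 208, 184, 208, 178, 208, 181, 209, 130].

%% Compares the result of string:to_upper/1 with its input.
%% None of the bytes falls in the ranges a Latin-1 uppercasing
%% routine touches (a-z, 16#E0-16#F6, 16#F8-16#FE), so the
%% comparison should come out true: nothing was uppercased.
unchanged() ->
    L = privet(),
    string:to_upper(L) =:= L.
```

Calling upper_demo:unchanged() in the shell should give true, whereas string:to_upper("hello") gives "HELLO" as you'd expect.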

As soon as you've got text in anything other than ISO Latin-1, the  
arguments about niceties of being able to do maps/folds/ 
comprehensions on lists pretending to be strings become void. You  
can't reliably iterate over each character in a UTF-8 or UTF-16  
string in a plain list, because they are variable-width encodings.  
Nor could you do it even if your strings were in UTF-32, because
they may contain combining character sequences (a base letter
followed by combining marks), and you'd have to normalize the
string first... and then you're well on your way to re-implementing
Unicode in Erlang yourself. Good luck.
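
A rough sketch of both problems, using nothing but plain lists. The module utf8_sketch and its functions are hypothetical and hand-rolled for this message (they are not part of any library); the decoder does no validation and simply crashes with function_clause on malformed input.

```erlang
-module(utf8_sketch).
-export([decode/1, privet/0, e_acute_forms/0]).

%% Turn a list of UTF-8 bytes into a list of code points.
%% Each clause handles one sequence length (1 to 4 bytes).
decode([]) ->
    [];
decode([B | T]) when B < 16#80 ->
    [B | decode(T)];
decode([B1, B2 | T]) when (B1 band 16#E0) =:= 16#C0 ->
    CP = ((B1 band 16#1F) bsl 6) bor (B2 band 16#3F),
    [CP | decode(T)];
decode([B1, B2, B3 | T]) when (B1 band 16#F0) =:= 16#E0 ->
    CP = ((B1 band 16#0F) bsl 12)
         bor ((B2 band 16#3F) bsl 6)
         bor (B3 band 16#3F),
    [CP | decode(T)];
decode([B1, B2, B3, B4 | T]) when (B1 band 16#F8) =:= 16#F0 ->
    CP = ((B1 band 16#07) bsl 18)
         bor ((B2 band 16#3F) bsl 12)
         bor ((B3 band 16#3F) bsl 6)
         bor (B4 band 16#3F),
    [CP | decode(T)].

%% The byte list L from earlier in this message.
privet() ->
    [208, 191, 209, 128, 208, 184, 208, 178, 208, 181, 209, 130].

%% Two code point lists that both render as "é":
%% precomposed U+00E9, and "e" followed by combining acute U+0301.
e_acute_forms() ->
    {[16#E9], [$e, 16#301]}.
```

length(utf8_sketch:privet()) is 12, but length(utf8_sketch:decode(utf8_sketch:privet())) is 6, so a map or fold over the raw list visits bytes, not characters. And even after decoding, the two lists returned by e_acute_forms/0 have different lengths and don't compare equal, although they represent the same text. Sorting that out takes normalization, which this sketch doesn't attempt.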

Anyway, I've been working on an Erlang Unicode string library based  
on ICU (http://www.icu-project.org/) for the past week. It's coming  
along nicely, and I'll release an alpha version in another week or so.

Erlang is a great language and platform, and non-existent Unicode  
support is probably the biggest drawback it has. I hope we'll get it  
fixed soon.


--
http://12monkeys.co.uk
http://hypernumbers.com

