[erlang-questions] Strings as Lists
Hasan Veldstra
hasan.veldstra@REDACTED
Thu Feb 14 10:37:28 CET 2008
Erlang currently sucks for working with Unicode, and as a
consequence, sucks for working with strings.
This isn't a fault of the language, just the lack of libraries.
Pretending that lists with a bit of DIY are good enough doesn't help.
Yeah, you can load text in any Unicode encoding into an Erlang list
with no problems... but there's much more to supporting Unicode than
that.
For example, say you've got the string "привет" (Russian for "hi")
encoded in UTF-8 in list L:

    L = [208, 191, 209, 128, 208, 184, 208, 178, 208, 181, 209, 130].

Now say you want to convert it to uppercase. Well, you can't.
string:to_upper() won't work, as the only encoding it's aware of is
ISO Latin-1.
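
To make that concrete, here is a minimal sketch of what happens with the list above. The module and function names (latin1_sketch, hi/0 and so on) are made up for illustration; the only library call is string:to_upper/1. None of the twelve byte values falls in the ranges that function treats as lowercase Latin-1 letters, so the list should come back untouched, and length/1 counts bytes rather than letters.

```erlang
-module(latin1_sketch).
-export([hi/0, upper_is_noop/0, byte_count/0]).

%% "привет" as UTF-8 bytes, same list as L above.
hi() ->
    [208, 191, 209, 128, 208, 184, 208, 178, 208, 181, 209, 130].

%% string:to_upper/1 treats every element as one Latin-1 character.
%% No element here is a lowercase Latin-1 letter, so nothing changes.
upper_is_noop() ->
    string:to_upper(hi()) =:= hi().

%% The word has 6 letters, but the list has one element per byte.
byte_count() ->
    length(hi()).
```

latin1_sketch:upper_is_noop() returns true and latin1_sketch:byte_count() returns 12, which is the problem in a nutshell: the list is a bag of bytes, not a string.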
As soon as you've got text in anything other than ISO Latin-1, the
arguments about niceties of being able to do maps/folds/
comprehensions on lists pretending to be strings become void. You
can't reliably iterate over each character in a UTF-8 or UTF-16
string in a plain list, because they are variable-width encodings.
Neither could you do it even if your strings were in UTF-32, because
what the user sees as one character may be stored as a base letter
followed by combining marks, and you'd have to normalize the string
first... and then you're well on your way to re-implementing Unicode
in Erlang yourself. Good luck.
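
A rough sketch of both problems, with hypothetical names throughout (utf8_sketch, decode/1, precomposed/0, decomposed/0). The decoder is hand-rolled and deliberately naive: it only unpacks well-formed 1- to 4-byte sequences and does not reject overlong forms or surrogates, so treat it as an illustration of the DIY route and not as something to ship.

```erlang
-module(utf8_sketch).
-export([decode/1, precomposed/0, decomposed/0]).

%% Turn a list of UTF-8 bytes into a list of code points.
%% Malformed input crashes with function_clause.
decode([]) ->
    [];
decode([B | T]) when B < 16#80 ->
    %% 1 byte: plain ASCII
    [B | decode(T)];
decode([B1, B2 | T])
  when B1 band 16#E0 =:= 16#C0, B2 band 16#C0 =:= 16#80 ->
    %% 2 bytes: 110xxxxx 10xxxxxx
    [((B1 band 16#1F) bsl 6) bor (B2 band 16#3F) | decode(T)];
decode([B1, B2, B3 | T])
  when B1 band 16#F0 =:= 16#E0, B2 band 16#C0 =:= 16#80,
       B3 band 16#C0 =:= 16#80 ->
    %% 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    [((B1 band 16#0F) bsl 12) bor ((B2 band 16#3F) bsl 6)
         bor (B3 band 16#3F) | decode(T)];
decode([B1, B2, B3, B4 | T])
  when B1 band 16#F8 =:= 16#F0, B2 band 16#C0 =:= 16#80,
       B3 band 16#C0 =:= 16#80, B4 band 16#C0 =:= 16#80 ->
    %% 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    [((B1 band 16#07) bsl 18) bor ((B2 band 16#3F) bsl 12)
         bor ((B3 band 16#3F) bsl 6) bor (B4 band 16#3F) | decode(T)].

%% "é" as a single code point, U+00E9.
precomposed() ->
    [16#00E9].

%% The same "é" as "e" (U+0065) plus COMBINING ACUTE ACCENT (U+0301).
decomposed() ->
    [16#0065, 16#0301].
```

With the list L from earlier, utf8_sketch:decode(L) gives [1087, 1088, 1080, 1074, 1077, 1090], six code points for six letters, so at least the iteration is right. But utf8_sketch:precomposed() =:= utf8_sketch:decomposed() is false even though both lists display as the same letter, and nothing here knows how to uppercase, collate or normalize anything. That part needs the Unicode data tables.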
Anyway, I've been working on an Erlang Unicode string library based
on ICU (http://www.icu-project.org/) for the past week. It's coming
along nicely, and I'll release an alpha version in another week or so.
Erlang is a great language and platform, and non-existent Unicode
support is probably the biggest drawback it has. I hope we'll get it
fixed soon.
--
http://12monkeys.co.uk
http://hypernumbers.com