[erlang-questions] Strings as Lists
Thu Feb 14 17:24:05 CET 2008
Has anyone here noticed that Erlang ships with a library module for
charset conversion (xmerl_ucs)? It may take you some part of the way.
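
A minimal sketch of what "some part of the way" might look like, using
the UTF-8 list from the message quoted below. It assumes the xmerl
application is on the code path and that xmerl_ucs:from_utf8/1 returns a
list of code points for a list of UTF-8 bytes. The module name
ucs_sketch and the helper upper_basic_cyrillic/1 are made up for this
illustration; the helper covers only the letters U+0430..U+044F and is
not a general Unicode case mapping.

```erlang
-module(ucs_sketch).
-export([decode/1, upper_basic_cyrillic/1]).

%% Decode a list of UTF-8 bytes into a list of Unicode code points,
%% one integer per code point (relies on xmerl_ucs from xmerl).
decode(Utf8Bytes) ->
    xmerl_ucs:from_utf8(Utf8Bytes).

%% Hypothetical helper: uppercase only the basic Cyrillic letters
%% U+0430..U+044F by shifting them down to U+0410..U+042F.
%% Every other code point passes through unchanged.
upper_basic_cyrillic(CodePoints) ->
    [upper_cp(C) || C <- CodePoints].

upper_cp(C) when C >= 16#0430, C =< 16#044F -> C - 16#20;
upper_cp(C) -> C.

%% Usage from the shell, after compiling with c(ucs_sketch):
%%   L = [208, 191, 209, 128, 208, 184, 208, 178, 208, 181, 209, 130].
%%   CPs = ucs_sketch:decode(L).
%%     CPs is [1087,1088,1080,1074,1077,1090], six code points
%%     decoded from twelve bytes.
%%   ucs_sketch:upper_basic_cyrillic(CPs).
%%     gives [1055,1056,1048,1042,1045,1058].
```

Decoding first means a map or comprehension sees one element per code
point rather than one per byte. Case mapping, normalization and
composed characters are still left to the caller, which is the gap the
quoted message describes.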
On 14 Feb 2008, at 09:37, Hasan Veldstra wrote:
> Erlang currently sucks for working with Unicode, and as a
> consequence, sucks for working with strings.
> This isn't a fault of the language, just the lack of libraries.
> Pretending that lists with a bit of DIY are good enough doesn't help.
> Yeah, you can load text in any Unicode encoding into an Erlang list
> with no problems... but there's much more to supporting Unicode than
> that.
> For example, say you've got the string "привет" (which is
> Russian for "hi") encoded in UTF-8 in list L:
> L = [208, 191, 209, 128, 208, 184, 208, 178, 208, 181, 209, 130]
> Now say you want to convert it to uppercase. Well, you can't.
> string:to_upper() won't work, as the only encoding it's aware of is
> ISO Latin-1.
> As soon as you've got text in anything other than ISO Latin-1, the
> arguments about niceties of being able to do maps/folds/
> comprehensions on lists pretending to be strings become void. You
> can't reliably iterate over each character in a UTF-8 or UTF-16
> string in a plain list, because they are variable-width encodings.
> Neither could you do it even if your strings were in UTF-32, because
> they may have composed characters, and you'd have to normalize the
> string first... and then you're well on your way to re-implementing
> Unicode in Erlang yourself. Good luck.
> Anyway, I've been working on an Erlang Unicode string library based
> on ICU (http://www.icu-project.org/) for the past week. It's coming
> along nicely, and I'll release an alpha version in another week or so.
> Erlang is a great language and platform, and non-existent Unicode
> support is probably the biggest drawback it has. I hope we'll get it
> fixed soon.