[erlang-questions] Strings as Lists

Peter Lund erlang@REDACTED
Thu Feb 14 12:07:35 CET 2008


The list should contain the 6 Unicode code points (each > 255), not
their UTF-8 encoding.
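
To make the difference concrete, here is a rough sketch that decodes the
byte list from the quoted message into code points by hand. The module
name codepoints_demo and its function names are made up for this example,
and the decoder is deliberately incomplete: it only handles 1- and 2-byte
UTF-8 sequences, which is all this particular string needs.

```erlang
-module(codepoints_demo).
-export([utf8_bytes/0, decode/1]).

%% The UTF-8 byte list from the quoted message (12 bytes).
utf8_bytes() ->
    [208, 191, 209, 128, 208, 184, 208, 178, 208, 181, 209, 130].

%% Decode a list of UTF-8 bytes into a list of Unicode code points.
%% Only 1- and 2-byte sequences are handled; any other input crashes
%% with function_clause.
decode([]) ->
    [];
decode([B | Rest]) when B < 16#80 ->
    %% Single byte: plain ASCII.
    [B | decode(Rest)];
decode([B1, B2 | Rest]) when B1 band 16#E0 =:= 16#C0,
                             B2 band 16#C0 =:= 16#80 ->
    %% Two bytes: 110xxxxx 10yyyyyy -> xxxxxyyyyyy.
    [((B1 band 16#1F) bsl 6) bor (B2 band 16#3F) | decode(Rest)].
```

codepoints_demo:decode(codepoints_demo:utf8_bytes()) gives
[1087,1088,1080,1074,1077,1090], one list element per character.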

You should then apply a "to_upper" fun to that list. Since the OTP
libraries do not define to_upper for Cyrillic, you need to write it
yourself.
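
A minimal sketch of such a fun, assuming the input is already a list of
code points. The module name cyrillic_demo is hypothetical, and the
mapping covers only ASCII a-z and the Cyrillic block U+0430..U+045F;
a real implementation would need the full Unicode case tables.

```erlang
-module(cyrillic_demo).
-export([to_upper/1]).

%% Uppercase a list of Unicode code points.
%% Code points outside the ranges below are returned unchanged.
to_upper(String) ->
    lists:map(fun upper_char/1, String).

%% ASCII a..z -> A..Z
upper_char(C) when C >= $a, C =< $z ->
    C - 32;
%% U+0430..U+044F -> U+0410..U+042F (basic Cyrillic alphabet)
upper_char(C) when C >= 16#0430, C =< 16#044F ->
    C - 32;
%% U+0450..U+045F -> U+0400..U+040F (Cyrillic extensions)
upper_char(C) when C >= 16#0450, C =< 16#045F ->
    C - 80;
upper_char(C) ->
    C.
```

cyrillic_demo:to_upper([1087,1088,1080,1074,1077,1090]) returns
[1055,1056,1048,1042,1045,1058], the code points of the uppercased word.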

When storing your Unicode strings, it is a good idea to convert them to
UTF-8 and then to a binary. Storing this binary is cheaper than storing
the Unicode list. Lists in Erlang consume a lot of space.
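
As a sketch of that conversion, assuming an OTP release that ships the
`unicode` module (the module name storage_demo and the pack/unpack names
are made up for this example):

```erlang
-module(storage_demo).
-export([pack/1, unpack/1]).

%% Turn a list of code points into a UTF-8 encoded binary for storage.
pack(CodePoints) ->
    unicode:characters_to_binary(CodePoints).

%% Turn a stored UTF-8 binary back into a list of code points.
unpack(Bin) ->
    unicode:characters_to_list(Bin).
```

For the six-character Russian word in the quoted message, pack/1 yields a
binary with a 12-byte payload, whereas the list needs one cons cell (two
machine words) per character.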

/Peter

Hasan Veldstra skrev:
> Erlang currently sucks for working with Unicode, and as a  
> consequence, sucks for working with strings.
>
> This isn't a fault of the language, just the lack of libraries.
>
> Pretending that lists with a bit of DIY are good enough doesn't help.
>
> Yeah, you can load text in any Unicode encoding into an Erlang list  
> with no problems... but there's much more to supporting Unicode than  
> that.
>
> For example, say you've got the string "привет" (which is  
> Russian for "hi") encoded in UTF-8 in list L:
>
> L = [208, 191, 209, 128, 208, 184, 208, 178, 208, 181, 209, 130]
>
> Now say you want to convert it to uppercase. Well, you can't.  
> string:to_upper() won't work, as the only encoding it's aware of is  
> ISO Latin-1.
>
> As soon as you've got text in anything other than ISO Latin-1, the  
> arguments about niceties of being able to do maps/folds/comprehensions
> on lists pretending to be strings become void. You
> can't reliably iterate over each character in a UTF-8 or UTF-16  
> string in a plain list, because they are variable-width encodings.  
> Neither could you do it even if your strings were in UTF-32, because  
> they may have composed characters, and you'd have to normalize the  
> string first... and then you're well on your way to re-implementing  
> Unicode in Erlang yourself. Good luck.
>
> Anyway, I've been working on an Erlang Unicode string library based  
> on ICU (http://www.icu-project.org/) for the past week. It's coming  
> along nicely, and I'll release an alpha version in another week or so.
>
> Erlang is a great language and platform, and non-existent Unicode  
> support is probably the biggest drawback it has. I hope we'll get it  
> fixed soon.
>
>
> --
> http://12monkeys.co.uk
> http://hypernumbers.com



