[erlang-questions] Strings as Lists
Anthony Shipman
als@REDACTED
Fri Feb 15 20:13:33 CET 2008
On Sat, 16 Feb 2008 04:35:05 am Hasan Veldstra wrote:
>
> This would not work on a string with combining characters, e.g. ü
> represented as u followed by ¨, or a CJKV ideograph.
>
> A lot of glyphs *cannot* be represented by a single Unicode codepoint.
>
> Plain lists or binaries are good enough in two cases:
> 1. You don't need to support anything other than ISO Latin-1 (i.e.
> Western European languages).
> 2. You don't need to do much with the Unicode text apart from simply
> storing it and spitting it back to the user as-is.
How about this:
A string is a list of characters.
A character is one or more Unicode code points. A single code point can be
represented by an integer. Multiple code points can be represented by a
tuple. A list would not work here, because lists:flatten/1 would then destroy
this structure. Utility functions would convert between UTF-8 binaries and
this structure.
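A minimal sketch of what such utility functions might look like, written for a current Erlang/OTP release. The module name uchars and the functions from_utf8/1 and to_utf8/1 are hypothetical. To stay short, it assumes that only the Combining Diacritical Marks block (U+0300 to U+036F) is combining; a real implementation would need the full Unicode grapheme cluster rules.

```erlang
-module(uchars).
-export([from_utf8/1, to_utf8/1]).

%% Hypothetical representation: a "character" is either an integer
%% (a single code point) or a tuple {Base, Mark1, Mark2, ...}.

%% Simplifying assumption: only U+0300..U+036F count as combining marks.
-define(IS_COMBINING(C), (C >= 16#0300 andalso C =< 16#036F)).

%% Decode a UTF-8 binary into a list of characters.
from_utf8(Bin) when is_binary(Bin) ->
    CodePoints = [C || <<C/utf8>> <= Bin],
    group(CodePoints).

%% Attach each run of combining marks to the code point before it.
group([]) ->
    [];
group([Base | Rest]) ->
    {Marks, Tail} = lists:splitwith(fun(C) -> ?IS_COMBINING(C) end, Rest),
    Char = case Marks of
               [] -> Base;                          % plain code point
               _  -> list_to_tuple([Base | Marks])  % base + marks
           end,
    [Char | group(Tail)].

%% Encode a list of characters back into a UTF-8 binary.
to_utf8(Chars) when is_list(Chars) ->
    << <<(encode(Ch))/binary>> || Ch <- Chars >>.

encode(C) when is_integer(C) ->
    <<C/utf8>>;
encode(T) when is_tuple(T) ->
    << <<C/utf8>> || C <- tuple_to_list(T) >>.
```

With this representation, a decomposed ü becomes the single element {$u, 16#0308}, so length/1 and lists:reverse/1 treat it as one character, and lists:flatten/1 leaves it intact because it only flattens nested lists, not tuples.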
--
Anthony Shipman Mamas don't let your babies
als@REDACTED grow up to be outsourced.