[erlang-questions] Strings as Lists

Anthony Shipman als@REDACTED
Fri Feb 15 20:13:33 CET 2008


On Sat, 16 Feb 2008 04:35:05 am Hasan Veldstra wrote:

>
> This would not work on a string with combining characters, e.g. ü
> represented as u followed by ¨, or a CJKV ideograph.
>
> A lot of glyphs *cannot* be represented by a single Unicode codepoint.
>
> Plain lists or binaries are good enough in two cases:
> 1. You don't need to support anything other than ISO Latin-1 (i.e.
> Western European languages).
> 2. You don't need to do much with the Unicode text apart from simply
> storing it and spitting it back to the user as-is.

How about this:
A string is a list of characters.

A character is one or more Unicode code points. A single code point can be
represented by an integer. Multiple code points can be represented by a
tuple. A nested list wouldn't be good, as lists:flatten/1 would then splice
the code points back into the string and destroy this structure.
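
To make the flatten point concrete, here is a small sketch. The module name
ustr_shape and the example word are made up for illustration; the only
library call is lists:flatten/1.

```erlang
-module(ustr_shape).
-export([demo/0]).

%% "für" with the ü written as u followed by U+0308 (combining diaeresis).
demo() ->
    AsTuples = [$f, {$u, 16#0308}, $r],
    AsLists  = [$f, [$u, 16#0308], $r],
    %% One list element per character, so length/1 counts characters.
    3 = length(AsTuples),
    %% Tuples are opaque to lists:flatten/1, so the grouping survives.
    AsTuples = lists:flatten(AsTuples),
    %% Nested lists are spliced in, so the grouping is lost.
    [$f, $u, 16#0308, $r] = lists:flatten(AsLists),
    ok.
```

Each match in demo/0 acts as an assertion, so the function returns ok only
if flatten behaves as described.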

Utility functions would convert between UTF-8 in binaries and this structure.
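
Such utility functions might look roughly like the sketch below. It is only
an illustration under loud assumptions: the module name ustr and the
functions from_utf8/1 and to_utf8/1 are invented here, the bit syntax is
assumed to support the utf8 type, and "combining" is taken to mean just the
range U+0300..U+036F, which is far from the full Unicode rule.

```erlang
-module(ustr).
-export([from_utf8/1, to_utf8/1]).

%% Assumption: only U+0300..U+036F (Combining Diacritical Marks) is
%% treated as combining. Real text needs the full Unicode tables.
-define(IS_COMBINING(C), (C >= 16#0300 andalso C =< 16#036F)).

%% UTF-8 binary -> list of characters (integer or tuple of integers).
%% Invalid UTF-8 makes this crash with function_clause.
from_utf8(Bin) when is_binary(Bin) ->
    from_utf8(Bin, []).

%% Acc holds the characters seen so far, newest first; each character
%% is a list of its code points, also newest first.
from_utf8(<<>>, Acc) ->
    lists:reverse([close(Ch) || Ch <- Acc]);
from_utf8(<<C/utf8, Rest/binary>>, [Open | Acc]) when ?IS_COMBINING(C) ->
    %% Attach the combining mark to the character before it.
    from_utf8(Rest, [[C | Open] | Acc]);
from_utf8(<<C/utf8, Rest/binary>>, Acc) ->
    %% Anything else (or a mark with nothing before it) starts a character.
    from_utf8(Rest, [[C] | Acc]).

%% A lone code point stays an integer; several become a tuple.
close([C]) -> C;
close(Rev) -> list_to_tuple(lists:reverse(Rev)).

%% List of characters -> UTF-8 binary.
to_utf8(Chars) ->
    << <<C/utf8>> || Ch <- Chars, C <- code_points(Ch) >>.

code_points(C) when is_integer(C) -> [C];
code_points(T) when is_tuple(T) -> tuple_to_list(T).
```

With this, ustr:from_utf8(<<"fu", 16#CC, 16#88, "r">>) gives
[$f, {$u, 16#0308}, $r], and ustr:to_utf8/1 turns that back into the same
binary.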

-- 
Anthony Shipman                    Mamas don't let your babies 
als@REDACTED                   grow up to be outsourced.


