[erlang-questions] Strings as Lists

Anthony Shipman <>
Fri Feb 15 20:13:33 CET 2008

On Sat, 16 Feb 2008 04:35:05 am Hasan Veldstra wrote:

> This would not work on a string with combining characters, e.g. ü
> represented as u followed by ¨, or a CJKV ideograph.
> A lot of glyphs *cannot* be represented by a single Unicode codepoint.
> Plain lists or binaries are good enough in two cases:
> 1. You don't need to support anything other than ISO Latin-1 (i.e.
> Western European languages).
> 2. You don't need to do much with the Unicode text apart from simply
> storing it and spitting it back to the user as-is.
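The problem described in the quote can be shown with a small sketch. It assumes "ü" is stored in decomposed form, as u (117) followed by U+0308 COMBINING DIAERESIS (776), in a plain list of code points. The module name `codepoint_demo` is hypothetical.

```erlang
-module(codepoint_demo).
-export([demo/0]).

%% In a flat list of code points, a decomposed "ü" occupies two elements,
%% so list operations treat it as two separate items.
demo() ->
    Str = [117, 776, $b],          % "üb" in decomposed form
    {length(Str),                  % 3, although only 2 characters are visible
     lists:reverse(Str)}.          % [98, 776, 117]: the diaeresis now follows b
```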

How about this:
A string is a list of characters.

A character is one or more Unicode code points. A single code point can be
represented by an integer; multiple code points can be represented by a
tuple. A list would not work here, because lists:flatten/1 would then
destroy the grouping.
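A minimal sketch of this representation follows. The module and function names are hypothetical. It shows that each list element is one character, and that lists:flatten/1 leaves tuple-grouped code points intact but merges a nested-list grouping into separate code points.

```erlang
-module(ustr_demo).
-export([example/0, length_chars/1, flatten_demo/0]).

%% "über" with ü written as u (117) followed by U+0308 (776),
%% grouped into one character by a tuple.
example() ->
    [{117, 776}, $b, $e, $r].

%% Each list element is one character, so length/1 counts
%% characters rather than code points.
length_chars(Str) ->
    length(Str).

%% lists:flatten/1 only descends into lists: the tuple grouping
%% survives, the nested-list grouping does not.
flatten_demo() ->
    AsTuples = [{117, 776}, $b],
    AsLists  = [[117, 776], $b],
    {lists:flatten(AsTuples), lists:flatten(AsLists)}.
```

Here `ustr_demo:flatten_demo()` returns `{[{117,776},98], [117,776,98]}`. Only the first result still knows that 117 and 776 form one character.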

Utility functions would convert between UTF-8-encoded binaries and this structure.
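The following is one possible sketch of such utility functions. It assumes a modern OTP release (20 or later), where string:next_grapheme/1 can split code points into grapheme clusters. Using grapheme clusters to decide what counts as "one character" is a substituted technique, not something the proposal specifies. The module `ustr_utf8` and its functions are hypothetical.

```erlang
-module(ustr_utf8).
-export([from_utf8/1, to_utf8/1]).

%% Decode a UTF-8 binary into the proposed structure: a single code point
%% stays an integer, and a multi-code-point grapheme cluster becomes a tuple.
from_utf8(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin, utf8) of
        CodePoints when is_list(CodePoints) -> group(CodePoints);
        Other -> erlang:error({bad_utf8, Other})
    end.

%% Walk the code points one grapheme cluster at a time (assumes OTP 20+).
group(CodePoints) ->
    case string:next_grapheme(CodePoints) of
        [] -> [];
        [GC | Rest] when is_integer(GC) -> [GC | group(Rest)];
        [GC | Rest] when is_list(GC) -> [list_to_tuple(GC) | group(Rest)];
        {error, _} = Err -> erlang:error({invalid_unicode, Err})
    end.

%% Encode the structure back to a UTF-8 binary by expanding each
%% tuple into its code points.
to_utf8(Chars) when is_list(Chars) ->
    CodePoints = lists:flatmap(fun expand/1, Chars),
    unicode:characters_to_binary(CodePoints).

expand(C) when is_integer(C) -> [C];
expand(T) when is_tuple(T) -> tuple_to_list(T).
```

For example, the decomposed "üb" (`<<"u", 16#CC, 16#88, "b">>`) decodes to `[{117,776},98]`, while the precomposed ü (`<<16#C3, 16#BC>>`) decodes to `[252]`.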

Anthony Shipman                    Mamas don't let your babies 
                   grow up to be outsourced.
