[erlang-questions] The importance of Basic Unicode Understanding in Erlang

Richard O'Keefe ok@REDACTED
Thu Sep 29 00:52:16 CEST 2011


On 29/09/2011, at 10:14 AM, Richard Carlsson wrote:
> - The "good old length and comparison functions" are not broken, they just answer much simpler questions than what you're asking. length(S) tells you how many code points are in string S, no more, no less. Not glyphs, not graphemes, not abstract characters. Code points.

I should point out that the question "how many characters are there" is locale-dependent.

My mother's father, looking at the place name "Ljubljana", would have seen 7 letters.
I see 9.  (Written with the single-code-point digraph characters for "Lj" and "lj",
there are in fact 7 Unicode code points.  Who said one code point couldn't count as
more than one letter?)  Looking at my father's middle name, "Æneas", I see
5 letters.  (Unicode agrees with me.)  Other people see 6.
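
The code point counts above can be checked with a minimal sketch. It is written in Python only because len() on a Python string counts code points, much as length/1 does on an Erlang list of code points. It assumes the place name was typed with the digraph characters U+01C8 and U+01C9; the variable names are illustrative.

```python
import unicodedata

# Assumption: the name is written with the single-code-point digraphs
# U+01C8 (capital L with small j) and U+01C9 (small lj).
ljubljana_digraphs = "\u01C8ub\u01C9ana"

# NFKC normalization expands each digraph into two ordinary letters.
ljubljana_plain = unicodedata.normalize("NFKC", ljubljana_digraphs)

# U+00C6 (capital AE) is one code point with no decomposition.
aeneas = "\u00C6neas"

# len() counts code points: not letters, glyphs, or graphemes.
code_point_counts = {
    "digraphs": len(ljubljana_digraphs),  # 7
    "plain": len(ljubljana_plain),        # 9
    "aeneas": len(aeneas),                # 5
}
print(code_point_counts)
```

The same place name yields 7 or 9 depending only on how it was encoded, and neither number says how many letters a given reader will see.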

This means that there is no such thing as a "Unicode" function
	grapheme_length :: String → Integer
but only a function
	grapheme_length :: String × Locale → Integer
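
A rough sketch of what the second signature implies, in Python. The table SINGLE_LETTER_SEQUENCES and its locale keys are hypothetical and hand-written for illustration; real tailoring data would have to come from something like CLDR. The sketch also ignores combining marks, so it is not a proper grapheme segmenter.

```python
import unicodedata

# Hypothetical tailoring table: letter sequences that a locale counts
# as ONE letter. Croatian ("hr") treats dž, lj and nj as single letters.
SINGLE_LETTER_SEQUENCES = {
    "en": (),
    "hr": ("d\u017e", "lj", "nj"),
}

def grapheme_length(s, locale):
    """Count the letters of s as a reader in the given locale would.

    Illustrative only: besides the tailored sequences, every remaining
    code point is counted as one letter.
    """
    # Expand digraph code points and fold case so both spellings match.
    text = unicodedata.normalize("NFKC", s).casefold()
    # Try longer sequences first in case one is a prefix of another.
    units = sorted(SINGLE_LETTER_SEQUENCES[locale], key=len, reverse=True)
    count = 0
    i = 0
    while i < len(text):
        for unit in units:
            if text.startswith(unit, i):
                i += len(unit)
                break
        else:
            # No tailored sequence starts here: consume one code point.
            i += 1
        count += 1
    return count

print(grapheme_length("Ljubljana", "hr"))  # 7
print(grapheme_length("Ljubljana", "en"))  # 9
```

Dropping the locale argument would force the function to pick one of those two answers for everybody, which is the point being made here.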

This is only the beginning of the problems!

> Similar for comparisons.

And again, similar for comparisons.




