[erlang-questions] Erlang 3000?
Wed Nov 19 15:04:43 CET 2008
Richard Carlsson wrote:
> Bengt Kleberg wrote:
>> the facts of current German orthography are that the
>> uppercase of ß is "SS"
> Quite. The lesson should be that even "within the limitations of
> Latin-1", the idea that you can do case conversion on single
> code points is wrong. It is an operation that should be applied
> to strings, not individual characters.
And I don't agree. You are mixing semantics with syntax, in my mind
(syntax is probably not the right word here, but I'm no typographer so I
don't know the correct term, but I hope you understand what I mean).
There is no uppercase version of ß, so it can't be converted to uppercase.
The fact that you write SS instead of ß when you want it in uppercase
doesn't mean that it's the same letter, just that it has the same meaning.
Conversion of a string to uppercase can be regarded in two ways. Either
you replace each character with its uppercase version, and leave alone
any characters that don't have one.
Or you can try to convert the string as a whole to an uppercase version,
where some letters might need to be replaced by sequences of other
characters.
I personally am usually satisfied with the former, but I guess that's
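
To make the two readings concrete, here is a minimal sketch in Python. It is only an illustration: the function names are made up for this example, and it assumes Python 3, whose str.upper() applies the string-level rule and turns ß into SS.

```python
def upper_per_char(s: str) -> str:
    """First reading: uppercase each character on its own, and leave a
    character untouched when it has no single-character uppercase form."""
    out = []
    for ch in s:
        up = ch.upper()
        # 'ß'.upper() is the two-character string 'SS', so ß is kept as-is.
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


def upper_string(s: str) -> str:
    """Second reading: convert the string as a whole, letting a letter
    expand into a sequence of other letters where needed."""
    return s.upper()


word = "straße"
print(upper_per_char(word))  # STRAßE
print(upper_string(word))    # STRASSE
```

Note that the second result is one character longer than the input, which is exactly why it cannot be expressed as a character-to-character mapping.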
And I also believe that this is one of the more serious flaws of
Unicode. It mixes semantics with syntax. So you have, for instance,
several A-ring characters, for use in different types of contexts, but
that is all artificial and unfortunate.
It's like in the old days, when you had several different minus signs on
punched cards, for different uses. Hmm, looking at Unicode, I can see
that they have reintroduced this ambiguity. You have hyphen-minus
(U+002D), hyphen (U+2010) and minus (U+2212) and you also have a number
of different dashes.
Try to figure out which one you want when you are writing.
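
As a small illustration of how similar these look while remaining distinct code points, the following Python sketch prints the official Unicode name of each one. The helper name describe is hypothetical; unicodedata is part of the Python standard library.

```python
import unicodedata

# The three look-alike characters mentioned above.
CANDIDATES = ["\u002D", "\u2010", "\u2212"]


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in CANDIDATES:
    print(describe(ch))
# U+002D HYPHEN-MINUS
# U+2010 HYPHEN
# U+2212 MINUS SIGN
```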
(According to one myth this "problem" actually caused the Mariner 1 to
fail and self-destruct, since the poor Fortran programmer had used a
hyphen instead of a minus for a constant. Not sure if it's true or not,
and the web doesn't give a sure answer.)
(Oh, and the A-ring problem is that there is a unit called Ångström,
which uses the symbol Å. However, in Swedish, A-ring (Å) is a normal,
plain letter, and the guy Ångström was a Swede, and the unit was named
after him, with the first letter of his last name as the unit, but with
Unicode we now need to know if we're writing the letter Å, or the unit
Å, which is a different codepoint, even though it actually is the same
letter.)
There are more examples like this, where Unicode messes things up because
it mixes the visual impression of a character with the semantic meaning
of the character.
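
The A-ring case can be checked with a short Python sketch. This is only an illustration with made-up names: it compares the letter (U+00C5) with the Ångström sign (U+212B), first directly and then after NFC normalization, which Unicode defines to map the sign onto the plain letter.

```python
import unicodedata

LETTER = "\u00C5"  # LATIN CAPITAL LETTER A WITH RING ABOVE
UNIT = "\u212B"    # ANGSTROM SIGN


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(unicodedata.name(LETTER), "/", unicodedata.name(UNIT))
print(LETTER == UNIT)                # False: different code points
print(same_after_nfc(LETTER, UNIT))  # True: the sign normalizes to the letter
```

So a plain comparison treats them as different, and software has to normalize first if it wants them to compare equal.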
And when I learned German in school many years ago, I was taught that ß
was more or less the equivalent of sz. :-)