[erlang-questions] Erlang 3000?

Johnny Billquist bqt@REDACTED
Wed Nov 19 15:04:43 CET 2008


Richard Carlsson wrote:
> Bengt Kleberg wrote:
>> the facts of current German orthography are that the
>> uppercase of ß is "SS"
> 
> Quite. The lesson should be that even "within the limitations of
> Latin-1", the idea that you can do case conversion on single
> code points is wrong. It is an operation that should be applied
> to strings, not individual characters.

And I don't agree. You are mixing semantics with syntax, in my mind 
(syntax is probably not the right word here, but I'm no typographer so I 
don't know the correct term, but I hope you understand what I mean).
There is no uppercase version of ß, so it can't be converted to uppercase.
The fact that you write SS instead of ß when you want it in uppercase 
doesn't mean that it's the same letter, just that it has the same meaning.

Conversion of a string to uppercase can be regarded in two ways. Either 
you replace each character with its uppercase version, and characters 
that don't have an uppercase version you leave be.

Or you can try to convert the string as such to an uppercase version, 
where some letters might need to be replaced by sequences of other 
characters.

I personally am usually satisfied with the former, but I guess that's 
anyone's choice.
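
The two readings can be put side by side in a minimal sketch. It is written
in Python rather than Erlang, and the function names upper_per_char and
upper_whole_string are made up for illustration. It assumes Python's
built-in str.upper(), which maps ß to "SS".

```python
def upper_per_char(text):
    """Uppercase each character only when the result is one character."""
    out = []
    for ch in text:
        up = ch.upper()
        # Characters such as 'ß' have no single-character uppercase form
        # in Python's mapping, so they are left as they are.
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


def upper_whole_string(text):
    """Uppercase the string as a whole; some letters may expand."""
    return text.upper()


word = "straße"
print(upper_per_char(word))      # STRAßE
print(upper_whole_string(word))  # STRASSE
```

The first function always returns a string of the same length as its input;
the second one does not.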

And I also believe that this is one of the more serious flaws of 
Unicode. It mixes semantics with syntax. So you have, for instance 
several A-ring characters, for use in different types of contexts, but 
that is all artificial and unfortunate.
It's like in the old days, when you had several different minus signs on 
punched cards, for different uses. Hmm, looking at Unicode, I can see 
that they have reintroduced this ambiguity. You have hyphen-minus 
(U+002D), hyphen (U+2010) and minus (U+2212) and you also have a number 
of different dashes.
Try to figure out which one you want when you are writing.
(According to one myth this "problem" actually caused the Mariner 1 to 
fail and self-destruct, since the poor Fortran programmer had used a 
hyphen instead of a minus for a constant. Not sure if it's true or not, 
and the web doesn't give a sure answer.)
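
The three code points mentioned above can be inspected with a short sketch,
here in Python using the standard unicodedata module. The helper name
describe is made up for illustration.

```python
import unicodedata

# The three look-alike characters: hyphen-minus, hyphen and minus.
CANDIDATES = ["\u002d", "\u2010", "\u2212"]


def describe(ch):
    """Return the code point, Unicode name and general category of ch."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in CANDIDATES:
    print(*describe(ch))
```

The first two are classified as dash punctuation (Pd) and the last one as a
math symbol (Sm), although they look nearly identical in many fonts.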

(Oh, and the A-ring problem is that there is a unit called Ångström, 
which uses the symbol Å. However, in Swedish, A-ring (Å) is a normal, 
plain letter, and the guy Ångström was a Swede, and the unit was named 
after him, with the first letter of his last name as the unit, but with 
Unicode we now need to know if we're writing the letter Å, or the unit 
Å, which is a different codepoint, even though it actually is the same 
letter.
There are more examples like this, where Unicode messes things up because 
it mixes the visual impression of a character with the semantic meaning of 
the character.)
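
The two code points in question are U+00C5 (the letter) and U+212B (the
unit sign). A minimal sketch in Python, using the standard unicodedata
module, shows that they compare as different strings but become identical
after NFC normalization. The names LETTER, UNIT and same_after_nfc are made
up for illustration.

```python
import unicodedata

LETTER = "\u00c5"  # LATIN CAPITAL LETTER A WITH RING ABOVE
UNIT = "\u212b"    # ANGSTROM SIGN


def same_after_nfc(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(unicodedata.name(LETTER), "/", unicodedata.name(UNIT))
print(LETTER == UNIT)                # False: different code points
print(same_after_nfc(LETTER, UNIT))  # True: NFC maps the sign to the letter
```

So a plain comparison treats them as different, and it takes an explicit
normalization step to see them as the same letter.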

And when I learned German in school many years ago, I was taught that ß 
was more or less the equivalent of sz. :-)

	Johnny


