I can think of two use cases for case conversion.

The first is for typographic reasons.  I don't think a programmatic approach
is ever going to satisfy every possible case automatically.

The other, however, is normalization, probably for comparison
purposes, i.e., case-insensitive comparisons.  In that case, being correct
typographically isn't as important as being consistent in how you represent
a character.  Ignoring the Unicode normalization approaches (regarding
accents and combining characters, etc.), a simplistic up-casing algorithm
may be all that is needed.  Someone might object to up-casing the "p" in
"pH", for example, but the goal here is normalization, not typography.  With
that in mind, I would imagine that the German ß (the beta-looking character)
probably should be normalized to "SS", but bear in mind that this is coming
from someone who doesn't know German and probably would have pronounced
"straße" STRAYB, so I won't be the one to write that canonical code. :-)
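
To make the distinction concrete, here is a minimal sketch in Python (my
choice of language for illustration only; the function names are made up).
It contrasts up-casing one code point at a time, which leaves ß alone, with
up-casing the string as a whole, which is what I would use as a comparison
key.  It relies on Python's built-in str.upper(), which maps ß to "SS".

```python
def upcase_per_char(text):
    """Up-case one code point at a time (hypothetical helper).

    A character whose uppercase form is not a single code point,
    such as 'ß', is left unchanged.
    """
    out = []
    for ch in text:
        upper = ch.upper()
        out.append(upper if len(upper) == 1 else ch)
    return "".join(out)


def comparison_key(text):
    """Up-case the string as a whole, so 'ß' becomes 'SS' (hypothetical helper)."""
    return text.upper()


def equal_ignoring_case(a, b):
    """Compare two strings by their normalized keys, not their typography."""
    return comparison_key(a) == comparison_key(b)


print(upcase_per_char("straße"))                  # STRAßE
print(comparison_key("straße"))                   # STRASSE
print(equal_ignoring_case("pH", "PH"))            # True
print(equal_ignoring_case("Straße", "STRASSE"))   # True
```

Python also offers str.casefold(), which is intended for exactly this kind
of caseless matching and folds "straße" to "strasse"; either direction works
as long as both sides of the comparison use the same one.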

Please let me know if I am missing any use cases that would add to this
discussion.



> Richard Carlsson wrote:
> > Bengt Kleberg wrote:
> >> the facts of current German orthography are that the
> >> uppercase of ß is "SS"
> >
> > Quite. The lesson should be that even "within the limitations of
> > Latin-1", the idea that you can do case conversion on single
> > code points is wrong. It is an operation that should be applied
> > to strings, not individual characters.
> And I don't agree. You are mixing semantics with syntax, in my mind
> (syntax is probably not the right word here, but I'm no typographer so I
> don't know the correct term, but I hope you understand what I mean).
> There is no uppercase version of ß, so it can't be converted to uppercase.
> The fact that you write SS instead of ß when you want it in uppercase
> doesn't mean that it's the same letter, just that it has the same meaning.
> Conversion of a string to uppercase can be regarded in two ways. Either
> you replace each character with its uppercase version, and characters
> that don't have an uppercase version you leave be.
> Or you can try to convert the string as such to an uppercase version,
> where some letters might need to be replaced by sequences of other
> characters.
> I personally am usually satisfied with the former, but I guess that's
> anyone's choice.
> And I also believe that this is one of the more serious flaws of
> Unicode. It mixes semantics with syntax. So you have, for instance,
> several A-ring characters, for use in different types of contexts, but
> that is all artificial and unfortunate.
> It's like in the old days, when you had several different minus signs on
> punched cards, for different uses. Hmm, looking at Unicode, I can see
> that they have reintroduced this ambiguity. You have hyphen-minus
> (U+002D), hyphen (U+2010) and minus (U+2212) and you also have a number
> of different dashes.
> Try to figure out which one you want when you are writing.
> (According to one myth this "problem" actually caused the Mariner 1 to
> fail and self-destruct, since the poor Fortran programmer had used a
> hyphen instead of a minus for a constant. Not sure if it's true or not,
> and the web doesn't give a sure answer.)
> (Oh, and the A-ring problem is that there is a unit called Ångström,
> which uses the symbol Å. However, in Swedish, A-ring (Å) is a normal,
> plain letter, and the guy Ångström was a Swede, and the unit was named
> after him, with the first letter of his last name as the unit, but with
> Unicode we now need to know if we're writing the letter Å, or the unit
> Å, which is a different codepoint, even though it actually is the same
> letter.
> There are more examples like this, where Unicode messes things up because
> it mixes the visual impression of a character with the semantic meaning of
> the character.)
> And when I learned German in school many years ago, I was taught that ß
> was more or less the equivalent of sz. :-)
> 	Johnny
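
On the Å and dash points above, a small Python sketch may make the code
point differences concrete.  It uses only the standard unicodedata module;
the helper name same_after_nfc is made up for illustration.  As far as I can
tell, canonical normalization (NFC) folds the Ångström unit sign into the
ordinary letter Å, while the various hyphens and minus signs stay distinct.

```python
import unicodedata


def same_after_nfc(a, b):
    """True if the two strings are equal after NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


angstrom_sign = "\u212b"   # ANGSTROM SIGN (the unit)
a_ring = "\u00c5"          # LATIN CAPITAL LETTER A WITH RING ABOVE (the letter)

print(angstrom_sign == a_ring)                 # False: different code points
print(same_after_nfc(angstrom_sign, a_ring))   # True: NFC maps the unit to the letter
print(same_after_nfc("A\u030a", a_ring))       # True: A + combining ring composes to Å

# The hyphen and minus characters mentioned above remain three separate characters.
for ch in ("\u002d", "\u2010", "\u2212"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+002D HYPHEN-MINUS
# U+2010 HYPHEN
# U+2212 MINUS SIGN
print(same_after_nfc("\u002d", "\u2212"))      # False
```

So for comparison purposes the Å case can be handled by normalizing before
comparing, whereas the hyphen/minus distinction would need an explicit
mapping if you wanted to treat those characters as equal.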
