[erlang-questions] Scharfes S (WAS: Erlang 3000?)

David Mercer
Wed Nov 19 18:21:02 CET 2008


There are two use cases of case-changing that I can think of.

The first is for typographic reasons.  I don't think a programmatic approach
is ever going to satisfy every possible case automatically.

The other, however, is normalization, probably for comparison
purposes, i.e., case-insensitive comparisons.  In that case, being correct
typographically isn't as important as consistency in how you represent a
character.  Ignoring the Unicode normalization approaches (regarding accents
and combining characters, etc.), a simplistic up-casing algorithm may be all
that is needed.  Someone might object to up-casing the "p" in "pH", for
example, but this is not for typographical purposes, but simply for
normalization purposes.  With that in mind, I would imagine that the German
beta-looking character (ß) probably should be normalized to "SS", but bear in
mind that this is coming from someone who doesn't know German and probably
would have pronounced "straße" STRAYB, so I won't be the one to write that
canonical code. :-)
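
To make the normalization use case concrete, here is a minimal sketch in
Python (chosen only for illustration; it is not Erlang and not anyone's
canonical code).  The helper names normalization_key and equal_ignoring_case
are hypothetical.  It assumes Python 3, whose str.upper() applies Unicode
full case mappings, so "ß" expands to "SS":

```python
def normalization_key(text: str) -> str:
    """Return a comparison key by simple up-casing (not typographic casing)."""
    # str.upper() may change the length of the string: the single
    # character "ß" becomes the two-character string "SS".
    return text.upper()


def equal_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings by their up-cased normalization keys."""
    return normalization_key(a) == normalization_key(b)


if __name__ == "__main__":
    print(normalization_key("straße"))
    print(equal_ignoring_case("Straße", "STRASSE"))
    print(normalization_key("pH"))
```

Note that the key for "pH" is "PH", which is wrong typographically but fine
for comparison.  This sketch deliberately ignores accents and combining
characters.  Python also provides str.casefold() for caseless matching,
which folds "ß" to "ss" rather than up-casing it.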

Please let me know if there are use cases I am missing and which enhance
this discussion.

Cheers,

David

> -----Original Message-----
> From:  [mailto:erlang-questions-
> ] On Behalf Of Valentin Micic
> Sent: Wednesday, November 19, 2008 10:08
> To: 'Johnny Billquist'; 'Richard Carlsson'
> Cc: 'erlang-questions Questions'
> Subject: [erlang-questions] Scharfes S (WAS: Erlang 3000?)
> 
> I was under the impression that one may write a lowercase ss with the same
> meaning and that "ß" just represents a short way of writing "ss".
> As for pronunciation, I think that a single s is indeed pronounced "sz", thus
> Suzuki in German sounds quite contrary to what one would expect. I might be
> wrong, but scharfes s is used to eliminate the "z" sound in "s", hence Strasse,
> I mean -- Straße.
> 
> V.
> 
> -----Original Message-----
> From: 
> [mailto:] On Behalf Of Johnny Billquist
> Sent: 19 November 2008 04:05 PM
> To: Richard Carlsson
> Cc: 'erlang-questions Questions'
> Subject: Re: [erlang-questions] Erlang 3000?
> 
> Richard Carlsson wrote:
> > Bengt Kleberg wrote:
> >> the facts of current German orthography are that the
> >> uppercase of ß is "SS"
> >
> > Quite. The lesson should be that even "within the limitations of
> > Latin-1", the idea that you can do case conversion on single
> > code points is wrong. It is an operation that should be applied
> > to strings, not individual characters.
> 
> And I don't agree. You are mixing semantics with syntax, in my mind
> (syntax is probably not the right word here, but I'm no typographer so I
> don't know the correct term, but I hope you understand what I mean).
> There is no uppercase version of ß, so it can't be converted to uppercase.
> The fact that you write SS instead of ß when you want it in uppercase
> doesn't mean that it's the same letter, just that it has the same meaning.
> 
> Conversion of a string to uppercase can be regarded in two ways. Either
> you replace each character with its uppercase version, and characters
> that don't have an uppercase version you leave be.
> 
> Or you can try to convert the string as such to an uppercase version,
> where some letters might need to be replaced by sequences of other
> characters.
> 
> I personally am usually satisfied with the former, but I guess that's
> anyone's choice.
> 
> And I also believe that this is one of the more serious flaws of
> Unicode. It mixes semantics with syntax. So you have, for instance,
> several A-ring characters, for use in different types of contexts, but
> that is all artificial and unfortunate.
> It's like in the old days, when you had several different minus signs on
> punched cards, for different uses. Hmm, looking at Unicode, I can see
> that they have reintroduced this ambiguity. You have hyphen-minus
> (U+002D), hyphen (U+2010) and minus (U+2212) and you also have a number
> of different dashes.
> Try to figure out which one you want when you are writing.
> (According to one myth this "problem" actually caused the Mariner 1 to
> fail and self-destruct, since the poor Fortran programmer had used a
> hyphen instead of a minus for a constant. Not sure if it's true or not,
> and the web doesn't give a sure answer.)
> 
> (Oh, and the A-ring problem is that there is a unit called Ångström,
> which uses the symbol Å. However, in Swedish, A-ring (Å) is a normal,
> plain letter, and the guy Ångström was a Swede, and the unit was named
> after him, with the first letter of his last name as the unit, but with
> Unicode we now need to know if we're writing the letter Å, or the unit
> Å, which is a different codepoint, even though it actually is the same
> letter.
> There are more examples like this, where Unicode messes things up because
> it mixes the visual impression of a character with the semantic meaning of
> the character.)
> 
> And when I learned German in school many years ago, I was taught that ß
> was more or less the equivalent of sz. :-)
> 
> 	Johnny
> _______________________________________________
> erlang-questions mailing list
> 
> http://www.erlang.org/mailman/listinfo/erlang-questions