[erlang-questions] Erlang 3000?
Wed Nov 19 16:45:11 CET 2008
On Nov 19, 2008, at 6:04 AM, Johnny Billquist wrote:
> Richard Carlsson wrote:
>> Bengt Kleberg wrote:
>>> the facts of current German orthography are that the
>>> uppercase of ß is "SS"
>> Quite. The lesson should be that even "within the limitations of
>> Latin-1", the idea that you can do case conversion on single
>> code points is wrong. It is an operation that should be applied
>> to strings, not individual characters.
> And I don't agree. You are mixing semantics with syntax, in my mind
> (syntax is probably not the right word here, but I'm no typographer
> so I don't know the correct term, but I hope you understand what I mean).
> There is no uppercase version of ß, so it can't be converted to
> The fact that you write SS instead of ß, when you want it in uppercase
> doesn't mean that it's the same letter, just that it has the same
At least in the context of the German language, ß is nothing more than
a shorthand for ss. Similarly ö is shorthand for oe, and so on. If
your string functions are not aware of this, then they are wrong. Of
course, this shows how complicated and subtle the situation is: quite
possibly these equivalences do not hold in other languages. This means
that a function like "to_upper" that operates on Latin-1 without being
locale-aware is the wrong interface.
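
To make the interface point concrete, here is a minimal sketch in Python (chosen only for illustration; it is not Erlang and not any existing library's API). The name to_upper, its locale parameter, and the LOCALE_UPPER table are all hypothetical. The sketch assumes Python 3's built-in str.upper(), which applies the full Unicode case mapping and so may return more code points than it was given. The Turkish entry is only an illustration of a mapping that differs by language.

```python
# Hypothetical per-locale exceptions, consulted before the default mapping.
# The table and the to_upper signature are illustrative assumptions.
LOCALE_UPPER = {
    "tr": {"i": "\u0130"},  # Turkish: i uppercases to dotted capital I
}


def to_upper(text, locale="de"):
    """Uppercase a whole string; the result may be longer than the input."""
    exceptions = LOCALE_UPPER.get(locale, {})
    # Each input character maps to a string, not to a single character,
    # which is why the operation is defined on strings.
    return "".join(exceptions.get(ch, ch.upper()) for ch in text)
```

With these assumptions, to_upper("straße") gives "STRASSE" (six code points in, seven out), and to_upper("i", locale="tr") gives U+0130 instead of "I". A function typed as character-to-character cannot express the first result, and a function without a locale argument cannot express the second.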
> (Oh, and the A-ring problem is that there is a unit called Ångström,
> which uses the symbol Å. However, in Swedish, A-ring (Å) is a normal,
> plain letter, and the guy Ångström was a Swede, and the unit was named
> after him, with the first letter of his last name as the unit, but
> Unicode we now need to know if we're writing the letter Å, or the unit
> Å, which is a different codepoint, even though it actually is the same
Well this is really just another example. If you lowercase the
Swedish letter Å you should get å. However, it would be wrong to do
the same for the symbol for the unit.
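
As a rough sketch of what that distinction looks like in code (Python again, purely for illustration): the two characters are separate code points, U+00C5 for the letter and U+212B for the unit sign. The helper lower_letters_only and its keep parameter are hypothetical names invented here. Note that it deliberately does not use the default lowercase mapping for the sign, since that mapping would turn it into å.

```python
import unicodedata

LETTER_A_RING = "\u00c5"  # LATIN CAPITAL LETTER A WITH RING ABOVE
ANGSTROM_SIGN = "\u212b"  # ANGSTROM SIGN (the unit symbol)


def lower_letters_only(text, keep=frozenset({ANGSTROM_SIGN})):
    """Lowercase text but leave the code points in `keep` untouched.

    Hypothetical helper: `keep` lists symbols that should not be case-folded.
    """
    return "".join(ch if ch in keep else ch.lower() for ch in text)


def is_same_after_nfc(a, b):
    """Report whether two strings become identical under NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Under these assumptions, lower_letters_only leaves "10 \u212b" unchanged while turning "\u00c5" into "\u00e5". The second helper shows the catch: is_same_after_nfc(ANGSTROM_SIGN, LETTER_A_RING) is true, because NFC replaces the sign with the letter, so the distinction only survives in text that has not been normalized.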
> There are more examples like this, where Unicode mess things up
> it mix the visual impression of a character with semantic meaning of
No, it doesn't do this at all. Unicode defines characters. It does
not define glyphs. It is understood that the mapping of characters to
glyphs is many-to-many.
> And when I learned German in school many years ago, I was taught
> that ß was more or less the equivalent of sz. :-)
The _pronunciation_ is somewhat like that. But the character is
identical to ss.