[erlang-questions] Erlang 3000?

Kevin Scaldeferri kevin@REDACTED
Wed Nov 19 16:45:11 CET 2008


On Nov 19, 2008, at 6:04 AM, Johnny Billquist wrote:

> Richard Carlsson wrote:
>> Bengt Kleberg wrote:
>>> the facts of current German orthography are that the
>>> uppercase of ß is "SS"
>>
>> Quite. The lesson should be that even "within the limitations of
>> Latin-1", the idea that you can do case conversion on single
>> code points is wrong. It is an operation that should be applied
>> to strings, not individual characters.
>
> And I don't agree. You are mixing semantics with syntax, in my mind
> (syntax is probably not the right word here, but I'm no typographer so I
> don't know the correct term, but I hope you understand what I mean).
> There is no uppercase version of ß, so it can't be converted to uppercase.
> The fact that you write SS instead of ß, when you want it in uppercase
> don't mean that it's the same letter, just that it has the same meaning.

At least in the context of the German language, ß is nothing more than
a shorthand for ss.  Similarly, ö is shorthand for oe, and so on.  If
your string functions are not aware of this, then they are wrong.  Of
course, this shows that the situation is quite complicated and
subtle.  Quite possibly these equivalences do not hold in other
languages.  This means that a function like "to_upper" that operates on
Latin-1 and isn't locale-aware has the wrong interface.
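
To make the ß case concrete, here is a minimal sketch in Python 3 (chosen
only for illustration; nothing in this thread uses it).  It relies on the
built-in str.upper, which maps ß to "SS" without consulting any locale.
The helper naive_to_upper is hypothetical and stands in for any converter
that must return exactly one code point per input character.

```python
# Case conversion applied to a whole string: the result can be longer
# than the input, which a code-point-to-code-point table cannot express.
word = "straße"
upper = word.upper()
print(upper)                   # STRASSE
print(len(word), len(upper))   # 6 7


def naive_to_upper(text):
    """Hypothetical per-character converter (one code point in, one out).

    A character whose uppercase form is not exactly one code point is
    left unchanged, because there is nothing else this interface can do.
    """
    return "".join(c.upper() if len(c.upper()) == 1 else c for c in text)


print(naive_to_upper(word))    # STRAßE
```

Note that this shows only the string-versus-character half of the
argument.  str.upper is itself not locale-aware, so it does not address
equivalences that differ between languages.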


> (Oh, and the A-ring problem is that there is a unit called Ångström,
> which uses the symbol Å. However, in Swedish, A-ring (Å) is a normal,
> plain letter, and the guy Ångström was a Swede, and the unit was named
> after him, with the first letter of his last name as the unit, but with
> Unicode we now need to know if we're writing the letter Å, or the unit
> Å, which is a different codepoint, even though it actually is the same
> letter...)

Well, this is really just another example.  If you lowercase the
Swedish letter Å you should get å.  However, it would be wrong to do
the same for the symbol for the unit.
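
As a rough illustration of the two code points involved, the Python 3
sketch below compares the letter (U+00C5) with the unit symbol (U+212B)
using the standard unicodedata module.  The built-in lower() is
context-free and maps both to å, so keeping the unit symbol intact has to
be done by the caller; lower_keeping_units is a hypothetical helper
written for this example, not an existing library function.

```python
import unicodedata

LETTER_A_RING = "\u00C5"   # the Swedish letter
ANGSTROM_SIGN = "\u212B"   # the unit symbol

# Same appearance, different characters.
print(unicodedata.name(LETTER_A_RING))   # LATIN CAPITAL LETTER A WITH RING ABOVE
print(unicodedata.name(ANGSTROM_SIGN))   # ANGSTROM SIGN
print(LETTER_A_RING == ANGSTROM_SIGN)    # False

# The generic lowercase mapping does not distinguish them.
print(LETTER_A_RING.lower() == "\u00E5")  # True
print(ANGSTROM_SIGN.lower() == "\u00E5")  # True


def lower_keeping_units(text):
    """Hypothetical helper: lowercase text but leave the unit symbol alone."""
    return "".join(c if c == ANGSTROM_SIGN else c.lower() for c in text)


print(lower_keeping_units("5 \u212B AND \u00C5"))  # 5 Å and å
```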

>
> There are more examples like this, where Unicode mess things up because
> it mix the visual impression of a character with semantic meaning of the
> character.)

No, it doesn't do this at all.  Unicode defines characters.  It does  
not define glyphs.  It is understood that the mapping of characters to  
glyphs is many-to-many.


>
>
> And when I learned German in school many years ago, I was taught that ß
> was more or less the equivalent of sz. :-)

The _pronunciation_ is somewhat like that.  But the character is  
identical to ss.


-kevin




