[erlang-questions] Strings as Lists

Hasan Veldstra hasan.veldstra@REDACTED
Sat Feb 16 00:02:31 CET 2008


David,

> My question was in regards to reversing strings, not lists of
> characters. Specifically, Hasan Veldstra's complaint that representing
> strings as lists doesn't work when you use lists:reverse to reverse them:

That wasn't what I said. I gave an example of a string reversal that
fails when Unicode codepoints are treated as characters ("characters"
from the user's point of view, not as Unicode defines the term).
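
As a rough illustration of that failure mode, here is a minimal sketch in
Python (used purely for illustration, since its standard library exposes
the Unicode character database). The helper names reverse_naive and
reverse_clusters are hypothetical, and grouping a base character with the
combining marks that follow it is only an approximation of the full
grapheme cluster rules in Unicode Standard Annex #29.

```python
import unicodedata


def reverse_naive(s):
    # Reverses codepoint by codepoint, like lists:reverse on a list of codepoints.
    return s[::-1]


def reverse_clusters(s):
    # Keep each base character together with the combining marks that follow it.
    # Simplification: real grapheme segmentation (UAX #29) has more rules.
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


# "Zürich" with the u-umlaut stored as "u" + U+0308 COMBINING DIAERESIS.
decomposed = "Zu\u0308rich"

# Naive reversal moves the diaeresis onto the "r".
print(reverse_naive(decomposed))
# Cluster-aware reversal keeps it on the "u".
print(reverse_clusters(decomposed))
```

In the naive result the combining mark ends up after the "r", so the
reversed text renders with the diaeresis on the wrong letter.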

>> This would not work on a string with combining characters, e.g. ü
>> represented as u followed by ¨, or a CJKV ideograph.
>>
>> A lot of glyphs *cannot* be represented by a single Unicode  
>> codepoint.
>
> Your example is a case on "unreversing" a reversal done during the
> up-casing process.

Sorry, I'm not following you here. I didn't even mention upcasing in  
my last message.

> My guess is that in the "ü represented as u followed by ¨" case, it
> would work just right: the "u" would be up-cased to "U", and the "¨"
> would follow capital "U" (following the list:reverse to unreverse the
> list).

Yes, maybe this would work, thanks to Erlang's awareness of Western
European scripts. But how would you convert this string to uppercase in
Erlang: "Καλημέρα κόσμε"? With the libraries that are available now,
it's impossible.

How about doing case-insensitive comparisons of strings containing  
Russian text? Or even doing a case-insensitive comparison of "straße"  
and "STRASSE"? Again, no library support.
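
A minimal sketch of such a comparison, assuming a runtime that implements
Unicode case folding (Python's str.casefold() is used here only as an
illustration; the function name equal_ignoring_case is hypothetical). Case
folding is the relevant operation because plain lowercasing leaves "ß"
untouched. A complete solution would normally also normalize both strings.

```python
def equal_ignoring_case(a, b):
    # Case folding maps "ß" to "ss", which plain lowercasing does not.
    # Assumption: both inputs are already in the same normalization form.
    return a.casefold() == b.casefold()


print(equal_ignoring_case("straße", "STRASSE"))   # True
print(equal_ignoring_case("Привет", "ПРИВЕТ"))    # True
print("straße".lower() == "STRASSE".lower())      # False
```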

Or how about comparing two strings that look identical when printed,
but one of them contains the pre-composed "ü" character, while the
other contains "u" followed by "¨"? Again, you can't do this or
similar comparisons reliably using plain lists. Unless you implement
Unicode from scratch yourself, of course.
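
A minimal sketch of the comparison in question, assuming the second string
uses U+0308 COMBINING DIAERESIS after the "u" (the function name
canonically_equal is hypothetical, and Python's unicodedata module stands
in for whatever Unicode library is available). Both strings are brought to
the same normalization form (NFC here) before they are compared.

```python
import unicodedata

precomposed = "\u00fc"    # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"    # "u" followed by U+0308 COMBINING DIAERESIS


def canonically_equal(a, b):
    # Normalize both sides to NFC so canonically equivalent sequences match.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == decomposed)                    # False
print(canonically_equal(precomposed, decomposed))   # True
```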

> I don't think up-casing a CJKV ideograph makes any sense

I know little about East Asian scripts, and I don't know if they have  
the uppercase/lowercase distinction, but I never said you'd want to  
upcase a CJKV ideograph.

> So the question goes back to Mr. Veldstra (or anyone) as to why you
> would want to reverse a Unicode string

I don't know. String reversal was a convenient example for the point  
I was trying to make.




--
http://12monkeys.co.uk
http://hypernumbers.com

