Strings (was: Re: are Mnesia tables immutable?)
Tue Jun 27 07:21:54 CEST 2006
On 6/26/06, Richard A. O'Keefe <> wrote:
> "Ryan Rawson" <> wrote:
> There is a general perception that Erlang is no good at strings.
> That perception is definitely mistaken.
It's called "education", I believe. The best part about this list is
that it is archived online, and a Google search for 'erlang
<some problem>' very often returns results from this list archive.
Meaning your replies create future knowledge for young'uns. :-)
> Part of the issue is the whole 'lists of integers' and people
> freak on the memory requirements.
> It's just like the way that the credulous swallow the Da Vinci Code.
> They just don't check for themselves. (Brown has Langdon go into
> raptures about ((1+sqrt(5))/2) and gets many of his facts so far wrong
> that you'd think he was a whole government department. But he does
> actually _tell_ his students to measure their own
> (head-floor)/(navel-floor), and any readers who _did_ that almost
> surely found their own ratios weren't even close to phi. And for bees
> he is out by thousands.)
> The other part I think is the Unicode support.
> Care to speak to that?
> Ah, Unicode.
> I've struggled to provide Unicode support in one language and
> written a draft library proposal for another.
> It's quite frighteningly hard. (Interesting point: the C99 standard
> does _not_ provide enough information about the current locale to
> implement POSIX regular expressions, and POSIX itself doesn't provide
> access to the information you need either. Guess how come I found that
> out?) I'm not even sure what "support" for Unicode language tags (the
> absence of which was one of the core design features originally, so it
> defies belief that they added them) would begin to look like.
> Simply telling when two sequences of Unicode codepoints represent the
> "same" string is the very reverse of trivial. Not that there is a
> single definition. Unicode has four "normal forms", so there are
> five different definitions of "same", counting the trivial one, and
> _not_ counting differences between versions of Unicode (I see Unicode 5
> is on the horizon already). And there isn't even a normal form with
> the property that is_nf(Xs) and is_nf(Ys) => is_nf(Xs ++ Ys). Yep,
> concatenating two normalised strings (any of the four definitions) can
> give you a result that is _not_ normalised.
> Mind you, Unicode support in C and C++ is extremely weak too, unless
> you use the Taligent/IBM International Components for Unicode (icu4c,
> icu4j, see icu.sourceforge.net).
Then perhaps icu4c should be used as the basis for providing Erlang
Unicode support? I'm not really a Unicode expert, so I don't know
what is involved, but if a library provides good support, then it can
be the core of an Erlang library, no?
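
For anyone who wants to check the quoted claims about normal forms for themselves, here is a minimal sketch in Python (not Erlang, simply because its standard library ships `unicodedata.normalize`). The helpers `is_nf` and `same` and the sample strings are my own illustrative choices, not anything from the thread. The example relies on two assumptions about the Unicode data: "q" has no precomposed accented forms, and U+0323 (dot below) has a lower combining class than U+0301 (acute), so it must sort first.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def is_nf(form, s):
    """True if s is already in the given normal form (hypothetical helper)."""
    return unicodedata.normalize(form, s) == s

def same(form, a, b):
    """True if a and b are the 'same' string under the given normal form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Several definitions of "same": raw codepoint equality plus one per form.
precomposed = "\u00e9"   # e with acute as a single codepoint
decomposed = "e\u0301"   # e followed by combining acute
ligature = "\ufb01"      # the "fi" ligature
plain = "fi"

print("raw :", precomposed == decomposed, ligature == plain)
for form in FORMS:
    print(form, ":", same(form, precomposed, decomposed), same(form, ligature, plain))

# Concatenation does not preserve normalisation.
xs = "q\u0301"   # q + combining acute (combining class 230)
ys = "\u0323"    # lone combining dot below (combining class 220)
for form in FORMS:
    print(form, is_nf(form, xs), is_nf(form, ys), is_nf(form, xs + ys))
```

If the assumptions hold, the last loop should report that `xs` and `ys` are each normalised in all four forms while `xs ++ ys` (here `xs + ys`) is not, because the two combining marks end up in the wrong canonical order. The first loop shows that the ligature only compares equal to "fi" under the two compatibility forms.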
More information about the erlang-questions mailing list