[erlang-questions] Erlang basic doubts about String, message passing and context switching overhead

Garrett Smith g@REDACTED
Wed Jan 11 22:52:37 CET 2017


On Wed, Jan 11, 2017 at 3:28 PM,  <ok@REDACTED> wrote:
>> Not to start a holy war, but Unicode is not complex, it just has a lot of
>> tables.
>
> Wrong.  Unicode really is hugely complex.
>
> John Wheeler is quoted as saying
>  "If you are not completely confused by quantum mechanics,
>   you do not understand it."
> Now apply this to Unicode:
>   If you think you understand Unicode, you don't.
>
> There are language tags.
> (This is something that was explicitly ruled out of the Unicode
> design, then came back in after all.)
>
> There are variation selectors.
>
> Unicode has support not just for text of varying
> directionality, but for *mixed* directionality,
> with the result that interpreting a Unicode string
> requires maintaining a direction stack, with characters
> like POP DIRECTIONAL FORMATTING.
>
> I am not saying that you have to do this yourself;
> what I am saying is that chopping up a Unicode string
> is in general not a meaningful operation.
>
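
A minimal sketch of why cutting such a string is not meaningful, assuming
only the four embedding/override controls and POP DIRECTIONAL FORMATTING.
It ignores the isolate controls and the rest of the bidirectional algorithm,
and the helper name unbalanced_depth is made up for this illustration.

```python
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING
# LRE, RLE, LRO, RLO: each one pushes an entry on the direction stack
PUSHES = {"\u202A", "\u202B", "\u202D", "\u202E"}

def unbalanced_depth(text):
    """Return how many embeddings/overrides are still open at the end of text."""
    depth = 0
    for ch in text:
        if ch in PUSHES:
            depth += 1
        elif ch == PDF and depth > 0:
            depth -= 1
    return depth

# "abc ", then a right-to-left embedding holding four Hebrew letters, then " def"
whole = "abc \u202B\u05E9\u05DC\u05D5\u05DD\u202C def"
prefix = whole[:7]  # cuts inside the embedding

print(unbalanced_depth(whole))   # balanced
print(unbalanced_depth(prefix))  # one embedding left open
```

The prefix carries an opening control with no matching pop, so whatever is
appended after it inherits a direction it was never meant to have.
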
> Another example: there's a set of characters (the ideographic
> description characters) that act as operators, like this:
>   take next two trees and paste them horizontally
>   take next three trees and paste them horizontally
>   take next two trees and paste them vertically
>   take next three trees and paste them vertically
> used for approximating Chinese characters not yet
> supported (and yes these things do nest).
>
> This means that in order to move forward one "character"
> in a string, it is necessary to parse a tree (no, a
> regular expression cannot do this; these things are
> *nested* and regular expressions can't do matching
> brackets).
>
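
To make the "parse a tree" step concrete, here is a rough recursive sketch.
It assumes the operators are the ideographic description characters
U+2FF0..U+2FFB, with U+2FF2 and U+2FF3 taking three operands and the rest
taking two. Later Unicode versions may define more, and the function name
ids_end is made up for this illustration.

```python
TERNARY = {0x2FF2, 0x2FF3}                       # three operands
BINARY = set(range(0x2FF0, 0x2FFC)) - TERNARY    # two operands

def ids_end(text, i=0):
    """Return the index just past the description sequence starting at text[i]."""
    if i >= len(text):
        raise ValueError("truncated description sequence")
    cp = ord(text[i])
    if cp in BINARY:
        arity = 2
    elif cp in TERNARY:
        arity = 3
    else:
        return i + 1  # an ordinary character is a leaf of the tree
    i += 1
    for _ in range(arity):
        i = ids_end(text, i)  # each operand may itself be a nested sequence
    return i

# horizontal(vertical(U+65E5, U+6708), U+6728) followed by "X"
s = "\u2FF0\u2FF1\u65E5\u6708\u6728X"
print(ids_end(s))  # index of the "X"
```

Stepping forward one "character" here means consuming all five code points.
The recursion is what a plain regular expression has no way to express.
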
> Did I mention the rules for emoji?  Look at the rules
> for emoji and weep.
>
> I repeat: moving forwards or backwards ONE
> user-oriented character in a Unicode string is HARD.
>
>> And pretty much any modern language has immensely better Unicode
>> support built in than Erlang.
>
> Pretty much any modern language is moving in this area;
> Unicode support in modern Fortran and modern COBOL and
> modern C is not that great.  (ICU4C is not part of any
> C standard.)
>
> One of the things that make Unicode difficult is that things
> keep *changing*.  They try very hard to keep things stable,
> but characters do still from time to time change class
> (what was once an upper case letter may become a sign, for
> one example).
>
> I had code for moving backwards and forwards that took into
> account the difference between base characters and floating
> diacriticals.  It *didn't* take variation selectors into account.
> (Because when I wrote the code there weren't any.)
> Nor did I handle language tags.
> (Because at the time language tags had been ruled out forever.)
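
For illustration, a sketch of the kind of base-plus-diacritical stepping
described above, and of how it misses variation selectors. This is not the
quoted author's code. It assumes "floating diacritical" means a nonzero
canonical combining class, and next_boundary is a made-up name.

```python
import unicodedata

def next_boundary(text, i):
    """Index after the base character at i plus any combining marks that follow it."""
    i += 1
    while i < len(text) and unicodedata.combining(text[i]) != 0:
        i += 1
    return i

accented = "e\u0301x"      # e + COMBINING ACUTE ACCENT, then x
with_vs = "\u2764\uFE0Fx"  # HEAVY BLACK HEART + VARIATION SELECTOR-16, then x

print(next_boundary(accented, 0))  # the accent stays with the e
print(next_boundary(with_vs, 0))   # the selector is split from its base
```

VARIATION SELECTOR-16 has combining class 0, so a test based only on
combining class stops between the heart and its selector.
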

I am completely confused by this AND ALSO I do not understand it.


