Binary, List and Tuple Inequalities (Paradox?)

Richard O'Keefe raoknz@REDACTED
Fri Nov 1 12:08:24 CET 2019


I note that not only do dictionaries and phone books use different
collation orders,
so (fairly obviously) do different languages, and the rules can be
insanely complex.
English and French, for example, both require up to four passes over a
pair of strings
to resolve the order, and French runs one of those passes backwards.
(According to
the draft standard I read about 10 years ago.)  Things get really
complicated in a
bilingual country like Canada or New Zealand, where you have to
process text in a
mix of two languages.  (I'm not talking about scholarly articles, but
about everyday
newspaper articles.) It gets worse, but enough already.

The bottom line is that there is NO "right" way to compare binaries
and there is NO
"wrong" way either, as long as the axioms of a total order are
respected.  The way
Erlang does it is one good way.  "Compute the base-255 digits of pi to as many
places as necessary, exclusive-or each binary with the appropriate digits, and
compare the results using the existing algorithm" is another perfectly
legal way.

There are basically four requirements for Erlang-style term ordering:
(1) the axioms of a total order must be satisfied
(2) the implementation must be fast enough that programmers are happy to use
term comparison without wondering whether they should program something
(3) the rules must be spelled out so that programmers know what they are getting
(4) if the language uses the same operators for term comparison and arithmetic
comparison, the two must be compatible; if (like Prolog) term comparison (@<)
and arithmetic comparison (<) can disagree.  Just think of the fun you can have
trying to figure out what to do with NaNs.
There is another desideratum, which isn't quite as important:
(+) the "native" representation for text should be ordered in a way that is
unsurprising for programmers but MUST NOT be identical to locale collation
order for two reasons:
  (A) term comparison must distinguish things that locale collation
might identify
  (B) term comparison is for computers and is not meant to be locale-sensitive,
       while collation for users should be locale-sensitive.

It might be worth comparing three different languages I use:
Erlang: term comparison is a total order
Smalltalk:  x = y must return true or false consistently for ANY x and y
 but x < y need not work if x and y are of different kinds and for a
 given x (like nil) need not ever work for any y (even y = x).
Haskell: comparison is supposed to be total (thanks,IEEE) for any type
 that supports it; whether a type supports comparison is detected at
 compile-time; how the comparison is done can be different for every
 type.




On Fri, 1 Nov 2019 at 21:40, Raimo Niskanen <raimo.niskanen@REDACTED> wrote:
>
> On Thu, 2019-10-31 at 20:20 +0400, Antonio Nikishaev wrote:
> > > On 31 Oct 2019, at 14:52, Raimo Niskanen <
> > > raimo.niskanen@REDACTED> wrote:
> > >
> > >
> > > As for now, Elixir uses binaries as their default representation
> > > for
> > > strings, so having them sort as strings is vital for Elixir.
> > > (Unicode has shot a big hole in that argument, but still true for
> > > latin-1)
> >
> > It has not.  UTF-8 does preserve order.
>
> Ah, yes!  That clever property of UTF-8 slipped my mind.
>
> So in both Erlang and Elixir; codepoint lists sorts the same way as
> their corresponding UTF-8 binaries (which is the default format for
> binaris as strings in both Erlang and Elixir).
>
> Not true for other Unicode binary encodings, though...
>
> / Raimo
>
>
>
> >
> >
> >
> >
> > -- lelf
> >



More information about the erlang-questions mailing list