Binary, List and Tuple Inequalities (Paradox?)

Raimo Niskanen raimo.niskanen@REDACTED
Thu Oct 31 11:52:47 CET 2019


On Wed, 2019-10-30 at 15:34 +0200, Valentin Micic wrote:
> > On 28 Oct 2019, at 10:02, Raimo Niskanen <
> > raimo.niskanen@REDACTED> wrote:
> > 
> > I'd like to defend the current term order of lists, tuples and
> > binaries.
> > 
> > Lists are compared a'la string comparison, which avoids having to
> > traverse the whole list when comparing.  Binaries are compared as
> > lists
> > since both are used for handling strings, and also to be compared
> > like
> > memcmp does it.
> > 
> > Tuples are used for general complex terms.  They can be compared
> > size
> > first whithout having to be fully traversed, as opposed to lists,
> > which
> > is useful in other contexts.  You can thereby choose your data
> > structure to select comparision.
> > -- 
> > Raimo Niskanen
> > 
> 
> Thanks Raimo,
> 
> Your argument seems to have one…well, misconception. I don’t think it
> would be correct to imply that Binaries are used exclusively for
> handling strings, but rather, one of their uses may be that.

I did not mean to imply that binaries are used exclusevily for handling
strings.  I merely said that both lists and binaries are used for
handling strings, i.e that strings is one use case.

Although I confess that I think it is an important use case, if not
even a very important use case.

When binaries were introduced it was already recognized that having
strings as lists is a clumsy representation, especially with 64-bit
then on the way with almost double the space waste.  So one could
foresee that binaries would be used for storing strings.

And to not have strings as binaries sort like strings as lists would be
a really bad idea for that use case.

As for now, Elixir uses binaries as their default representation for
strings, so having them sort as strings is vital for Elixir.
(Unicode has shot a big hole in that argument, but still true for
latin-1)

> 
> For example, here's a syntax that is not anything “string-like”:
> 
> 
> (tsdb_1_1@REDACTED)3593> <<1:16>> < <<2:16>>. 
> true
> (tsdb_1_1@REDACTED)3594> <<50:16>> < <<36:16>>.
> false
> 
:
:
> As I said in my previous email, I do not expect Erlang to change, and
> for my "own intents and purposes” I am still considering if:
> 
> (tsdb_1_1@REDACTED)3626> <<1,3>> < <<1,2,3>>.
> false
> 
> should be given more credence than, say TRUE... if nothing else,
> because two octets are less than three octets.
> 
> In other words, if a three-element-tuple, regardless of its content,
>  could never be less than any given two-elements-tuple, why shouldn't
> the same hold for Binaries?

If you want to compare terms as integers, then represent them as
integers!  Erlang bignums can be big enough for most purposes.

Erlang does not have operator overloading or subclassing, and has a
global term order.  So exactly one sort order has to be chosen for a
type, and then a guess has to be made of which one is "best" or "most
useful".

The one chosen for binaries was string sort order, because that is how
C compares binaries, and strings seemed to be a very important use case
for binaries, which Elixir kind of has proven after the fact.

The tuple sort order could have been chosen for binaries, but as I said
above that would be bad for the string use case.

Also, regarding "useful"; tuple sort order for binaries can easily be
implemented by first comparing sizes then comparing content, whilst
should you have tulpe sort order and want to implement list sort order
you would have to loop byte by byte.  Therefore the list sort order can
be regarded as more useful since it can be a building block for
efficient tuple sort order...

Best regards
/ Raimo Niskanen

> 
> Kind regards
> 
> V/
> 
> 


More information about the erlang-questions mailing list