Binary, List and Tuple Inequalities (Paradox?)

Thu Oct 31 00:13:17 CET 2019

Hi Valentin,

I think you should acknowledge that the ordering of terms as defined by the
Erlang language does not aim to fit any semantic interpretation of the data
you may have. The sole purpose of the ordering is to be well defined, so
you can create e.g. a binary search tree that holds whatever data. For that
purpose any crazy ordering like 1 < false < 0 < [1, 2, foobar] < <0.12.34>
< 42 < [] < 30 < baz ... would work fine, as long as the place of every
term in this sequence is well defined. There's only one constraint (apart
from not violating the mathematical properties of a strict ordering
relation), and that is to make comparison of terms reasonably efficient (my
above example would fail this check).

So you feel <<1:16>> should be less than <<2:24>>? That's because you
interpret <<0,1>> and <<0,0,2>> as variable length encoding of integers. If
you have this use case, than write a custom comparison function and sort
your binaries with that. I may have a use case where the same two binaries
mean operation #1 (with no arguments) and operation #0 (with a single
argument 2), so I would have to write a different comparison function (but
as lucky as I am, I may just use erlang:'<'/2 this time, phew!).

But if you would like to know why I find the built-in ordering of terms
logical, it is because both lists and binaries are usually used to
represent variable sized data, while tuples are for fixed size data. I mean
that {1,2} and {1,2,3} typically describe semantically different things,
e.g. a 2D coordinate and a semver version number. Their relative ordering
is as irrelevant for most applications as the ordering of {1,2} and
<0.0.1>. Ordering by tuple size first makes some sense because it will keep
types together: just like all integers are less than atoms, all 2D
coordinates are less than version numbers.

Binaries may represent fixed size data (e.g. a packet header), but than it
doesn't matter whether you order lexicographically or size first, because
all binaries in your domain will be the same size. Most often however
binaries represent variable sized data, such as payload in a packet. And
for variable sized data the human convention is lexicographical ordering. A
printed dictionary may sort words by length first, and it would be a
(mostly) usable dictionary, but that's just not how things are by
convention.

Cheers,
Dániel

P.S.: By the way, this is why I don't like passing fixed length lists as
arguments for init/1 in gen_* callbacks (such as init([Some, Args, Here])
-> ...). These should be tuples instead, they are not variable length!

On Wed, 30 Oct 2019 at 14:35, Valentin Micic <v@REDACTED> wrote:

>
> On 28 Oct 2019, at 10:02, Raimo Niskanen <raimo.niskanen@REDACTED>
> wrote:
>
> I'd like to defend the current term order of lists, tuples and
> binaries.
>
> Lists are compared a'la string comparison, which avoids having to
> traverse the whole list when comparing.  Binaries are compared as lists
> since both are used for handling strings, and also to be compared like
> memcmp does it.
>
> Tuples are used for general complex terms.  They can be compared size
> first whithout having to be fully traversed, as opposed to lists, which
> is useful in other contexts.  You can thereby choose your data
> structure to select comparision.
> --
> Raimo Niskanen
>
>
> Thanks Raimo,
>
> Your argument seems to have one…well, misconception. I don’t think it
> would be correct to imply that Binaries are used exclusively for handling
> strings, but rather, one of their uses may be that.
>
> For example, here's a syntax that is not anything “string-like”:
>
>
> (tsdb_1_1@REDACTED)3593> <<1:16>> < <<2:16>>.
> true
> (tsdb_1_1@REDACTED)3594> <<50:16>> < <<36:16>>.
> false
>
>
> As you could see, this pretty much behaves the way we would expect when we
> compare two integer values, e.g.
>
> 1 is less than 2, and so is <<1:16>> < <<2:16>> Technically, we are not
> talking about strings here, but, rather two 16-bit integers.
>
> I can also say that:
>
> (tsdb_1_1@REDACTED)3597> 1 < 000002.
> true
>
> When we write:
>
> (tsdb_1_1@REDACTED)3593> <<1:16>>.
>
> It would be wrong to pretend that we’re actually talking about a strings
> (e.g. in alphanumeric sense). This clearly means that the integer value of
> 1 stored using Big-Endian encoding (e.g. network byte order).
>
> Thus, when we write:  <<2:16>> we get <<0,2>>. When we write: <<2:24>> we
> get <<0,0,2>>... these values are *not* intended to be strings, but
> integers.
> So, when we add leading zeroes, we do not change the integer value.
>
> So why is then:
>
> (tsdb_1_1@REDACTED)3600> <<1:16>> < <<2:24>>.
> false
>
> First, we’re clearly use integer syntax to encode integer values, then we
> have the first integer value encoded using 16-bits, and the second integer
> value encoded using 24-bits.
> It just happens so, that 16-bits is used to encode the value of 1, and
> 24-bits to encode the value of two.
>
> Thus, since 16-bits are less then 24-bits (in length), but also 1 is less
> than 2, one may expect this to yield TRUE. Yet somehow,  two octets are NOT
> LESS than there, nor 1 is NOT LESS than 2!
>
> I think this cannot pass the "red-face test”, and thus does not deserve
> defending.
>
> Contrast this with the way tuples are handled:
>
> (tsdb_1_1@REDACTED)3666> {1,1} < {1,2}.
> true
> (tsdb_1_1@REDACTED)3667> {1,3} < {1,2}.
> false
> (tsdb_1_1@REDACTED)3668> {1,3} < {1,2,3}.
> true
>
> Considering that Binaries may be used to encode ANYTHING, shouldn’t they
> be handled the same way as tuples instead of:
>
>
> (tsdb_1_1@REDACTED)3624> <<1,1>> < <<1,2>>.
> true
> (tsdb_1_1@REDACTED)3625> <<1,3>> < <<1,2>>.
> false
> (tsdb_1_1@REDACTED)3626> <<1,3>> < <<1,2,3>>.
> false
>
>
> As I said in my previous email, I do not expect Erlang to change, and for
> my "own intents and purposes” I am still considering if:
>
> (tsdb_1_1@REDACTED)3626> <<1,3>> < <<1,2,3>>.
> false
>
> should be given more credence than, say TRUE... if nothing else, because
> two octets are less than three octets.
>
> In other words, if a three-element-tuple, regardless of its content,
>  could never be less than any given two-elements-tuple, why shouldn't the
> same hold for Binaries?
>
> Kind regards
>
> V/
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20191031/eacf6b46/attachment.htm>