Binary, List and Tuple Inequalities (Paradox?)

Valentin Micic v@REDACTED
Fri Oct 25 16:36:59 CEST 2019


Hi all, 

Consider the following inequalities:

(tsdb_1_1@REDACTED)2876> <<0,0,1>> < <<1,0,0>>.
true

As well as:

(tsdb_1_1@REDACTED)2878> [0,0,1] < [1,0,0].
true

The result (true) makes sense — in both cases the operands are of the same size, and the first (leftmost) element value of the left operand is less than the first (again, leftmost) element value of the right operand.

However:

(tsdb_1_1@REDACTED)2877> <<0,0,1>> < <<1,0>>.   
true

and

(tsdb_1_1@REDACTED)2879> [0,0,1] < [1,0].  
true

This indicates that the actual lengths of the operands are not considered, for clearly, three octets cannot be less than two, right?

This becomes even more confusing if you check how tuples are compared, given a similar test case:

(tsdb_1_1@REDACTED)2886> {0,0,1} < {1,0,0}.  
true
(tsdb_1_1@REDACTED)2887> {0,0,1} < {1,0}.  
false

Here, the number of elements in the tuple is clearly taken into consideration.
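
In other words, the two rules at work appear to be the following. Here is a rough pure-Erlang sketch of my reading of the behaviour (not the actual implementation; the names bin_less/tup_less are mine):

%% Binaries (and lists of integers): compare element by element; the
%% length only decides when one operand is a strict prefix of the other.
bin_less(<<A, RestA/binary>>, <<B, RestB/binary>>) ->
    if A < B -> true;
       A > B -> false;
       true  -> bin_less(RestA, RestB)
    end;
bin_less(<<>>, B) -> byte_size(B) > 0;   %% ran out of left operand first
bin_less(_, _)    -> false.

%% Tuples: compare the sizes first, the elements only if the sizes match.
tup_less(A, B) when tuple_size(A) =/= tuple_size(B) ->
    tuple_size(A) < tuple_size(B);
tup_less(A, B) ->
    tuple_to_list(A) < tuple_to_list(B).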

Then, if one converts all three pairs of operands to their (external) binary representation, e.g. 

Binary:

(tsdb_1_1@REDACTED)2916> term_to_binary( <<0,0,1>> ).
<<131,109,0,0,0,3,0,0,1>>
(tsdb_1_1@REDACTED)2917> term_to_binary( <<1,0>> ).  
<<131,109,0,0,0,2,1,0>>

List:

(tsdb_1_1@REDACTED)2918> term_to_binary( [0,0,1] ).  
<<131,107,0,3,0,0,1>>
(tsdb_1_1@REDACTED)2919> term_to_binary( [1,0] ).  
<<131,107,0,2,1,0>>

Tuple:

(tsdb_1_1@REDACTED)2920> term_to_binary( {0,0,1} ).
<<131,104,3,97,0,97,0,97,1>>
(tsdb_1_1@REDACTED)2921> term_to_binary( {1,0} ).  
<<131,104,2,97,1,97,0>>

One can see that the number of “elements” is known and available for comparison in all three situations (it is the size field that immediately follows the type tag in each encoding).
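
Indeed, one can pick those size fields out directly by pattern matching on the encoded term (109, 107 and 104 being the type tags for a binary, a string-encoded list and a small tuple, followed by a 32-bit, a 16-bit and an 8-bit size respectively):

<<131, 109, BinLen:32, _/binary>> = term_to_binary( <<0,0,1>> ).   % BinLen = 3
<<131, 107, StrLen:16, _/binary>> = term_to_binary( [0,0,1] ).     % StrLen = 3
<<131, 104, Arity:8,   _/binary>> = term_to_binary( {0,0,1} ).     % Arity  = 3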

And yet, the different approaches (binaries and lists on one side, tuples on the other) appear to be deliberate: binaries and lists follow the approach used by the C standard library's memcmp function, whereas tuples do not, e.g.

int memcmp(const void *s1, const void *s2, size_t n);
In other words, operands are reduced to some common size ’n', and then compared as if they were equal in length. 
I do understand this restriction in C -- without ’n’ the argument sizes would not be known, and execution might end up in a catastrophic failure (e.g. a segmentation violation, hmm... which can still happen anyway if ’n’ is inadequate).

So, THE QUESTION:

Why would it be wrong to treat inequalities for binary() data types the same way we do for tuples — the number of “elements” first, and the “numeric values” of the individual elements second?

In my view, not only would this not be wrong, it would be more logical, and here is why.

Look at the following comparison:

(tsdb_1_1@REDACTED)2957> <<0,0,1>> < <<0,0,1,0>>.
true


Well, this makes sense, for less-is-less (e.g. three octets are less than four octets). And even if one considers <<0,0,1>> as a 24-bit integer value, and <<0,0,1,0>> as a 32-bit integer, one may “normalise” the 24-bit integer to its 32-bit equivalent by adding a leading zero octet, and the comparison expression would still return true:

(tsdb_1_1@REDACTED)2961> <<0,0,0,1>> < <<0,0,1,0>>.   
true

Therefore, as expected, 1 < 256, and thus we may be forgiven for having the same expectation when we write:

(tsdb_1_1@REDACTED)2958> <<1>> < <<0,0,1,0>>. 
 
Right? Well, not so, because:

(tsdb_1_1@REDACTED)2958> <<1>> < <<0,0,1,0>>.    
false

Thus, "less is more", and therefore 1 appears to be greater-than-or-equal-to  256. Somehow.
 
This could never happen if one considered the number of elements (in this case, the number of octets) before attempting to compare the individual element values.
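
As an aside, if the integer interpretation is what one is really after, binary:decode_unsigned/1 from OTP's binary module makes it explicit and gives the expected answer:

binary:decode_unsigned(<<1>>) < binary:decode_unsigned(<<0,0,1,0>>).
%% i.e. 1 < 256 -> true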

But wait, the "less-is-more" may easily become its opposite — “more is less”, because:

(tsdb_1_1@REDACTED)2973> <<32, 97>> < <<97>>.
true

Yes, this approach helps us sort a list of, say, surnames alphabetically, so we get all the “Ms” before the “Ns” regardless of how long each surname is, but is this actually logical for two arbitrary binaries? Seeing that we do care about the number of elements when we deal with tuples, why not extend the same approach to arbitrary binary values?
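
For instance, with a couple of made-up surnames, the default order does give the dictionary result one would want:

lists:sort([<<"Nilsson">>, <<"M">>, <<"Mattsson">>]).
%% [<<"M">>,<<"Mattsson">>,<<"Nilsson">>]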

Why do we presume that binaries are given as STRINGS when we compare them? When it comes to lists, the situation is even worse, because lists sometimes do, and sometimes do not, follow the STRING comparison semantics:

(tsdb_1_1@REDACTED)2991> [32, 97] < [97].  
true

*** But! ***

(tsdb_1_1@REDACTED)2992> [{32, 97}] < [97].
false

(tsdb_1_1@REDACTED)2993> [{32, 97}] < [{97}].
false
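
I suspect these last two results come down to the tuple rules again: in the first one the head of the left list is a tuple while the head of the right list is a number, and the documented cross-type order (number < atom < ... < tuple < ... < list < bit string) puts every tuple above every number; in the second one the tuples' sizes decide, 2 > 1:

{32, 97} < 97.      %% false: a tuple compares greater than a number
{32, 97} < {97}.    %% false: size 2 vs size 1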

And just to be clear, I do not expect Erlang to change. Nor do I care about how lists are handled (well, at least not for the moment).

My main concern is the semantics behind comparing two arbitrary binary() operands. For one of my projects I’d like to employ tuple-like semantics (e.g. number of octets first, values second) when comparing two arbitrary binaries, and thus avoid the “less-is-more” and “more-is-less” paradoxes.
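
Something along these lines is what I have in mind (just a sketch, the name is mine):

%% Order binaries "tuple-style": byte size first, byte values second.
size_first_less(A, B) when is_binary(A), is_binary(B) ->
    case {byte_size(A), byte_size(B)} of
        {SA, SB} when SA < SB -> true;
        {SA, SB} when SA > SB -> false;
        _SameSize             -> A < B    %% equal sizes: the built-in order is fine
    end.

%% size_first_less(<<1>>, <<0,0,1,0>>) -> true   (no more "less is more")
%% size_first_less(<<32,97>>, <<97>>)  -> false  (no more "more is less")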

Is there any reason why I shouldn’t? (Okay, it may be a bit slower, but one may argue this approach could even be a bit faster for long-enough binaries. But what else?)

At any rate, I would appreciate your reasoning on the topic. What resonates better with you?


Thanks in advance.

V/

