Binary, List and Tuple Inequalities (Paradox?)

Fri Oct 25 17:35:20 CEST 2019

To me alphabetical ordering of lists (as legacy string representation) and binaries (as modern string representation) is natural, so it is expected that "aa" < "b" < "bb".

Not sure about binaries ordering which first checks the binary size, like "b" < "aa" < "bb" - I personally cannot find a suitable application for such ordering.

And about the lists. Their length is not known as for binaries. It can only be calculated with linear complexity. If you have a list 'aaa..a' of 50 elements and a list 'bbb..b' of 49, you'd have to make 49*2 iterations to determine which one is less. With the current implementation it is just 1 comparison.

On 26 October 2019 1:36:59 am AEDT, Valentin Micic <v@REDACTED> wrote:
>Hi all, 
>
>Consider the following inequalities:
>
>(tsdb_1_1@REDACTED)2876> <<0,0,1>> < <<1,0,0>>.
>true
>
>As well as:
>
>(tsdb_1_1@REDACTED)2878> [0,0,1] < [1,0,0].
>true
>
>The result (true) makes sense — in both cases operands are of the same
>size, and the first (leftmost) element value of the left operand is
>less than  first (again, leftmost) element value of the right operand.
>
>However:
>
>(tsdb_1_1@REDACTED)2877> <<0,0,1>> < <<1,0>>.   
>true
>
>and
>
>(tsdb_1_1@REDACTED)2879> [0,0,1] < [1,0].  
>true
>
>This indicates that the actual length of the operands are not
>considered, for clearly, three octets cannot be less then two, right?
>
>This becomes even more confusing if you check how tuples are compared
>given the similar test-case:
>
>(tsdb_1_1@REDACTED)2886> {0,0,1} < {1,0,0}.  
>true
>(tsdb_1_1@REDACTED)2887> {0,0,1} < {1,0}.  
>false
>
>Here, the number of elements in the tuple are clearly taken into the
>consideration.
>
>Then, if one converts all three pairs of operands to its (external)
>binary representation, e.g. 
>
>Binary:
>
>(tsdb_1_1@REDACTED)2916> term_to_binary( <<0,0,1>> ).
><<131,109,0,0,0,3,0,0,1>>
>(tsdb_1_1@REDACTED)2917> term_to_binary( <<1,0>> ).  
><<131,109,0,0,0,2,1,0>>
>
>List:
>
>tsdb_1_1@REDACTED)2918> term_to_binary( [0,0,1] ).  
><<131,107,0,3,0,0,1>>
>(tsdb_1_1@REDACTED)2919> term_to_binary( [1,0] ).  
><<131,107,0,2,1,0>>
>
>Tuple:
>(tsdb_1_1@REDACTED)2920> term_to_binary( {0,0,1} ).
><<131,104,3,97,0,97,0,97,1>>
>(tsdb_1_1@REDACTED)2921> term_to_binary( {1,0} ).  
><<131,104,2,97,1,97,0>>
>
>One could see that the number of “elements” are known and available for
>comparison (I’ve highlighted this number using bold, yellow font) for
>all three situations.
>
>And yet, the different approaches between binary and lists on one side,
>and tuples, on another, appear to be deliberate as binary and list
>types are following the approach used by C standard library memcmp
>function, whereas tuples do not, e.g.
>
>int memcmp(const void *s1, const void *s2, size_t n);
>In other words, operands are reduced to some common size ’n', and then
>compared as if they were equal in length. 
>I do understand this restriction in C -- if not for ’n’, the argument
>sizes are not known, and without it the execution may end up in a
>catastrophic failure (e.g. segment violation, hmm.. which can still
>happen anyway if ’n’ is inadequate).
>
>So, THE QUESTION:
>
>Why would it be wrong to consider inequalities for binary() data types
>the same way we do it for tuples — number of “elements” first and 
>“numeric values” of individual elements — second?
>
>In my view, this is not only not wrong, but it would be more logical,
>and this is why.
>
>Look at the following comparison:
>
>(tsdb_1_1@REDACTED)2957> <<0,0,1>> < <<0,0,1,0>>.
>true
>
>
>Well, this make sense, for less-is-less (e.g. three octets are less
>than four octets) and even if one considers  <<0,0,1>> as a 24-bit
>integer value, and <<0,0,1,0>> as a 32-bit integer, one may “normalise”
>the 24-bit integer to its 32-bit equivalent by adding a leading zero,
>and the comparison expression would still return true:
>
>(tsdb_1_1@REDACTED)2961> <<0,0,0,1>> < <<0,0,1,0>>.   
>true
>
>Therefore, as expected, 1 <  256, and, thus, we may be forgiven if we
>have the same expectation when we write:
>
>(tsdb_1_1@REDACTED)2958> <<1>> < <<0,0,1,0>>. 
> 
>Right? Well, not so, because:
>
>(tsdb_1_1@REDACTED)2958> <<1>> < <<0,0,1,0>>.    
>false
>
>Thus, "less is more", and therefore 1 appears to be
>greater-than-or-equal-to  256. Somehow.
> 
>This could never happen if one considered a number of elements (in this
>case number of octets) before attempting to compare the individual
>element values.
>
>But wait, the "less-is-more" may easily become its opposite — “more is
>less”, because:
>
>(tsdb_1_1@REDACTED)2973> <<32, 97>> < <<97>>.
>true
>
>Yes, this approach may help us to sort a list of, say,  surnames
>alphabetically, so we can have all the “Ms” before the “Ns”, regardless
>of how long the surname is, but is this actually logical for two
>arbitrary binaries? Seeing that we actually care about number of
>elements when we deal with tuples, why not extend the same approach to
>arbitrary binary values?
>
>Why do we presume that binaries are given as STRINGS when we compare
>them? When it comes to lists, the situation is even worse, because
>lists do, but then do not follow the STRING comparison semantic:
>
>(tsdb_1_1@REDACTED)2991> [32, 97] < [97].  
>true
>
>*** But! ***
>
>(tsdb_1_1@REDACTED)2992> [{32, 97}] < [97].
>false
>
>(tsdb_1_1@REDACTED)2993> [{32, 97}] < [{97}].
>false
>
>And just to be clear, I do not expect Erlang to change. Nor do I care
>about how lists are handled (well, at least not for the moment).
>
>My main concern is semantic behind comparing two arbitrary binary()
>operands. For one of my projects I’d like to employ tuple-like semantic
>(.e.g. number of octets first, values later) when comparing two
>arbitrary binaries and thus avoid “less-is-more” and “more-is-less”
>paradoxes.
>
>Is there any reason why I shouldn’t? (Okay, it may be a bit slower, but
>one may argue this approach may be even a bit faster for a long-enough
>binaries. But what else?)
>
>At any rate, I would appreciate your reasoning on the topic. What
>resonates better with you?
>
>
>Thanks in advance.
>
>V/

-- 
Kind regards,
Dmitry Belyaev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20191026/75d2d1bf/attachment.htm>