Binary, List and Tuple Inequalities (Paradox?)

Valentin Micic v@REDACTED
Thu Oct 31 22:03:40 CET 2019



> On 31 Oct 2019, at 21:43, Richard O'Keefe <raoknz@REDACTED> wrote:
> 
> You can represent all sorts of information as binaries,
> but the binary doesn't know.  It may be that you mean
> <<65,66,67>> to represent "ABC"; it may be that you
> mean to represent 4276803; it may be that you mean
> to represent 'play A below middle c for 66 milliseconds
> on instrument 67'.  Your intentions are not part of the
> binary.  So there is no way to have a sort order for
> binaries (or anything else) that perfectly suits all use
> cases.  Even for strings, English dictionaries and
> telephone books use *different* orders and neither of
> them matches strcmp().
> 
> For Prolog, I managed in 1984 to come up with a
> justification for the ordering of terms of different kinds
> (coherence between @< and subtypes) which the ISO
> committee chose to ignore.
> 
> The best we can hope for in a "global" term ordering is
> that it satisfy the axioms of a total order.  That's a useful
> thing to have.
> 
> Want something else?  Program it.

Yes, indeed, and I’ve implied that much — I do want to program it, but I also wanted to know what people think  — if one aspires to program for others, it seems reasonable to consider dominant logic on the issue.

I started by thinking that less cannot be more, e.g. two octets cannot be less than three.

Now I am increasingly convinced that one should never compare Binaries that are not of the same size.
However, I am not so sure what to do if, or rather when, one has to (compare binaries of unequal sizes).

So here’s the question:

Presuming that one does not know anything about semantic given by data stored in a Binary, but one knows about the size of the binaries being compared, wouldn’t it make sense then to consider that two octets should be less than three?

Wouldn’t this approach fit the most general case (independent of what Elixir people do).

V/


> 
> On Thu, 31 Oct 2019 at 23:53, Raimo Niskanen
> <raimo.niskanen@REDACTED> wrote:
>> 
>> On Wed, 2019-10-30 at 15:34 +0200, Valentin Micic wrote:
>>>> On 28 Oct 2019, at 10:02, Raimo Niskanen <
>>>> raimo.niskanen@REDACTED> wrote:
>>>> 
>>>> I'd like to defend the current term order of lists, tuples and
>>>> binaries.
>>>> 
>>>> Lists are compared a'la string comparison, which avoids having to
>>>> traverse the whole list when comparing.  Binaries are compared as
>>>> lists
>>>> since both are used for handling strings, and also to be compared
>>>> like
>>>> memcmp does it.
>>>> 
>>>> Tuples are used for general complex terms.  They can be compared
>>>> size
>>>> first whithout having to be fully traversed, as opposed to lists,
>>>> which
>>>> is useful in other contexts.  You can thereby choose your data
>>>> structure to select comparision.
>>>> --
>>>> Raimo Niskanen
>>>> 
>>> 
>>> Thanks Raimo,
>>> 
>>> Your argument seems to have one…well, misconception. I don’t think it
>>> would be correct to imply that Binaries are used exclusively for
>>> handling strings, but rather, one of their uses may be that.
>> 
>> I did not mean to imply that binaries are used exclusevily for handling
>> strings.  I merely said that both lists and binaries are used for
>> handling strings, i.e that strings is one use case.
>> 
>> Although I confess that I think it is an important use case, if not
>> even a very important use case.
>> 
>> When binaries were introduced it was already recognized that having
>> strings as lists is a clumsy representation, especially with 64-bit
>> then on the way with almost double the space waste.  So one could
>> foresee that binaries would be used for storing strings.
>> 
>> And to not have strings as binaries sort like strings as lists would be
>> a really bad idea for that use case.
>> 
>> As for now, Elixir uses binaries as their default representation for
>> strings, so having them sort as strings is vital for Elixir.
>> (Unicode has shot a big hole in that argument, but still true for
>> latin-1)
>> 
>>> 
>>> For example, here's a syntax that is not anything “string-like”:
>>> 
>>> 
>>> (tsdb_1_1@REDACTED)3593> <<1:16>> < <<2:16>>.
>>> true
>>> (tsdb_1_1@REDACTED)3594> <<50:16>> < <<36:16>>.
>>> false
>>> 
>> :
>> :
>>> As I said in my previous email, I do not expect Erlang to change, and
>>> for my "own intents and purposes” I am still considering if:
>>> 
>>> (tsdb_1_1@REDACTED)3626> <<1,3>> < <<1,2,3>>.
>>> false
>>> 
>>> should be given more credence than, say TRUE... if nothing else,
>>> because two octets are less than three octets.
>>> 
>>> In other words, if a three-element-tuple, regardless of its content,
>>> could never be less than any given two-elements-tuple, why shouldn't
>>> the same hold for Binaries?
>> 
>> If you want to compare terms as integers, then represent them as
>> integers!  Erlang bignums can be big enough for most purposes.
>> 
>> Erlang does not have operator overloading or subclassing, and has a
>> global term order.  So exactly one sort order has to be chosen for a
>> type, and then a guess has to be made of which one is "best" or "most
>> useful".
>> 
>> The one chosen for binaries was string sort order, because that is how
>> C compares binaries, and strings seemed to be a very important use case
>> for binaries, which Elixir kind of has proven after the fact.
>> 
>> The tuple sort order could have been chosen for binaries, but as I said
>> above that would be bad for the string use case.
>> 
>> Also, regarding "useful"; tuple sort order for binaries can easily be
>> implemented by first comparing sizes then comparing content, whilst
>> should you have tulpe sort order and want to implement list sort order
>> you would have to loop byte by byte.  Therefore the list sort order can
>> be regarded as more useful since it can be a building block for
>> efficient tuple sort order...
>> 
>> Best regards
>> / Raimo Niskanen
>> 
>>> 
>>> Kind regards
>>> 
>>> V/
>>> 
>>> 




More information about the erlang-questions mailing list