[erlang-questions] Wanted additions to the maps module?

Tue May 10 02:03:59 CEST 2016

On 10/05/16 1:05 AM, Lukas Larsson wrote:
>
> I don't really like dealing in absolutes.
That is an interesting fact about you.
Here's one about me:
I don't like dealing with fundamentally broken libraries.
> I can see scenarios where it would be useful to have that and 
> scenarios where I'd rather get the lookup speed that not having it gives.
False dichotomy, because
  - a data structure can support DUAL access methods
  - a data structure can CHANGE its representation to
   optimise current usage.
Both of these were known in the 1960s.
> One idea could be to have both so that the user could pick and choose, 
> i.e. ofirst/onext and ufirst/unext. However the iterator without the 
> ordering guarantee will always be as fast or faster then the iterator 
> with the ordering guarantee. If I've done my thinking correctly, the 
> complexity of the iterator would be something like O(n/2) for ordered 
> and O(log(n)) for unordered iterators.
I do not understand your reasoning here.
By using a threaded tree, iteration over a key-ordered dictionary
can be done in O(1) worst case time per element.
Again, known since the 1960s, and explained in TAOP vol 1.
If you're not willing to pay the space price of threading,
iteration over an adaptive key-ordered dictionary can be
done in O(1) amortised time per element, and that's
been known since 1985.
>
>     If the value of maps:first_key(Map) and maps:next_key(Map, Key)
>     is not *determined* by the *value* of Map (without reference to
>     the history of how the Map was built or any other inscrutable
>     properties such as where it happens to be stored), then these
>     things just aren't functions.
>
>
> Seems kind of harsh to disqualify them from the entire set of 
> functions, just because they aren't pure and/or the construction of 
> the term doesn't influence it's final state.
I'm perfectly happy to call them procedures, methods, operations, accessors,
or what have you.  But having an output that is determined by the value
of the input is what a function IS.  And the important thing is semantics,
not names.  If the behaviour is affected by things the programmer cannot
even detect, let alone control, then bad things happen.

OK, so Erlang is a little bit pregnant.  There are ETS and DETS, and there
are RPC calls and other things.  But you don't want loss of control over
basic data structures.  You don't want to traverse the same data
structure twice and get different answers even though the value is the
same.  Here's the kind of thing I'm talking about.
Here's some Smalltalk code.

   "make keys a flexible array holding the keys of a dictionary"
     keys := OrderedCollection new.
     aDictionary keysAndValuesDo: [:key :value |
       keys addLast: key].
   "assert that the atom #boojum is not a key in that dictionary"
     [aDictionary includesKey: #boojum] deny.
   "add an association for #boojum and remove it again"
     aDictionary at: #boojum put: #snark.
     aDictionary removeKey: #boojum.
   "the *logical* state of aDictionary is back to what it was."
   "make values a flexible array holding the values of the dictionary"
     values := OrderedCollection new.
     aDictionary keysAndValuesDo: [:key :value |
       values addLast: value].
   "make a new dictionary by mapping keys to corresponding values"
     newDictionary := Dictionary new: aDictionary size.
     keys with: values do: [:key :value |
       newDictionary at: key put: value].
   "check that the new dictionary is equal to the old"
     [newDictionary = aDictionary] assert.

This can FAIL.  It's worse than that.  Suppose we DON'T add and remove
a #boojum -> #snark association.  It's not clear that the second call to
#keysAndValuesDo: has to use the same order as the first.  The only
requirements I can glean from the ANSI Smalltalk standard are that
  - if issued at the same time, any iteration method WOULD have
    used the same order as any other
  - every key is visited exactly once.
An adaptive data structure might well reorganise itself between the two
iterations.

Now in Smalltalk, since it's an Object Oriented Procedural language,
we *expect* that kind of nonsense and take care.  If it's important
that two "unordered" loops use the same order, we know we have to
fuse the loops.

Did I mention that Smalltalk *isn't* crazy enough to have first/next
operations for unordered collections?  Smalltalk isn't crazy enough
to have first/next operations for unordered collections.

Come to think of it, I have been programming in Smalltalk for
nearly 20 years, and I have never *wanted* first/next for unordered
collections.  (And I've thought of *oodles* of operations I did want:
the Dictionary class has 760 public methods, not all of them standard.)

By the way, let's take one little gem.
Suppose we have a dictionary mapping some key or other to floating
point numbers, and we want the sum.  Since floating point addition
is not associative, "aDictionary sum" is not well defined.
> What we are leaning towards right now is to define that two maps that 
> have the same keys also have the same iteration order, no matter how 
> the maps have been created.
And that's precisely what I was arguing for.
> So map "A = #{ a => a }, B = A#{ b => b }." and "C = #{ b => a }, D = 
> A#{ a => b }." would be guaranteed to iterate in the same order. So 
> "{K,V} = maps:first(B) = maps:first(D).", but we don't guarantee which 
> key K was bound to. So I guess in part we agree with you, it is just a 
> matter of deciding how much we have to/should guarantee about the 
> order of the iterators.
The problem is that hashed data structures do not give you this guarantee.

It is not clear to me that a *stable* order is any easier to provide than a
*published* order.

For what it's worth, something analogous to this issue came up in
the ISO Prolog standard.  One of the major tools in Edinburgh Prolog
was a total order on terms, including variables until they were bound.
(This required a garbage collector that preserved genetic order, which
is not hard to ensure if you design one from scratch to do that.)
People who implemented Prolog on top of something else, like Lisp or
Pop-11, wanted to live with the garbage collector they had, so we
ended up with the absurd situation that the standard has a sort/2
predicate that uses an ordering that is stable *for the duration of the
sort only*, so that sorting a list of variables twice can give you two
different answers.  Not Good!  (The Poplog people handled this by
defining a stable order for variables as and when they were compared,
taking extra space and time.)