[erlang-questions] Maps

Wed May 15 03:15:34 CEST 2013

On 15/05/2013, at 3:17 AM, Dan Gudmundsson wrote:
> Richard A. O'Keefe
> > Frames
> >
> Maps are basically an extension of Frames which allows for arbitrary key terms.

And from my perspective that is the major thing WRONG with them.

I have never understood why it is so hard to persuade people
that unbridled power is a *bad* thing.

 "Unbridled Power" is the title of a book by a former New Zealand Prime Minister
 arguing that there were practically no limits on what the New Zealand Government
 could allow itself to do and that this was a bad thing.  When MMP came in he
 wrote a successor called "Bridled Power" explaining why the limitations this
 would place on a government would be a good thing.  However, actual experience
 has shown that the party that gets most of the cabinet seats still thinks that
 getting the votes of 35% of the electors is a licence to do what you please.
 We definitely need better reins for Government.

See the analogy?   Frames are valuable because of the things they CAN'T do.
Maps will take any old rubbish as a key EVEN IF YOU DIDN'T MEAN THAT.

For example, suppose you are dealing with JSON, and you add
X=Y to a map, intending this to mirror adding X: Y to the JSON.
But in fact you needed to add Y=X.  The map doesn't know or care.

If you want to model JSON in Erlang, you want a data structure
that won't _let_ you use "arbitrary key terms".

> Which could use your implementation proposal for small maps defined in source code,  
> and switches to a tree (or something else) for larger dynamic sized Maps.

And frames could also switch to a tree for larger frames.
> 
> >- Frames are limited to atom keys in the interests of
> > compile-time error detection (both simply noticing that
> >  a key is not an atom at all and allowing the possibility
> >  of Dialyzer support).

Frames are limited to atom keys IN THE INTERESTS OF DOING WHAT
THE PROGRAMMER SAYS S/HE WANTS.  Compile-time is always an
attractive option, but run-time will do.  
> 
> So use := and you will get a runtime/compile check that the key exists.

I used to have a Pogo cartoon on my office door where the punchline
was something like "Your mistake is not the right mistake." (:-).

The existence of two different notions of equality in Erlang is a
problem, and some day it will have to be addressed.  Fortunately
Prolog was able to distinguish between X < Y (X is numerically less
than Y) and X @< Y (X is less than Y as a term) and we were able to
say that nan(X) => X = X, X == X, \+ (X =:= Y).  (a NaN unifies with
itself, is identical to itself, but is not numerically equal to
itself.)  One of the reasons for sticking with atoms in frames is
that the two notions of equality and indeed the two notions of
ordering coincide.  You could extend that to allow integers as keys,
which arguably ML does.  You could extend it to allow binaries as
keys, which could be a good thing.  But you can't extend it to floats
without running into trouble.

> Robert Virding
> > Use only matching
> 
> We would like to but we believe we can not do that because we can't output a defined order, 
> i.e. you can not sort 1.0 and 1.
> We would need to define a new term order in erlang and we need to introduce something
> like <:<, >:> =:< and >:=.

No, the issue here is that ORDERED dictionaries and UNORDERED dictionaries
are different types that have similar but not identical interfaces,
constraints, and performance.

In particular, "is identical to" is the right comparison for unordered
dictionaries and "is numerically equal to" is the right comparison for
ordered dictionaries.

Just as I think trying to smush records and dictionaries into a single
data structure is rather like trying to make a combined motorbike/pogo
stick, so I think that trying to smuch ordered dictionaries and
unordered dictionaries into a single data structure is rather like
trying to make a combined motorbike/tracked vehicle.
See http://en.wikipedia.org/wiki/Kettenkrad.   If you want a motorbike,
you probably _don't_ want a Kettenkrad, even though it looks like a
motorbike at the front.

In my Smalltalk library, SortedDictionary offers
    _ first
    _ first: n
    _ firstKey
    _ firstKeyGeq: key [ifNone: aBlock]
    _ firstKeyGtr: key [ifNone: aBlock]
    _ removeFirst
    _ removeFirst: n
    _ last
    _ last: n
    _ lastKey
    _ lastKeyLeq: key [ifNone: aBlock]
    _ lastKeyLss: key [ifNone: aBlock]
    _ removeLast
    _ removeLast: n
    _ removeKeysGeq: key [return: aBoolean]
    _ removeKeysGtr: key [return: aBoolean]
    _ removeKeysLeq: key [return: aBoolean]
    _ removeKeysLss: key [return: aBoolean]
    _ reverse[Keys[AndValues]]Do: aBlock
(not all of these are implemented at the moment), as well
as the operations suitable for unordered dictionaries.
These methods make no sense at all for unordered dictionaries.

Providing them offers greater power.  There are more things you can do.
But the last thing any sane Smalltalk programmer would want is to have
them as the *default* kind of dictionary, because this power comes at
the cost of increased storage and increased time.  I use them
unhesitatingly when I *need* the sorted order.  But I avoid them at all
other times.

The existence of two data structures with *different* contracts
(sorted dictionaries and unsorted dictionaries) makes it hard to
privilege one of them as THE map type to get a syntax.

In developing my Smalltalk, I assure you that I have felt the
cultural pressure from Python and Perl to have a syntax for
dictionaries most keenly.  But I have 12 different kinds of
dictionary (one of them unfinished) and can foresee a need for
two more.  No, wait, there's a "ShellEnvironment" class that is
also a dictionary.  So somewhere between 12 and 15.  Let's admit
that two of them are for compatibility only, so claim just 10
right now.  They differ in
 - what kinds of keys are allowed
 - what kind of comparison is done
 - is sorted traversal easy
 - do they have a "parent" that is consulted for read
   access to keys not directly present
 - what the time and space costs are

Erlang doesn't have a huge number of dictionary types,
but it should certainly have at least

 - keys any term, built in hashing and equality
 - keys any term, programmer-defined hashing and equality
 - keys any term, built-in term order
 - keys any term, programmer-defined term order
 - keys restricted somehow, perhaps a user-defined filter on addition
 - perhaps some optimisation for binary keys or integer keys or atom keys.

Much of the variations could be hidden behind a common interface,
but the distinction between ORDERED and UNORDERED cannot be completely
hidden.