[erlang-questions] Maps

Thu May 9 05:12:07 CEST 2013

Summary:
   For their intended uses, frames are smaller and faster and safer
   than maps.

   For uses *other* than replacing records, maps are superior to frames.

   It would be good to have BOTH something with clean syntax that
   makes a good replacement for records AND something that does even
   better than the 'dict' module for dictionaries.

   It would be a very bad thing if we got maps *instead* of frames.

Joe Armstrong is writing a new Erlang book in which he *does* use
maps to replace records, but that would be a Bad Thing.

Björn-Egil Dahlberg <egil@REDACTED> is rather unkind about
the Frames proposal when he says
	A record replacement is just that, a replacement.
	It's like asking the question, "What do we have?"
	instead of "What can we get?"
	The instant rebuttal would be "What do we need?" I say Maps.

The central claim, that Frames are *JUST* a replacement for
records, doing nothing novel, is quite untrue.  We can see this
with a point-by-point comparison.

Quoting EEP 43:

    The new data-type shall have semantics, syntax and operations that:

	• provides an association set from key terms to value terms which can be constructed, accessed and updated using language syntax

Frames do that.

	• can be uniquely distinguished from every other data-type in the language

Frames do that.

	• has no compile-time dependency for constructing, accessing or updating contents of maps nor for passing maps between modules, processes or over Erlang distribution

Frames do that.

	• can be used in matching expressions in the language

Frames do that.

	• has a one-to-one association between printing and parsing the data-type

To the extent that I can make sense of that, NEITHER frames NOT maps
do that.   For example, <{ b = 1, a = 2 }> would be expected to
print as <{ a = 2, b = 1 }>.  The only way to understand the requirement
so that maps satisfy it is

    if you print one and read back what you just read
    you get a new term which is indistinguishable from
    the old one.

Frames do that.

Frames do one more thing:

++++  has a canonical print form (up to white space) so that frames
      print the same if and only if they are equal.

When maps were based on hash tables it did not appear that they would
satisfy this requirement, and for Erlang I think it _is_ a requirement.

	• has a well defined order between terms of the type and other Erlang types

Frames do that.

	• has at most O(log N) time complexity in insert and lookup operations, where N is the number of key-value associations.

Frames have at most O(log N) time complexity in lookup operations.

They do not satisfy a "fast insert" require", but that's not because I
didn't _think_ of it, but because I thought of it and deliberately
rejected it.

Frames are BY DELIBERATE CAREFUL CHOICE not a general purpose
dictionary data type.  That's because they offer something that
EEP 43's maps do not and cannot:

++++  costs aN+b worst case space (yes, Maps do that do)
      with b small and a <= 2 (Maps cannot do that) and
      typical cases should see a == 1 (Maps cannot do that).

There is another explicit requirement for frames:

++++  lend themselves to type checking.

If you look at type checked dictionaries in other languages,
they require _all_ values to have the _same_ type.  We would
presumably have something like map(Domain, Range) in Erlang
types.  That makes them unsuitable for record-like uses.

Frames are for record-like uses where you want millions of the things,
holding few enough slots that you could bear to write them all down
by hand, and where if you are already using -type specifications and
the dialyzer, you want to continue to benefit from that.

If Maps are "meant to replace records where suitable",
the answer is that they are hardly ever suitable for that purpose.

I see no problem with a really good general purpose dictionary data
type _as such_.

What I'm terrified is that after waiting all this time for something
good to replace records, we'll get maps ***INSTEAD*** of frames,
rather than ***AS WELL AS*** frames.

We know (thanks to my micro-VM experiments) that frames can be
compact _enough_ and efficient _enough_ to be used instead of
records, and it's also clear that with type checking they can
be safe _enough_ (defined as "safer than records") for this to be
a good thing.

It's clear that a data structure that offers O(log N) update
instead of frames' O(N) --- well actually, that's the cost to
update *one* field, updating F fields is still O(N) so it's
O(N/F) per field --- has to pay for it somehow, and it's going
to be more memory and more time.

"In many records Maps also encompasses Frames but Maps tries to
do more" is misleading.  The Maps proposal does not in any
practical sense "encompass Frames" because it does not achieve
what the Frames proposal set out to achieve.  The Frames
proposal does more of what I *want* than the Maps proposal.

What are the two main user-visible differences?

- Frames are limited to atom keys in the interests of
  compile-time error detection (both simply noticing that
  a key is not an atom at all and allowing the possibility
  of Dialyzer support).

  Maps allow any term as a key.

  Arbitrary keys is something I wanted the way I wanted my hand
  nailed to a desk.  If I want dictionaries with arbitrary keys,
  I want the 'dict' module, and a more efficient implementation
  of that would be just dandy.  But by far the commoner case is
  to need a restricted key space so that my mistakes get caught.

- Frames have O(lg N) access to a single field or O(N/k) access
  to k fields.  They have O(N/k) cost for copying a frame with
  k fields changed.

  Maps have O(lg N) access and O(lg N) update.

  With a bit of data flow analysis, typical field access can be
  reduced to O(1) time with frames.  I cannot recall if I've
  written that up, but it should be pretty obvious.

  The interesting thing here is that for smallish N, even
  fetching a *single* field can be faster with a frame than
  with a Map.  And this is precisely the use-case for frames.