dictionaries (was Re: new syntax - a provocation)

Mon Sep 29 07:26:17 CEST 2003

On Mon, 29 Sep 2003 11:24:50 +1200 (NZST)
"Richard A. O'Keefe" <ok@REDACTED> wrote:

> Chris Pressey <cpressey@REDACTED> continues to insist on
> missing the point.
> 	To summarize:  you asked, "Why not use the dict module?"  and
> 	you gave a bunch of reasons based on the current implementation,
> 	and I pointed out that the representation of dicts is undefined
> 	- so the implementation of the dict module is _irrelevant_ - so
> 	you *can* (and in fact we probably *should* for the sake of
> 	easement) use the dict module (**interface**).
> 
> Imagine a frustrated scream here.

Okay.

> My proposal is just plain *INDEPENDENT* of the dict module.
> It's a historical accident that there IS a dict module.
> My proposal is not intended as a replacement for the dict module.
> 
> THERE IS NOTHING TO BE GAINED BY IMITATING AN OLD BAD INTERFACE.

Forwards-compatibility...?  Zero or near-zero retraining...?  The
principle of least astonishment...?  Occam's Razor as it applies to
software engineering ("do not multiply interfaces unnecessarily")...?

For that matter, why do you call the dict interface "bad"?

> 	And in the quoted paragraph above you seem to realize that. 
> 	Boffo.  My work here is done.  The rest of your message is
> 	commentary that dodges & hovers around that central point. 
> 	Except for:
> 	
> 	> We don't need a new interface for _that_, but do you really
> 	> _like_ the existing interface?  There were _reasons_ why I
> 	> proposed a different interface.
> 	
> 	It's not a matter of me _liking_ it.  It's a matter of there
> 	being umpteen hundred (thousand?) lines of code already
> 	_written_ for it.
> 	
> What in the name of sanity has that got to do with anything?

See above.

> I have never proposed that the 'dict' module be withdrawn or changed
> in any way.  It is Chris Pressey who suggested changing it.

Yes, I *am* suggesting that!  Changing the *implementation*.  Everything
else - the interface, the documentation - should stay the same.

> I don't *CARE* how much code uses the 'dict' module (except to feel
> sorry for the people who wrote that code) because I AM NOT SUGGESTING
> ANY CHANGE WHATSOEVER IN THE SLIGHTEST to the 'dict' module.

Yes, I know!  That's why I'm *adding* that suggestion to your proposal.

(Except where I don't agree with your proposal, of course -
specifically, I see no reason for dictionary keys to be limited to
atoms.  There also seems to be some minor confusion around whether keys
are ordered or not - I can't see why they should be, and you did not
state that they should be nor propose 'next key'- and 'previous key'-
style operations on them that would suggest that they are, but your
dictionary_to_list operation does for some reason define that the
result is ordered.)

> It is an important fact about the dict module that it *has* an
> implementation in Erlang, which means that it uses Erlang data
> structures, which means that it cannot be used to implement a new kind
> of primitive data structure.

No, that is wrong.

The fact that the dict module has an implementation in Erlang is not
important, as you claim; it is, to borrow a phrase from you, a
"historical accident."  An artefact.  Something that you need not even be
aware of to use the dict module 100% properly.  There is *nothing* about
the dict module that requires that it be implemented as Erlang code.

> If you change it to use a new primitive data structure,
> THAT WILL BE AN INCOMPATIBLE CHANGE.  That's not my suggestion at all.

It will *not* be an incompatible change.  The man page clearly states:

"Dict implements a Key - Value dictionary. The representation of a
dictionary is not defined."

Therefore, the representation may be changed.  Any code that relies on a
particular representation is *already broken*.  This is the very essence
of why anyone would even consider establishing a contract like this in
the first place.

Even if the OTP team is so paranoid as to cater to every irresponsibly
written crap-hack out there which makes assumptions about what it was
explicitly told NOT to make assumptions about - ignoring for the moment
the disingenuousness entailed in giving a condition which one does not
expect others to adhere to - it *still* behooves them to provide an
identical interface for any new dictionary module, to make migrating
existing code easier.  The orddict module can even be seen as setting a
precedent of a sort in this regard.  I'm not certain there's any code
out there that looks like

  Module = case X of
    1 -> dict;
    2 -> orddict
  end,
  D = Module:from_list([{{0, 0}, hello_world}]).

but because the interface is shared, it's certainly not out of the
question, and is, in fact, a reason to keep using the same interface for
*other* dictionary-implementing modules.
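
Fleshed out into something that compiles, the same idea might look like
the sketch below.  The module name dict_choice and the functions pick/1
and demo/1 are hypothetical; the only things assumed about dict and
orddict are that both export from_list/1 and fetch/2:

```erlang
-module(dict_choice).
-export([pick/1, demo/1]).

%% Choose an implementation at run time; both modules export the
%% same from_list/1 and fetch/2.
pick(1) -> dict;
pick(2) -> orddict.

%% Build a dictionary with whichever module was picked and read a
%% value back, never looking at the representation.
demo(X) ->
    Module = pick(X),
    D = Module:from_list([{{0, 0}, hello_world}]),
    Module:fetch({0, 0}, D).
```

Both dict_choice:demo(1) and dict_choice:demo(2) go through the same
calls, and the calling code cannot tell which representation it got.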

Not that there is any reason to make another module!  The bottom line is
that Ericsson can make whatever changes they want to the dict module
- so long as it continues to do what it says it does - and because they
had the foresight to include those magic words "The representation of a
dictionary is not defined" in the man page, no one has *any* right to
complain when it's no longer a tuple like {dict,Size,Etc,Etc,Etc}.

> I have suggested (and it turns out that Joe Armstrong has also
> suggested) a new *primitive* data structure for Erlang.

As have I, in the past.  But it's a moot point here - *if* such a new
primitive data structure is a dictionary there's *still* no reason that
the dict module can't be the interface for it.

Further, to me, adding a new data type to Erlang is very much a separate
issue from adding better dictionaries to Erlang.  I'll try to explain
why:

My proposal for improving Erlang's notion of data structuring (at the
moment - I've considered several variations but this is the one that has
had the most lasting appeal to me) is that it should be possible to
give any value of any type a 'genus' (essentially just a tag name, much
like Perl's 'bless' mechanism), by which it may be matched on in guards
and such without examining its structure in any way.

Joe's proposal is similar, but all such named things in his scheme must
be structs.

Your proposal, as I understand it, is that dictionaries are opaque
insofar as they are a separate primitive, like lists - but (also like
lists) they do not have specific names, so while it is possible to
distinguish a dictionary from any other data type, there is no way to
distinguish between different 'kinds' of dictionaries aside from
matching on a key or value that you know, by convention, should be
present.  Much like tagged tuples today.

Is this a fair assessment of our respective proposals?

I prefer my proposal because it doesn't tie the programmer to one
particular representation.  If they want to pass, say, 'employee' data
to code that doesn't need to (and in fact shouldn't) be aware of the
internals of it, they can implement each 'employee' as a record, a
dictionary, a property list, a string, a pid, or a reference, or
whatever else, as they see fit - and they can decide to change it later
without breaking that other code.  That other code can still test to see
if a value is an 'employee' or not with a guard test something like

  genus(A) == employee

...without having to know anything about the structure of A.

With Joe's proposal or yours, an 'employee' would have to be represented
by (or at least wrapped in) a struct or a dictionary, respectively.  In
Joe's case, other code could tell that A is an employee with a guard
test something like

  is_struct(A, employee)

In your case, other code would have to determine that A is an employee
by something like

  dictionary_has(A, employee_id)

Either of these two ways, you end up knowing that A is a struct or a
dictionary.  Which is, frankly, more than you need to know.
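
Erlang has no genus/1 today, so the following is only a rough
approximation of the idea using a module boundary.  The employee
module, its functions and its tagged-dict representation are all
hypothetical.  The point is just that callers ask the module whether a
value is an employee and never look inside it, so the representation
could be swapped later:

```erlang
-module(employee).
-export([new/2, is_employee/1, name/1]).

%% Hypothetical representation: a tag wrapped around a dict.  Only
%% this module knows that; it could become a record or a pid later.
new(Id, Name) ->
    {employee, dict:from_list([{id, Id}, {name, Name}])}.

%% The closest stand-in for a genus test available today.
is_employee({employee, _}) -> true;
is_employee(_)             -> false.

%% Accessor, so callers never need to open up the value themselves.
name({employee, D}) ->
    dict:fetch(name, D).
```

Unlike the proposed genus test, is_employee/1 cannot be used in a
guard, which is part of why a language-level mechanism is attractive.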

That said, I see nothing wrong with improving Erlang's ability to
structure data in dictionaries, either, and I assumed (reasonably, I
thought) that that, rather than typing, was the main thrust of your
proposal.

> Putzing around with the 'dict' module would IN NO WAY simplify the
> implementation of that new primitive data structure,

No, but using it as the implementation of the dict module would simplify
the *adoption* of that data structure by Erlang programmers.  Because
chances are they're already familiar with the dict interface.

If it ain't broke, don't fix it.  And the dict interface ain't broke.

> while there are ways in which it could break existing working code.

Existing *incorrect* code that violates the contract.  The fact that it
may work is, again as you put it, an accident.

> Y'know, I was careful *not* to use the name 'dict' for these things.
> I did have good reasons.  "Dictionaries" are not "dicts" are not
> "dictionaries".  They are obviously similar, but they are DIFFERENT,
> and independent.  There is no reason whatsoever why dictionaries
> should have the same interface as dicts, and many reasons why they
> should not.

I've given four reasons why dictionaries should have the same interface
as dicts in my second paragraph in this message.  The only reasonable
reason I can see for why they should not is if they are limited to
having only atoms as their keys - and I do not see much value in such
crippled dictionaries.

-Chris