[erlang-questions] Maps

Tue May 14 18:22:46 CEST 2013

Comments below:

On 05/14/2013 06:07 AM, Robert Virding wrote:
> One thing that has always puzzled me in these discussions is this fascination with *NATIVE* implementations. Why *native*? What is it with *native* that makes people want it? Is it the speed? Or is it the special syntax? Or what? Why care HOW things are done just as long as it fulfils my requirements. When I want a fast dictionary then FAST is the requirement, same with ORDERED or BAGS or ... . How it is done I don't care. If a special syntax is helpful then great, it is the syntax I am after. People complain that you need a special data type or else you can get them mixed up. I don't buy that. I work with lists, tuples, proplists, orddicts, dicts, gb_trees, records, etc together in one application and one problem I don't have is getting them mixed up. I see it as a solution to a problem which is at worst only very small.

I think my concern about whether the implementation is native (Erlang-only source code) or not relates to the performance characteristics.  It seems as if we need C integration to get faster than dict on a wide variety of key types.  I agree that it is best to have many different types of dictionary implementations, and ideally we would have a bunch of dictionaries implemented in C also :-)  However, I understand that more testing and considerations are necessary to make the C integration decent performance-wise and fault-tolerance-wise.  Basically, what seems required, is performance similar to the process dictionary, or better, as a separate type, called Maps.
>
> Michael, please don't the idea that I am coming down on you specifically here. I am not. You just had the misfortune to mention this near the top where I could easily see it. And it is something I have been pondering about for a long time. It is one of the classic complaints about Erlang (after ;, if, variables, ...), why isn't there a built-in map type.
>
> One of the ideas behind dicts/orddicts was to provide a standardised API which would allow you to choose and easily change which implementation you need/want. Unfortunately we never wrote this down anywhere and only two different versions exist (though I have a ttdict based on 2-3 trees if anyone needs a sorted tree, very fast too). While this forced you to think it would give you lots of options. One of the bad things of having a built-in type is that you don't have think, but you won't always get the most suitable. And seriously thinking about your data structures is always a Good Thing.
I agree, and I wish OTP/Erlang was able to add more data structures like your rbtree to compliment the gbtree, and possibly resolving any differences between the aatree and the gbtree.  So, I think having more data structures is more beneficial than having less, since it gives developers more options.  Though I understand these decisions are probably conservative and slow moving.  Please release the ttdict!  Have been wanting to see that one for awhile.
>
> Sorry for the rant here.
>
> What *I* would like is:
>
> - A dynamic record-like alternative which I call rr (Robert's records) to avoid getting into a discussion about what frames are/aren't.
>
> - A really fast dict implementation. I don't care how it's done but speed is the main requirement. If it represented as a built-in type or a tree of tuples/lists I don't care, just as long as it is fast.
I agree a record-like alternative would be nice.  I just see it as less important than the fast dictionary implementation.  So, if given the choice between the two, I am happy we can get the fast dictionary implementation.

The record-like alternative is supposedly most important for external term representation, since it is a separate type.  Normally, usage of records as external terms are a type of message format used in communication with ports or other things (cnodes, or possibly port drivers).  So, perhaps what we want is something a bit different from the concept of a "record" but rather a "message" type.  A message could just be a generic container that defines a name, a header, and a body, where the header is normally empty (the header is always key->value) and the name is always an atom (or possibly atom or binary/string, to avoid excessive atom consumption).  The body could be a binary that is accessed efficiently based on key->values where each key maps to an index.  I will admit this idea is coming from a different angle, so it isn't well-aligned with the previous discussions, but it seems like trying to replace records as they are typically used is futile (unless we just make a type
specific to tuples with an atom to keep the same performance, and satisfy both concerns).  Either way, it seems important to focus on what the record-like alternative is suppose to solve and the problems it is trying to solve.  In the past, the discussion of frames seems to have gotten bogged down in syntax discussions, and convenience, but whatever record-like alternative which is pursued seems to require clearly stated benefits.  So far, the main one I have seen is just that there is a way to represent records as external terms, and that reason isn't entirely compelling when you first see it.  Simplifying debugging and development is probably an obvious goal, but to really do that, requires the same performance to replace records usage, so then you get back to replicating a tuple atom combo as a separate type, instead of focusing on a new type with new characteristics.
>
> Loïc you and I can meet at dawn like gentlemen to discuss special syntax or not. Swords or pistols? :-)
>
> Robert
>
>
>
> ----- Original Message -----
>> From: "Michael Truog" <mjtruog@REDACTED>
>>
>> I am a bit disturbed by the concern over Frames.  Since I started
>> using Erlang I have always wanted native, efficient dictionary
>> functionality as an actual type within the language (not just a dict
>> module, despite the fact dict has one of the best possible
>> implementations within native Erlang source code).  It is a major
>> disappointment coming to the Erlang language, being given the
>> process dictionary, and then being told that you should never use
>> it, since it is dirty mutable state that ruins our pure referential
>> transparency..  So, the native dictionary type, Maps, seems long
>> (long) overdue when Erlang is compared with newer languages like
>> Python, Ruby., Perl and Java.
>>
>> Having a newer and better type of records named Frames seems like an
>> opportunity to take something decent and make it slower (though I
>> understand the extreme effort and attention to make performance
>> close).  The argument is hopefully not: "But wait, we have neat
>> syntax which allows us to make neat examples, and they might be
>> useful.... and now everyone should use Frames in their code instead
>> of records", since that seems least desirable.  I think it should be
>> hard to make something more efficient than a simple array index
>> lookup, which seems to be the basic idea behind a tuple (with the
>> record syntactic sugar).  I don't necessarily care about the records
>> syntax being this preprocessor trick since that is made clear within
>> the documentation.  I think the idea that Frames provide a record
>> alternative relies on the benefits of Frames, which appear too
>> minimal because they seem to be focusing on improving the type
>> system, more than extending functionality (so the concern looks
>> more academic than pragmatic).  One of the basic benefits of Frames
>> that has been mentioned is that the type can be used explicitly for
>> external data, however an external API could easily wrap the
>> creation of an atom and tuple, so that justification of Frames
>> doesn't feel very compelling (though the Frame type would make a
>> record/tuple object more naturally a C struct-like type within
>> Erlang).
>>
>> I understand often performance is not a concern, but I can't help but
>> think it should always be the most important concern for compiler
>> writers and VM writers (at the very least, not to mention the
>> database writers, the message bus writers, and the algorithm library
>> writers), which is why I believe it carries so much importance to
>> the decision of the implementation of Maps or Frames.  So, I am very
>> happy with the pursuit of the Map type, while the pursuit of the
>> Frame type seems inappropriate (in the sense of a feature triage).
>>