[erlang-questions] maps iterator

Wed Sep 30 03:26:55 CEST 2015

This touches on something that sort of bugs me...

I replied inline

On Wednesday 30 September 2015 02:05:11 you wrote:
> 2015-09-30 1:27 GMT+02:00 zxq9 <zxq9@REDACTED>:
> 
> > To iterate through the map you would have to loop over it. How does the
> > map key generator track its place in the key list without maintaining a
> > list, being like ETS, or being in a separate process? Assuming that what
> > you mean isn't already maps:map/2 or maybe a maps:foreach/2?
> >
> 
> Easy, you keep track of it using an iterator context .. similar to a binary
> match context. Don't worry about it.

<snark>Oh, but this sort of discussion quickly becomes an "efficiency" nitpick, so we are compelled to worry about it.</snark>

OK, that's out of my system. I hate efficiency nitpicking without an actual case.

I am actually just curious how "iterator context" is maintained without the code becoming either highly mysterious (as in overuse of the process dictionary) or a different semantic label for an list operation over maps that merely indicates "this version of the function foo/2 is better" (so why have foo/2?).

> Keep in mind that all of those iterations in maps:map/2, maps:fold/3 and
> maps:foreach/2 are implemented using maps:to_list/1. This works but it's
> wasteful. Instead we reimplement those using an iterator. Less waste. 5
> words less waste per pair if we implement it as an instruction otherwise 2
> words. I would prefer the instruction approach. The compiler can always
> reassemble the return value to an 2-tuple if desired.
>
> The iterator in gb_trees is not really fast either. It's faster to iterate
> over gb_trees:to_list/1 in most cases but it's there and it's useful if you
> want to save memory.

That sounds more like a case for "let's improve the implementation underlying the existing iterator semantics in the API" than "we need some totally new thing represented by a new API". That said, if maps, dicts, proplists, gb_trees, and the like all had a common semantics (instead of occasional semantic overlap) making a conscious decision about structural tradeoffs would be a bit more straightforward. (Some have unique traits, and in those places the semantics don't overlap of course.)

> > I'm trying to imagine what the calling code would look like where it isn't
> > like maps:map/2, maps:fold/2 or a hypothetical maps:foreach/2, and maps
> > don't become ETS... and drawing a blank.
> >
> 
> I think most cases can be handled by those functions. That's why they are
> there.

Indeed, had you not told me that they are currently implemented by iterating over maps:to_list/1 I wouldn't have noticed, because huge maps are not a problem with which I must deal very often. I tend to have lots of smallish (hundreds of elements) short-lived maps, and a few very small (tens of elements) long-lived ones -- and lots of code that still just uses dicts, sets, trees or ETS.

But APIs are, at least to me, all about semantics, and we already have a fairly well defined semantics surrounding collection-type data structures, even if each data type's API tends to leave out this or that part of it.

If I hit tab in the terminal after typing `maps:` or `dicts:` in the terminal and foreach/2 appears as a possibility I would fully expect it to work the same way lists:foreach/2 does -- regardless how it is implemented. Introducing something different, like maps:iterate/2 would confuse me and I would run off and read the docs. If, as so often happens, I discover that the definition was something like this:

    maps:iterate(Fun, Map) -> ok

        Works exactly like maps:foreach/2, but implemented more efficiently.

        (maps:foreach/2 is currently implemented as:
            `lists:foreach(F, maps:values(M))`
         and will be deprecated in the next release)

I will be annoyed.

This sort of thing happens all over the stdlib. If a future implementation of an existing function will be superior in every way (faster in execution *and* require less space) nothing is gained by adding a new name for the superior implementation -- other than generating stacks of future consultancy hours worth of work someday.

So... that's two issues:

1- Nearly every collection data type lends itself to some similar form of iteration semantics but the APIs for the different structures, while demonstrating overlap, do not overlap in the same places nor conform to any common rule of "this type of iteration will always be included in a collection API". I think they do all have a Foo:to_list/1 type function, but if traversal is a common need (it is, ordered or not) then doing Foo:to_list/1 in front of a call to lists:TraveralFunction is confusing when often there is a shortcut in *some* APIs, but not others, that does the same thing as Foo:TraversalFunction.

2- The underlying implementations should be made faster instead of introducing new semantics just to indicate a superior implementation.

Is it so crazy to expect that something like [{M, F} || M <- CollectionModules, F <- CommonCollectionOperations] would yield a list of valid {Module, Function} pairs, regardless how they are implemented? They could all be calls to M:F(O, M:to_list(C)) to start with for all I care, but I would expect the semantics not to change and the underlying implementations to improve with time.

Or... is this just totally insane?

-Craig