Author: Björn-Egil Dahlberg Status: Draft Type: Standards Track Created: 04-Apr-2013 Erlang-Version: R17A Post-History: EEP XXX: Maps --- Abstract ======== The journey of Maps and this EEP has been long and by no means a straight-forward and continuous one. I had a crystal clear picture of what I wanted Maps to be when we first started discussing it within OTP about two-three years ago. This EEP resembles that vision but it has had a lot of contributions of other ideas from both within and outside of OTP. The idea was a data-type, a syntax aware mapping of key-value associations with pattern matching. A syntax similar to records but without the hazzle of compile-time dependency and with arbitrary terms as keys. Order was not important and it could be implemented with a Hash-Array-Mapped-Trie with good performance and memory trade-offs. This was a different approach than to replace records. It was meant to replace records where suitable and in other regards not be a replacement but its own *thing*. >From the community there has been many wishes of a Map like data-type and a few suggestions. The one suggestion that stands out is of course the Frames proposal from Richard O'Keefe. It is the most complete proposal I've seen and is very well thought out. Its goal is to be a record replacement and the proposal satisfies this goal very well. - If Frames are that good, why a separate EEP? - It boils down to goals and constraints. A record replacement is just that, a replacement. It's like asking the question, "What do we have?" instead of "What can we get?" The instant rebuttal would be "What do we need?" I say Maps. Frames has certainly inspired and influenced Maps. In many regards Maps also encompasses Frames but Maps tries to do more. In the end I believe they are two different things and have different goals. This EEP suggests a new built in data-type for Erlang, the map, `#{ Key => Value }`. The new data-type shall have semantics, syntax and operations that: * provides an association set from key terms to value terms which can be constructed, accessed and updated using language syntax * can be uniquely distinguished from every other data-type in the language * has no compile-time dependency for constructing, accessing or updating contents of maps nor for passing maps between modules, processes or over Erlang distribution * can be used in matching expressions in the language * has a one-to-one association between printing and parsing the data-type * has a well defined order between terms of the type and other Erlang types * has at most O(log N) time complexity in insert and lookup operations, where N is the number of key-value associations. Similar data-types exists in other languages, i.e. [perl hashes][perl-hashes], [ruby hashes][ruby-hashes], [python dictionaries][python-dict], or [scala maps][scala-maps]. Specification ============= A map `M` consists of a number of *associations* and keeps an association from key terms `K1..Kn` to value terms `V1..Vn` where no two keys are *equal*. Any term, compounded or otherwise, is a viable key or value. Terms of type Map are recognized by guard tests `erlang:is_map/1`. There are no operators acting on maps. Within maps there are two infix operators. An association operator, `=>`, pairs a key to a value and is used in creation and updates. A set-value operator, `:=`, is used to update a value on an already existing and matching key. The set-value operator is also used in matching to get the associated value from a key. ## Terminology ## The *size* of a map is the number of associations in its set. An *association* is a key-value pair of key *K* to value *V* in a Map. Two keys, `K1` and `K2` are *equal* if, `true = K1 == K2`. Two keys, `K1` and `K2` are *matching* if, `true = K1 =:= K2`. Syntax ------ Defined syntax for declaring, updating and matching maps. ### Construction syntax ### Constructing a new map is done by letting an expression `K` be associated to another expression `V`: #{ K => V } New maps may include multiple associations at construction by listing every association: #{ K1 => V1, .. Kn => Vn } An empty map is constructed by not associating any terms with each other: #{} All keys and values in the map are terms. Any expression is first evaluated and then the resulting terms are used as *key* and *value* respectively. Keys and values are separated by the `=>` arrow and associations are separated by `,`. Examples: M0 = #{}, % empty map M1 = #{ a => <<"hello">> }, % single association with literals M2 = #{ 1 => 2, b => b }, % multiple associations with literals M3 = #{ A => B }, % single association with variables M4 = #{ {A, B} => f() }. % compound key associated to an evaluated expression where, `A` and `B` are any expressions and `M0` through `M4` are the resulting map terms. If two equal keys are declared, the latter key will take precedent. Example: 1> #{ 1 => a, 1 => b }. #{ 1 => b } 2> #{ 1.0 => a, 1 => b }. #{ 1 => b } The order in which the expressions constructing the keys and their associated values are evaluated is not defined. The syntactic order of the key-value pairs in the construction is of no relevance, except in the above mentioned case of two keys comparing equal. A simple BNF grammar for the construction follows: ::= '#{' '}' ::= /* empty */ |

::= | ','

::= '=>' ::= ### Update syntax ### Updating a map has similar syntax as constructing it. An expression defining the map to be updated is put in front of the expression defining the keys to be updated and their respective values. M#{ K => V } where `M` is an term of type map and `K` and `V` are any expression. If key `K` does not *equal* any existing key in the map, a new association will be created from key `K` to value `V`. If key `K` is *equal* to an existing key in map `M` its associated value will be replaced by the new value `V` *and* the new key `K` will replace the old key if the key does not *match*. In both cases the evaluated map expression will return a new map. If `M` is not of type map an exception of type `badmap` is thrown. To only update an existing value, the following syntax is used, M#{ K := V } where `M` is an term of type map, `V` is an expression and `K` is an expression which evaluates to an existing key in `M`. If key `K` does not *match* any existing keys in map `M` an exception of type `badarg` will be triggered at runtime. If a *matching* key `K` is present in map `M` its associated value will be replaced by the new value `V` and the evaluated map expression returns a new map. If `M` is not of type map an exception of type `badmap` is thrown. Examples: M0 = #{}, M1 = M0#{ a => 0 }, M2 = M1#{ a => 1, b => 2 }, M3 = M2#{ "function" => fun() -> f() end }, M4 = M3#{ a := 2, b := 3 }. % 'a' and 'b' was added in `M1` and `M2`. where `M0` is any map. It follows that `M1 .. M4` are maps as well. More Examples: 1> M = #{ 1 => a }, #{ 1 => a } 2> M#{ 1.0 => b }. #{ 1.0 => b }. 3> M#{ 1 := b }. #{ 1 => b } 4> M#{ 1.0 := b }. ** exception error: bad argument As in construction, the order in which the key and value expressions are evaluated are not defined. The syntactic order of the key-value pairs in the update is of no relevance, except in the case where two keys compare equal, in which case the latter value is used. A simple BNF grammar for map updates follows: ::= '#{' '}' ::= /* empty */ |

::= | ','

::= |

::= '=>'

::= ':='

::= ::= ### Accessing a single value ### For accessing single values in maps, let us use an de-association: V = M#{ K }. Where `M` is a Map and `K` is any term. If key `K` is *equal* to an existing key in map `M` the associated value will be bound to `V`. If key `K` does not *equal* to any existing key in map `M` an exception `badarg` will occur in runtime. Examples: M1 = #{ a => 1, c => 3 }, 3 = M1#{ c }. M2 = #{ 1.0 => a }, a = M2#{ 1 }. ### Matching syntax ### Matching of key-value associations from maps, exemplified with the matching operator, is done in the following way: #{ K := V } = M where `M` is any map. The key `K` has to be an expression with bound variables or a literals, and `V` can be any pattern with either bound or unbound variables. If variables in `V` are unbound, it will be bound to the value associated with the key `K`, which has to exist in the map `M`. If variables in `V` are bound, it has to match the value associated with `K` in `M`. Example: M = #{ a => {1,2}}, #{ a := {1,B}} = M. This will bind variable `B` to integer `2`. Similarly, multiple values from the map may be matched: #{ K1 := V1, .., Kn := Vn } = M where keys `K1 .. Kn` are any expressions with literals or bound variables. If all keys exists in map `M` all variables in `V1 .. Vn` will be matched to the associated values of there respective keys. If the matching conditions are not met, the match will fail, either with 1. a `badmatch` exception, if used in the context of the matching operator as in the example, 2. or resulting in the next clause being tested in function heads and case expressions. Matching in maps only allows for `:=` as delimiters of associations. The order in which keys are declared in matching has no relevance. Duplicate keys are allowed in matching and will match each pattern associated to the keys. #{ K := V1, K := V2 } = M Matching an expression against an empty map literal will match its type but no variables will be bound: #{} = Expr This expression will match if the expression `Expr` is of type map, otherwise it will fail with an exception `badmatch`. The grammar for the matching syntax is similar to that of construction. #### Matching syntax: Example with literals in function heads #### Matching of literals as keys are allowed in function heads. %% only start if not_started handle_call(start, From, #{ state := not_started } = S) -> ... {reply, ok, S#{ state := start }}; %% only change if started handle_call(change, From, #{ state := start } = S) -> ... {reply, ok, S#{ state := changed }}; #### Matching syntax: Frequency example #### More matching syntax, calculating frequency of terms in a list. freq(Is) -> freq(Is, #{}). freq([I|Is], #{I := C} = M) -> freq(Is, M#{ I := C + 1}); freq([I|Is], M) -> freq(Is, M#{ I => 1 }); freq([], M) -> maps:to_list(M). Equivalent code with `gb_trees` for comparison: freq(Is) -> freq(Is, gb_trees:empty()). freq([I|Is], T) -> case gb_trees:lookup(I, T) of none -> freq(Is, gb_trees:enter(I, 1), T); {value, V} -> freq(Is, gb_trees:enter(I, V + 1, T)) end; freq([], T) -> gb_trees:to_list(T). #### Matching syntax: File information example #### Old API's could be refined to use map syntax: 1> {ok, #{ type := Type, mtime := Mtime }} = file:read_file_info(File). 2> io:format("type: ~p, mtime: ~p~n", [Type, Mtime]). type: regular, mtime: {{2012,7,18},{19,59,18}} ok 3> ### Map comprehension syntax ### Map comprehension declaration: M1 = #{ E0 => E1 || K := V <- M0 } where `M0` is any Map, `E0` and `E1` are any erlang expression, `K` and `V` constitutes the pattern to be matched by each association in `M0`. For each sequence in the generator an association is created from the evaluated expression `E0` to the evaluated expression `E1`. If `M0` is not a Map, then a runtime exception of type `{bad_generator, M0}` will be generated. A simple BNF grammar for map comprehension follows: ::= '#{' '||' '}' ::= | ','

::= | ::= '<-' ::=

::= '=>'

::= ':=' ::= Each association from all generators, which satisfies the filters, has an environment that consist of the initial environment and the environment for the association. Examples: M0 = #{ K => V*2 || K := V <- map() }, M1 = #{ I => f(I) || I <- list() }, M2 = #{ K => V || <> <= binary() }. Map generators may also be used in binary and list comprehensions. Examples: B1 = << <> || _ := V <- map() >>, L1 = [ {K,V} || K := V <- map() ]. ### Dialyzer and Type specification ### Keys known before hand can be specified directly and uniquely for a map. -spec func(Opt, M) -> #{ 'status' => S, 'c' => integer() } when Opt :: 'inc' | 'dec', M :: #{ 'status' => S, 'c' => integer() }, S :: 'update' | 'keep'. func(inc, #{ status := update, c := C} = M) -> M#{ c := C + 1}; func(dec, #{ status := update, c := C} = M) -> M#{ c := C - 1}; func(_, #{ status := keep } = M) -> M. It could also be specified by type only. -spec plist_to_map(Ls) -> #{ binary() => integer() } when Ls :: [{binary(), integer()}]. plist_to_map([], M) -> M; plist_to_map([{K,V}|Ls], M) when is_binary(K), is_integer(V) -> plist_to_map(Ls, M#{ K => V }); plist_to_map([_|Ls], M) -> plist_to_map(Ls, M). It can similarly be specified as a type. -type map1() :: #{ binary() => integer() }. -type map2() :: #{ <<"list1">> | <<"list2">> => [numbers()] }. Functions and Semantics ----------------------- *The module implementing the functions is currently named in plural, `maps` in the same spirit as `lists`, `gb_trees`, `sets` etc but the singular `map` is shorter and may be more desirable.* Functions and semantics for maps. Originally much inspiration was from Richard O'Keefes frames proposal. ### Erlang module extension ### ##### `erlang:is_map(M :: term()) -> boolean().` This function is a guard function. Syntax equivalence: `try #{} = M, true catch error:{badmatch,_} -> false end`. The function returns `true` if M is a map otherwise `false`. ##### `erlang:map_size(M :: map()) -> non_neg_integer().` This function is a guard function. Syntax equivalence: *none*. The function returns the number of key-value pairs in the map. Same as, `maps:size(M)` and `length(maps:to_list(M))`. ### maps module ### ##### `maps:remove(K0 :: term(), M0 :: map()) -> M1 :: map().` Syntax equivalence: `#{ K => V || K := V <- M0, K =/= K0 }`. *Only with comprehensions* The function removes the key `K0`, if it exists, and its associated value from map `M0` and returns a new map `M1` without key `K0`. Same as, `maps:from_list([{K,V}||{K,V} <- maps:to_list(M0), K =/= K0])` ##### `maps:get(K :: term(), M :: map()) -> V :: term().` Syntax equivalence: `M#{ K }`. Returns the value `V` associated with key `K` if map `M` contains a key that *equals* `K`. If no value is associated with key `K` then the call will fail with an exception. Same as,`begin {value,{_,V}} = lists:keysearch(K, 1, maps:to_list(M)), V end`, except for errors. ##### `maps:keys(M :: map()) -> [K1, .., Kn].` Syntax equivalence: `[K || K := _ <- M]`. Returns a complete list of Keys, in sorted order, which resides within map `M`. Same as, `[K || {K,_} <- maps:to_list(M)]`. ##### `maps:find(K :: term(), M :: map()) -> {ok, V :: term()} | error.` Syntax equivalence: `try #{ K := V } = M, V catch error:{badmatch,_} -> error end`. Returns a tuple `{ok, V}` with value `V` associated with key `K` if map `M` contains key `K`. If no value is associated with key `K` then the function will return `error`. Same as, `case lists:keysearch(K, 1, maps:to_list(M)) of {value,{_,V}} -> {ok,V}; false -> error end`. ##### `maps:foldl(F :: fun((K :: term(), V :: term(), In :: term()) -> Out :: term()), A :: term(), M :: map()) -> Result :: term().` Syntax equivalence: *none* Calls `F(K, V, In)` for every key `K` to value `V` association in map `M` in ascending key order. The function fun `F/3` must return a new accumulator which is passed to the next successive call. `maps:foldl/3` returns the final value of the accumulator. The initial accumulator value `A` is returned if the map is empty. Same as, `lists:foldl(fun({K,V}, In) -> F(K,V,In) end, A, maps:to_list(M))`. ##### `maps:foldr(F :: fun((K :: term(), V :: term(), In :: term()) -> Out :: term()), A :: term(), M :: map()) -> Result :: term().` Syntax equivalence: *none* Calls `F(K, V, In)` for every key `K` to value `V` association in map `M` in descending key order. The function fun `F/3` must return a new accumulator which is passed to the next successive call. `maps:foldr/3` returns the final value of the accumulator. The initial accumulator value `A` is returned if the map is empty. Same as, `lists:foldr(fun({K,V}, In) -> F(K,V,In) end, A, maps:to_list(M))`. ##### `maps:from_list([{K1,V1}, .., {Kn,Vn}]) -> M :: map().` Syntax equivalence: `#{ K1 => V1, .., Kn => Vn }` The function takes a list of key-value tuples elements and builds a map. The associations may be in any order and both keys and values in the association may be of any term. If the same key appears more than once, the latter (rightmost) value is used and the previous values are ignored. ##### `maps:is_key(K :: term(), M :: map()) -> bool().` Syntax equivalence: `try #{ K := _ } = M, true catch error:{badmatch, _} -> false end`. Returns `true` if map `M` contains key `K`, otherwise it returns `false`. Same as, `lists:keymember(K, 1, maps:to_list(M))`. ##### `maps:map(F :: function(), M0 :: map()) -> M1 :: map().` Syntax equivalence: `#{ K => F(K,V) || K := V <- M }`. *Only with comprehensions* The function produces a new map `M1` by calling the function fun `F(K, V)` for every key `K` to value `V` association in map `M0` in arbitrary order. The function fun `F/2` must return the value to be associated with key `K` for the new map `M1`. Same as, `maps:from_list(lists:map(fun({K,V}) -> {K, F(K,V)} end, maps:to_list(M)))`. ##### `maps:new() -> M :: map().` Syntax equivalence: `#{}`. Returns a new empty map. Same as, `maps:from_list([])`. ##### `maps:size(M :: map()) -> Size :: non_neg_integer().` Syntax equivalence: *none*. The function returns the number of key-value associations in the map. Same as, `erlang:map_size(M)` and `length(maps:to_list(M))`. ##### `maps:put(K :: term(), V :: term(), M0 :: map()) -> M1 :: map().` Syntax equivalence: `M0#{ K => V }`. Associates key `K` with value `V` and inserts the association into map `M0`. If a key exists that *equals* `K`, the old associated value is replaced by value `V`. If key `K` does not *match* but *equals* the old key is also replaced by key `K`. The function returns a new map `M1` containing the new association. Same as, `maps:from_list(maps:to_list(M0) ++ [{K,V}])`. ##### `maps:to_list(M :: map()) -> [{K1,V1}, ..., {Kn,Vn}].` Syntax equivalence: `[{K, V} || K := V <- M]`. Where the pairs, `[{K1,V1}, ..., {Kn,Vn}]`, are returned in a sorted order. The Map ordering is described later in this document. ##### `maps:update(K :: term(), V :: term, M0 :: map()) -> M1 :: map()` Syntax equivalence: `M0#{ K := V }`. If a key exists that *matches* `K`, the old associated value is replaced by value `V`. The function returns a new map `M1` containing the new associated value. Same as, `maps:from_list(maps:to_list(M0) ++ [{K,V}])`. ##### `maps:values(M :: map()) -> [V1, .., Vn].` Syntax equivalence: `[V || _ := V <- M]`. Returns a complete list of values, in key sorted order, contained in map `M`. Same as, `[V || {_,V} <- maps:to_list(M)]`. ##### `maps:without([K1, .., Kn] = Ks, M0 :: map()) -> M1 :: map().` Syntax equivalence: `#{ K => V || K := V <- M0, not lists:member(K, Ks) }`. *Only with comprehensions* Removes keys `K1` through `Kn`, and their associated values, from map `M0` and returns a new map `M1`. Same as, `maps:from_list([{K,V}||{K,V} <- maps:to_list(M0), not lists:member(K, Ks)])`. ##### `maps:merge(M0 :: map(), M1 :: map()) -> M2 :: map().` Syntax equivalence: *none* Merges two maps into a single map. If two *equal* keys exists in both maps the value in map `M0` will be superseded by the value in map `M1`. Same as, `maps:from_list(lists:keymerge(1,maps:to_list(M0),maps:to_list(M1)))`. Equality and Ordering --------------------- ### Equality ### In the case of term `A` and term `B` both are maps, * If `A` and `B` have different sizes, then `A` and `B` are not equal. * Otherwise, if all corresponding keys of `A` and `B` are pair- wise equal with their corresponding values, then `A` and `B` are equal. * Otherwise, `A` and `B` are not equal. It follows that two maps are equal if, and only if, they are of the same, *type*, *size* and all corresponding key-value associations are pairwise equal. ### Ordering ### The term order is defined in [Erlang specification 4.7][erlspec] and quoted below: > * The terms are primarily ordered according to their type, in the following order: > > numbers < atoms < refs < ports < PIDs < tuples < empty list < conses < binary The specification is incomplete here, the actual term order is: numbers < atoms < refs < funs < ports < pids < tuples < empty list < conses < binaries The Maps data-type are ordered next after tuples: numbers < atoms < refs < funs < ports < pids < tuples < maps < empty list < conses < binaries ---- Maps are then ordered first by their size and then according to their respective keys and lastly by the associated values in Erlang term order. Given two maps, `M1` and `M2`, with the same size, they are compared so that each key, in Erlang term order of the keys, in `M1` is compared to the corresponding key of `M2`. All keys are compared first, then the values, until a difference is found. If a key or value differs, the order of the respective terms, in Erlang term order, is the order of the maps. If no key-value pairs differ, the maps are considered equal. Example: > #{ b => 2 } > #{ a => 2 }. % b > a true > #{ b => 2 } > #{ a => 1, b => 2 }. % size 1 < size 2 false > #{ b => 1 } > #{ b => 2}. % 1 < 2 false > #{ b => 2, c => 3 } > #{ b => 1, d => 3}. % c > d, compared before 2 and 1 false > #{ b => 1 } > #{ 1 => 1}. % b > 1 true > #{ 1.0 => a } == #{ 1 => a }. % 1.0 == 1 true > #{ 1.0 => a } =:= #{ 1 => a }. % 1.0 =:= 1 false Maps are printed with keys in Erlang term order. Example: > #{ c => 3, b => 2, a => 1 }. #{ a => 1, b => 2, c => 3 } Operator Precedence ------------------- Map association operator and set-value operator is ordered last, after match-operator and `catch`. : # Unary + - bnot not / * div rem band and Left associative + - bor bxor bsl bsr or xor Left associative ++ -- Right associative == /= =< < >= > =:= =/= andalso orelse = ! Right associative catch => := It follows that the map expression: #{ key => C = 1 + 2 } will evaluate in the following order: #{ key => ( C = ( 1 + 2 ) ) } Pattern matching ---------------- Pattern matching is very powerful Erlang tool. Maps introduces a couple of new features with its pattern matching. ### Pattern matching: Basics ### We will exemplify using the match operator. Pattern matching with maps is similar to records on the surface. Keys requested in a LHS pattern will be bound with the values which is found in the RHS association map. 1> #{ a := V } = #{ a => 1 }. #{ a => 1 } 2> 1 = V. 1 Keys requested in a LHS pattern which is not found in the RHS map will produce an exception, `exception error: no match of right hand side value ...`. 1> #{ not_in_map := V } = #{ a => 1 }. ** exception error: no match of right hand side value #{ a => 1 } Similarly, if a value for a requested key in the LHS pattern is not equal to the keys associated value in the RHS map the match will produce an exception. 1> #{ a := 10 } = #{ a => 1 }. ** exception error: no match of right hand side value #{ a => 1 } Only the keys requested will bind to associations. Any unrequested keys which resides within the map being matched will be ignored. 1> #{ a := V1, b := V2 } = #{ a => 1, b => 2, c => 3}. #{ a => 1, b => 2, c => 3 } 2> 1 = V1. 1 3> 2 = V2. 2 The order of keys requested has no significance when pattern matching a map. 1> #{ a := "1", b := "2" } = #{ a => "1", b => "2" }. #{ a => "1", b => "2" } 2> #{ b := "2", a := "1" } = #{ a => "1", b => "2" }. #{ a => "1", b => "2" } ### Pattern matching: Continued ### The example below is a constructed example to illustrate the power of map pattern matching. A match expression is evaluated so that variables used as keys in the expression are bound before they are evaluated (if possible). As an example, keys can be bound by other key-value associations. 1> #{ K := V, id := K } = M = #{ id => b, a => 1, b => 2, c => 3}. #{ id => b, a => 1, b => 2, c => 3} 2> b = K. b 3> 2 = V. 2 In this case, the bound key `id` is evaluated first and looked up in M, binding the variable `K`. The `K` bound to `b` can then be used to bind `V` to 2. Binding variables used as keys requires that there is a possible order of binding without cycles. The reordering extends to all terms in a matching expression, so that: 1> { #{ X := Y }, X } = { #{ 5 => 10 }, 5 }. with `X` and `Y` unbound, results in a successful match binding `X` to 5 and `Y` to 10. This is particular useful when updating specifics in map associations: %% Function declared in module map_example update_values([{K, V1}|Ls], #{ K := V0 } = M) -> update_values(Ls, M#{ K := V0 + V1 }); update_values([_|Ls], M) -> update_values(Ls, M); update_values([], M) -> M. The first function clause is important here. Key `K` is bound in the tuple and will be used to request value `V0` from map `M`. The map `M` is then updated to associate key `K` with the new value `V0 + V1`. %% In the Erlang shell 1> M = #{ "a" => 1, "b" => 2, "c" => 3 }. #{ "a" => 1, "b" => 2, "c" => 3 } 2> map_example:update_values([{"b", 10}, {"c", 20}, {"d", 30 }], M). #{ "a" => 1, "b" => 12, "c" => 23 } Note that since key `"d"` does not reside in map `M` it fails to match the first clause and does not update the map with the association `"d" => 40`. An expression where the dependencies in the LHS of the match are cyclic, like: 1> #{X := Y, Y := X} = #{5 => 10, 10 => 5}. will result in an evaluator error (variable is unbound) or a compilation error. External Term Format -------------------- There are 255 tags that can be used to encode terms for external binary distribution. All tags are defined in `external.h`. The encoding starts by a magic byte `131`. The encoding tags used in R15B01 are the following: SMALL_INTEGER_EXT 'a' 97 INTEGER_EXT 'b' 98 FLOAT_EXT 'c' 99 ATOM_EXT 'd' 100 SMALL_ATOM_EXT 's' 115 REFERENCE_EXT 'e' 101 NEW_REFERENCE_EXT 'r' 114 PORT_EXT 'f' 102 NEW_FLOAT_EXT 'F' 70 PID_EXT 'g' 103 SMALL_TUPLE_EXT 'h' 104 LARGE_TUPLE_EXT 'i' 105 NIL_EXT 'j' 106 STRING_EXT 'k' 107 LIST_EXT 'l' 108 BINARY_EXT 'm' 109 BIT_BINARY_EXT 'M' 77 SMALL_BIG_EXT 'n' 110 LARGE_BIG_EXT 'o' 111 NEW_FUN_EXT 'p' 112 EXPORT_EXT 'q' 113 FUN_EXT 'u' 117 DIST_HEADER 'D' 68 ATOM_CACHE_REF 'R' 82 ATOM_INTERNAL_REF2 'I' 73 ATOM_INTERNAL_REF3 'K' 75 BINARY_INTERNAL_REF 'J' 74 BIT_BINARY_INTERNAL_REF 'L' 76 COMPRESSED 'P' 80 For Maps we define tag `MAP_EXT` to 116 (`t`). Data layout: MAP_EXT |----------------------------------- | 1 | 4 | | | |----------------------------------- | 116 | Size | Keys | Values | |----------------------------------- The `Size` specifies the number of keys and values that follows the size descriptor. An open questions, optimize for: 1. encoding/decoding speed? 2. ease of access? 3. memory size? Memory size should be a priority since we send this data over the wire. We should promote ease of access so other languages can integrate towards the format. This leads to a flat and simple structure. It follows that encoding/decoding takes a performance hit. Motivation ========== Why would we need maps when we have *records*, *dicts*, *gb_trees*, *ets* and *proplists*? Maps are envisioned to be an easy to use, lightweight yet powerful key-value association store. Maps utilizes one of Erlang's major strengths, pattern matching, to enrich user experience and provide a powerful tool to simplify code development. Pattern matching gives Maps a clear edge over dicts, gb_trees or proplists in usability. Maps provides the possibility to associate arbitrary terms as keys, not only atoms, with arbitrary terms as values in a matching capable data-type. Maps does not claim to be an replacement to records as the frames proposal does. Instead maps targets a larger usage domain and wishes to be a complement to records and supersede them where suitable. ### Maps - Two approaches ### 1. Maps as an association array with pattern matching and syntax for constructing, accessing or updating them. 2. Maps as a record replacement. Maps were not envisioned as a record replacement at first, it was a hopeful requirement added later. A record replacement approach does not necessarily restrain any semantics but it may put some constraint on the implementation and underlying structure. ##### Records ##### Records are powerful under the right circumstances: * fast lookups, O(1), due to compile time indexing of keys, and fast stores for small record sizes (~50 values), * no memory overhead to store keys, only values and a name: 2 + N words consumption, * ease of use in function head matching. However some of the drawbacks are: * compile-time dependency and forces header file inclusions for inter-module usage, * only atoms as keys, * keys are not accessible in runtime, * no dynamic access of values, i.e. we cannot use variables to access values, * it is not a data-type and cannot be distinguished from tuples. When comparing maps with records the drawbacks are easily remedied by Maps, however the positive effects is not as easy to replicate in a built-in data-type where values are determined at runtime instead of at compile time. * Being faster than direct-indexing array, where indices and possibly the resulting value are determined at compile time, is hard. In fact it is impossible. * A memory model for Maps where the efficiency was near that of records could be achieved by essentially using two tuples, one for keys and one for values as demonstrated in Frames. This would be impact performance of updates on Maps with a large number of entries and thus constrain the capability of a dictionary approach. * Maps would be as easy, or even easier, to use with matching in function heads. ### Protocol Construction ### Arguments for a simpler a JSON representation using frames or maps has been raised. Using frames and thereby atoms for dynamic creation of keys would be a serious drawback. Using maps would grant the possibility of string binaries to represent keys and would not put the global atom pool in disarray. #### Pattern Matching Example: The JSON Files #### Changing json-decoding. Mochiwebs mochijson decodes Json dictionaries the as the following: {"key": "value"} -> {struct, [{"key", "value"}]} This could instead be: {"key": "value"} -> #{ "key" => "value"} Consider the following JSON examples, from [json.org][json]. {"menu": { "id": "file", "value": "File", "popup": { "menuitem": [ {"value": "New", "onclick": "CreateNewDoc()"}, {"value": "Open", "onclick": "OpenDoc()"}, {"value": "Close", "onclick": "CloseDoc()"} ] } }} `mochijson:decode/1` will currently look like: {struct, [ {"menu", {struct, [ {"id","file"}, {"value","File"}, {"popup", {struct, [ {"menuitem", {array, [ {struct, [{"value","New"},{"onclick","CreateNewDoc()"}]}, {struct, [{"value","Open"},{"onclick","OpenDoc()"}]}, {struct, [{"value","Close"}, {"onclick","CloseDoc()"}]} ]}} ]}} ]}} ]} `mochijson:decode/1` could look like: #{ "menu" => #{ "id" => "file", "value" => "File", "popup" => #{ "menuitem" => [ #{ "value" => "New", "onclick" => "CreateNewDoc()"}, #{ "value" => "Open", "onclick" => "OpenDoc()"}, #{ "value" => "Close", "onclick" => "CloseDoc()"} ] } }} Let us find `"menu"` -> `"popup"` -> `"menuitem"`. Traversing the first structure is a bit awkward. We would have to do the following: Decoded = mochijson:decode(Json), {struct, Menu} = proplists:get_value("menu", Decoded), {struct, PopUp} = proplists:get_value("popup", Menu), {struct, Items} = proplists:get_value("menuitem", PopUp), With maps it could look like the following: #{ "menu" := Menu } = mochijson:decode(Json), #{ "popup" := PopUp } = Menu, #{ "menuitem" := Items } = PopUp, or even: Decoded = mochijson:decode(Json), #{ "menu" := #{ "popup" := #{ "menuitem" := Items } } } = Decoded, With maps, and single value access, it could look really simple: Decoded = mochijson:decode(Json), Items = Decoded#{ "menu" }#{ "popup" }#{ "menuitem "}. ### Open Questions ### We have some usage scenarios that are still open for debate. A proposed answer is given for each question and stems from discussions in this proposal. 1. What type of keys will we need to store in our Map? Will atoms suffice? * It is the authors view that we should refrain from any key restrictions unless there is overwhelmingly evidence that we can gain something from such a restriction. * A non atom key restriction satisfies our goal of a powerful Map mechanism. * Proposal: *Any term as keys.* 2. How many key-value associations will we store in our map? * This question has less to do with syntax and semantics than it has with the choice of the underlying implementation. * If we enable the user to add key-value pairs dynamically surely he will use it. Since it is a powerful mechanism not afforded to us by records the usage pattern will also be different. This will in all likelihood produce larger Maps than records are today. This implies that we cannot compare records sizes with that of maps sizes since the usage scenario would be different. * Proposal: *Arbitrary number of keys-value pairs.* 3. How many Map instances will we create for each Map with a specific set of keys? * This question is closely related to how we use records and if Maps should emulate this behavior and this should have no impact on semantics, only implementation. * The significant difference is the memory overhead in the storing structure. Since memory overhead for keys and values has the same behavior as in any compounded term or abstract data-type, i.e. dict or gb_trees, the main difference occurs when comparing maps to records and frames. To ensure a logarithmic worst-case performance in update or retrieval some sort tree structure would likely be used for maps. Maps would then stores keys together with its values whereas frames stores keys outside its value structure and records generates key indexes at compile-time. This would indicate a memory overhead for Maps over Frames and records for each instance. * Proposal: *Two tier approach, similar to binaries. Use flat compact, key-sharing approach for few associations (~50 associations). Use sorted tree approach and store keys with values beyond first tier limit. The rationale being it is more likely to have multiple instance where we have few keys.* 4. Only allow updates of already defined keys within a Map in syntax? * The question stems from a record replacement approach and the argument for it is to mitigate typos, i.e. trying to update key `behavior` where key `behaviour` was actually intended. Instead of getting two different keys, a runtime exception occurs at this point. * This approach will *deny* any dictionary like behavior, for instance storing spawned processes as keys in the map using syntax. * Proposal: *Allow for any key to be stored by default syntax, existing or not, and use a special syntax for setting of values of existing keys only.* The answers from these questions are instrumental to how we should design and implement Maps. What we also should keep in the back of our minds is that we will never get rid of records completely and some of the frames arguments might thus be superfluous. Rationale ========= What should we expect from our syntax? -------------------------------------- As stated earlier, the current syntax is not set in stone but what should we expect from it? 1. First and foremost it has to be unambiguous, meaning the syntax must produce single clearly defined behavior that cannot be misinterpreted by humans nor machines. 1.1 Here we also include the notion that similar syntax should have similar behavior, or at least not completely different behavior. For example records, `#record{ key = Value }`, have O(1) performance and 2 + N words memory overhead. If we use a similar syntax, i.e. `#{ key = Value }`, we should also expect similar behavior both in regard to semantics and performance for any and all sizes of the map. 2. The syntax must be as short as possible. It must not use more characters than necessary to describe a certain behavior as long as it does not violate rule 1. 2.2 We want to avoid verbosity in the language. Verbosity pushes away information from our field of vision and obfuscates it. This needs to be avoided. Syntax choice for Maps ---------------------- *The author argues for:* `=>` *as delimiter in 'set-or-update' and* `:=` *in 'set-existing' and 'matching' syntax.* In the examples below we use `#{ Key => Value }` to describe map semantics and use-cases, but this is only one suggestion out of many. Several syntax proposals exists, frames proposes `<{ key ~ Value }>` syntax and another syntax suggestion is very similar to record syntax `#{ key = Value }`. The current variable and atom definitions puts restrictions on what we can use as delimiters and leaves us `~` and `=` as the only sane *single* character delimiters we can use. The case is very well argued in Richard O'Keefes [No more need for records (fifth draft)][ROKframe]. ### Delimiter discussion ### Arguments against a `=` delimiter are: * It lacks distinction from match, consider `#{ A = B = v }`, does `A` match `B` or does `B` match `v`? Which `=` is a match operation and which `=` delimits the key-value pair? * It might be interpreted as `Key` 'equal' `Value`, and hence `=` is in violation of rule #1 from *What do we expect from our syntax?*. The interpretation of this syntax is ambiguous. Arguments against a `~` delimiter are: * it might be interpreted as `Key` 'NOT' `Value` as `~` is the bitwise NOT operator in C, * it might be interpreted as `Key` 'about equal' `Value` as `~` is similar to mathematics `≃`, * it lacks typographical distinction, i.e. it lacks distinction from `-` in certain fonts, ex. `K ~ V` might be interpreted as `K - V`, consider `#{ K - 1 ~ 2 - V }`, and hence this is in violation of rule #1 from *What do we expect from our syntax?*. The interpretation of this syntax is ambiguous. Two two-character delimiter suggestions are `#{ Key := Value }` and `#{ Key => Value}`, where `:=` is a common denominator for assignment and `=>` should be read as *maps to*. A two-character delimiter should be avoided if at all possible since it increases the syntax footprint of the source code. The assignment delimiter reads well for just assignment but suffers from the same reversed logic flaw as records when it comes to pattern matching. The match `#{ key := Value } = M` reads *match M to the map pattern where Value is equal to key*. That does not read well unless we call the assignment delimiter `:=` for something that its not meant to be. However, `:=` is also similar to `=:=` which means "is exactly equal", i.e. matches. This is a valuable meaning since we have a difference between `==` and `=:=` when dealing with numbers and thus `:=` could be a more correct delimiter for matching syntax. The delimiter `->` would be suitable choice if it weren't for the fact that it would overload the function clause meaning. Both `->` and `=>` might be confusing when dealing with binaries. Consider `#{ <<"key">> -> <<"value">> }` and `#{ <<"key">> => <<"value">> }`, where `=>` appears to be slightly more confusing than `->`. Listing of delimiters from above perceived desirability: 1. `#{ K => V }` - No ambiguity, no overloading, reads as an association 2. `#{ K := V }` - No ambiguity, no overloading, reads as an assignment or exact match 3. `#{ K ~ V }` - Typographical ambiguity, no overloading, no clear meaning 4. `#{ K -> V }` - Overloads function clause head and body separator, reads as an association 5. `#{ K = V }` - Overloads match operator, reads as a match or an assignment Using `:=` assignment for existing keys seems as a good choice. The choice for set-or-update is between `=>` and `~`. ### The case for two set-or-update semantics and its syntax ### A case for two different ways to update values in a Map is proposed. One syntax if, and only if, we want to update a value for an already *existing* key and another if we want to update the Map with any key. * Use `M#{ K => V }` to declare new key value pairs *or* update already existing keys * Use `M#{ K := V }` to update already existing keys. * Use ` #{ K := V } = M` to match maps. Example 1: foo() -> M = #{ key1 => 1, key2 => 2 }, % M is declared with keys 'key1' and 'key2' bar(M). bar(M) -> M#{ key1 := "1", %% 'key1' will be set to "1" key2 := "2", %% 'key2' will be set to "2" key3 := "3" %% this causes an exception since 'key3' does not exist in M }. > foo(). ** exception error: no match of 'key3' in map Example 2: foo() -> M = #{ key1 => 1, key2 => 2 }, % M is declared with keys 'key1' and 'key2' bar(M). bar(M) -> M#{ key1 => "1", %% 'key1' will be set to "1" key2 => "2", %% 'key2' will be set to "2" key3 => "3" %% 'key3' will be set to "3" }. > foo(). #{ key1 => 1, key2 => "2", key3 => "3" } Impact of syntax footprint -------------------------- We must lessen the syntax footprint impact on the source code and the language. Currently the two normal ways of sending options to a functions are either via records or property lists. Both have some drawbacks. Records are compile time dependent and syntactic sugar for tuples. Property lists are generic but produces a lot of texts when defining them and operating on them. Consider this example when parsing a list of arguments: args(Args) -> args(Args, [{analyze, false}, {suppression, false}, {target, none}]). args(["-r" | Args], Opts) -> args(Args, [{analyze, true} | proplists:delete(analyze, Opts)]); args(["-s="++File | Args], Opts) -> args(Args, [{suppression, File} | proplists:delete(suppression, Opts)]); args([Target], Opts) -> [{target, Target} | proplists:delete(target, Opts)]. The textual impact, the number of characters, is quite heavy when operating on property lists. If we instead use some kind of map with syntax, how would that look? args(Args) -> args(Args, #{ analyze => false, suppression => false, target => none}). args(["-r" | Args], Opts) -> args(Args, Opts#{ analyze := true }); args(["-s="++File | Args], Opts) -> args(Args, Opts#{ suppression := File}); args([Target], Opts) -> Opts#{ target := Target }. This looks cleaner in my opinion but that is a very subjective view. To use some data we can count the characters, and we see that the property lists example has 390 characters versus the map examples 306. Property lists uses almost 30% more characters in this example. Semantics and API-functions --------------------------- ### List conversions ### Perhaps the most sane `maps:from_list/1` semantics would be to have the key-value significance order in left to right, meaning the first association is used and the latter values with equal keys are ignored. This differs from the `dict:from_list/1` behavior. Consider the following `dict` example: [{a,2}] = dict:to_list(dict:from_list([{a,1}, {a,2}])). By letting the leftmost be the most significant key we could simplify conversion from and to lists. Current suggestion has the following semantics: Ls = [{a,old}], #{ a := old } = maps:from_list([{a,new}|Ls]). The reversal would be: Ls = [{a,old}], #{ a := new } = maps:from_list([{a,new}|Ls]). Equality and Ordering --------------------- A restriction set on the implementation by the Erlang specification is that order is total, i.e. satisfies *antisymmetry*, *transitivity* and *totality*. - If `M1 =< M2` and `M2 =< M1` then `M1 == M2`, - If `M1 =< M2` and `M2 =< M3` then `M1 =< M3`, - If `M1 =< M2` or `M2 =< M1` (always comparable) where `M1`, `M2` and `M3` are any Map term. This only holds true in Erlang if we treat floats and integers as union of types, namely numbers. In the case of a Maps, `true = #{ 1.0 => V } == #{ 1 => V}`. - The need for order arises in a few cases. - comparison, for example sorting, `lists:sort([M1, .., Mn])` - introspection, for example when printed. - Ordered maps impose restrictions on the underlying implementation and a hashing approach will be nearly impossible. - The underlying structure does not need to be sorted, an order could be produced when needed, - `M1` < `M2`, would result in an internal sort but would cost O( *N1* * lg *N1* + *N2* * lg *N2* ), where `N1 = maps:size(M1) and N2 = maps:size(M2)` The discussion and results on equality versus matching for keys stems from ets and ordered set. Consider the following with ets. 1> ets:new(foo, [named_table, ordered_set]). foo 2> ets:insert(foo, {1,v1}). true 3> ets:insert(foo, {1.0,v2}). true 4> ets:lookup(foo,1). [{1.0,v2}] 5> ets:match(foo, ets:fun2ms(fun({1,V}) -> V end)). [] Accessing a single value ------------------------ Do we need to have single access or is matching sufficient? Consider the following, V = M#{ K } is shorter than #{ K := V } = M It also allows for easy access of associated values in deep structures. The syntax for single value access is the least developed (and contemplated) feature in this proposal and certainly could use some input. More over, the dot syntax must be abolished. Currently it is used for records but it will not be used for maps. Dot represents end of expression list in last clause, or end of attribute. It cannot be used to distinguish between floats or associations. Example: 1> M = #{ 1.1 => a, 1 => #{ 1 => b } }. #{ 1 => #{ 1 => b }, 1.1 => a }. 2> #M.1.1. a | b ? Backwards Compatibility ======================= Erlang code written with Maps will only be parseable, loadable and executable on Erlang/OTP R17A and later releases of Erlang/OTP but not on previous releases. Erlang code written before Erlang/OTP R17A will be perfectly compatible, i.e. parseable, loadable and executable with these Maps changes. Distribution will not be backwards compatible. [perl-hashes]: http://perldoc.perl.org/perldata.html "perldata - perldoc.perl.org" [ruby-hashes]: http://ruby-doc.org/core-1.9.3/Hash.html "Class: Hash (Ruby 1.9.3)" [python-dict]: http://docs.python.org/tutorial/datastructures.html#dictionaries "5. Data Structures - Python v2.7.3 documentation" [scala-maps]: http://docs.scala-lang.org/overviews/collections/maps.html "Collections - Maps - Scala Documentation" [dict-module]: http://www.Erlang.org/doc/man/dict.html "Erlang -- dict" [ROKframe]: http://www.cs.otago.ac.nz/staffpriv/ok/frames.pdf "No more need for records" [erlspec]: http://www.Erlang.org/download/erl_spec47.ps.gz "Erlang specification 4.7" [json]: http://json.org/example.html "JSON Example" Copyright ========= This document has been placed in the public domain.