Author: Björn-Egil Dahlberg
Status: Draft
Type: Standards Track
Created: 04-Apr-2013
Erlang-Version: R17A
Post-History:
EEP XXX: Maps
---
Abstract
========
The journey of Maps and this EEP has been long and by no means a
straight-forward and continuous one. I had a crystal clear picture of what I
wanted Maps to be when we first started discussing it within OTP about
two-three years ago. This EEP resembles that vision but it has had a lot of
contributions of other ideas from both within and outside of OTP.
The idea was a data-type, a syntax aware mapping of key-value associations
with pattern matching. A syntax similar to records but without the hazzle of
compile-time dependency and with arbitrary terms as keys. Order was not
important and it could be implemented with a Hash-Array-Mapped-Trie with good
performance and memory trade-offs. This was a different approach than to replace
records. It was meant to replace records where suitable and in other regards
not be a replacement but its own *thing*.
>From the community there has been many wishes of a Map like data-type and a
few suggestions. The one suggestion that stands out is of course the Frames
proposal from Richard O'Keefe. It is the most complete proposal I've seen and
is very well thought out. Its goal is to be a record replacement and the
proposal satisfies this goal very well.
- If Frames are that good, why a separate EEP?
- It boils down to goals and constraints.
A record replacement is just that, a replacement.
It's like asking the question, "What do we have?" instead of "What can we get?"
The instant rebuttal would be "What do we need?" I say Maps.
Frames has certainly inspired and influenced Maps. In many regards Maps also
encompasses Frames but Maps tries to do more. In the end I believe they are
two different things and have different goals.
This EEP suggests a new built in data-type for Erlang, the map,
`#{ Key => Value }`.
The new data-type shall have semantics, syntax and operations that:
* provides an association set from key terms to value terms which can be
constructed, accessed and updated using language syntax
* can be uniquely distinguished from every other data-type in the language
* has no compile-time dependency for constructing, accessing or updating
contents of maps nor for passing maps between modules, processes or over
Erlang distribution
* can be used in matching expressions in the language
* has a one-to-one association between printing and parsing the data-type
* has a well defined order between terms of the type and other Erlang types
* has at most O(log N) time complexity in insert and lookup operations, where
N is the number of key-value associations.
Similar data-types exists in other languages, i.e. [perl hashes][perl-hashes],
[ruby hashes][ruby-hashes], [python dictionaries][python-dict], or
[scala maps][scala-maps].
Specification
=============
A map `M` consists of a number of *associations* and keeps an association
from key terms `K1..Kn` to value terms `V1..Vn` where no two keys are *equal*.
Any term, compounded or otherwise, is a viable key or value. Terms of type Map
are recognized by guard tests `erlang:is_map/1`. There are no operators
acting on maps. Within maps there are two infix operators. An association
operator, `=>`, pairs a key to a value and is used in creation and updates.
A set-value operator, `:=`, is used to update a value on an already
existing and matching key. The set-value operator is also used in matching
to get the associated value from a key.
## Terminology ##
The *size* of a map is the number of associations in its set.
An *association* is a key-value pair of key *K* to value *V* in a Map.
Two keys, `K1` and `K2` are *equal* if, `true = K1 == K2`.
Two keys, `K1` and `K2` are *matching* if, `true = K1 =:= K2`.
Syntax
------
Defined syntax for declaring, updating and matching maps.
### Construction syntax ###
Constructing a new map is done by letting an expression `K` be associated to
another expression `V`:
#{ K => V }
New maps may include multiple associations at construction by listing every
association:
#{ K1 => V1, .. Kn => Vn }
An empty map is constructed by not associating any terms with each other:
#{}
All keys and values in the map are terms. Any expression is first evaluated and
then the resulting terms are used as *key* and *value* respectively.
Keys and values are separated by the `=>` arrow and associations are
separated by `,`.
Examples:
M0 = #{}, % empty map
M1 = #{ a => <<"hello">> }, % single association with literals
M2 = #{ 1 => 2, b => b }, % multiple associations with literals
M3 = #{ A => B }, % single association with variables
M4 = #{ {A, B} => f() }. % compound key associated to an evaluated expression
where, `A` and `B` are any expressions and `M0` through `M4` are the resulting
map terms.
If two equal keys are declared, the latter key will take precedent.
Example:
1> #{ 1 => a, 1 => b }.
#{ 1 => b }
2> #{ 1.0 => a, 1 => b }.
#{ 1 => b }
The order in which the expressions constructing the keys and their
associated values are evaluated is not defined. The syntactic order of
the key-value pairs in the construction is of no relevance, except in
the above mentioned case of two keys comparing equal.
A simple BNF grammar for the construction follows:
::= '#{' '}'
::= /* empty */
|
::=
| ','
::= '=>'
::=
### Update syntax ###
Updating a map has similar syntax as constructing it.
An expression defining the map to be updated is put in front of the expression
defining the keys to be updated and their respective values.
M#{ K => V }
where `M` is an term of type map and `K` and `V` are any expression.
If key `K` does not *equal* any existing key in the map, a new association
will be created from key `K` to value `V`. If key `K` is *equal* to an existing
key in map `M` its associated value will be replaced by the new value `V` *and*
the new key `K` will replace the old key if the key does not *match*. In both
cases the evaluated map expression will return a new map.
If `M` is not of type map an exception of type `badmap` is thrown.
To only update an existing value, the following syntax is used,
M#{ K := V }
where `M` is an term of type map, `V` is an expression and `K` is an expression
which evaluates to an existing key in `M`.
If key `K` does not *match* any existing keys in map `M` an exception of type
`badarg` will be triggered at runtime. If a *matching* key `K` is present in
map `M` its associated value will be replaced by the new value `V` and the
evaluated map expression returns a new map.
If `M` is not of type map an exception of type `badmap` is thrown.
Examples:
M0 = #{},
M1 = M0#{ a => 0 },
M2 = M1#{ a => 1, b => 2 },
M3 = M2#{ "function" => fun() -> f() end },
M4 = M3#{ a := 2, b := 3 }. % 'a' and 'b' was added in `M1` and `M2`.
where `M0` is any map. It follows that `M1 .. M4` are maps as well.
More Examples:
1> M = #{ 1 => a },
#{ 1 => a }
2> M#{ 1.0 => b }.
#{ 1.0 => b }.
3> M#{ 1 := b }.
#{ 1 => b }
4> M#{ 1.0 := b }.
** exception error: bad argument
As in construction, the order in which the key and value expressions
are evaluated are not defined. The
syntactic order of the key-value pairs in the update is of no
relevance, except in the case where two keys compare equal, in which
case the latter value is used.
A simple BNF grammar for map updates follows:
::= '#{' '}'
::= /* empty */
|
::=
| ','
::=
|
::= '=>'
::= ':='
::=
::=
### Accessing a single value ###
For accessing single values in maps, let us use an de-association:
V = M#{ K }.
Where `M` is a Map and `K` is any term.
If key `K` is *equal* to an existing key in map `M` the associated value
will be bound to `V`. If key `K` does not *equal* to any existing key in
map `M` an exception `badarg` will occur in runtime.
Examples:
M1 = #{ a => 1, c => 3 },
3 = M1#{ c }.
M2 = #{ 1.0 => a },
a = M2#{ 1 }.
### Matching syntax ###
Matching of key-value associations from maps, exemplified with the
matching operator, is done in the following way:
#{ K := V } = M
where `M` is any map. The key `K` has to be an expression with bound variables
or a literals, and `V` can be any pattern with either bound or unbound variables.
If variables in `V` are unbound, it will be bound to the value associated
with the key `K`, which has to exist in the map `M`. If variables in `V` are
bound, it has to match the value associated with `K` in `M`.
Example:
M = #{ a => {1,2}},
#{ a := {1,B}} = M.
This will bind variable `B` to integer `2`.
Similarly, multiple values from the map may be matched:
#{ K1 := V1, .., Kn := Vn } = M
where keys `K1 .. Kn` are any expressions with literals or bound variables. If all
keys exists in map `M` all variables in `V1 .. Vn` will be matched to the
associated values of there respective keys.
If the matching conditions are not met, the match will fail, either with
1. a `badmatch` exception, if used in the context of the matching operator
as in the example,
2. or resulting in the next clause being tested in function heads and
case expressions.
Matching in maps only allows for `:=` as delimiters of associations.
The order in which keys are declared in matching has no relevance.
Duplicate keys are allowed in matching and will match each pattern associated
to the keys.
#{ K := V1, K := V2 } = M
Matching an expression against an empty map literal will match its type but
no variables will be bound:
#{} = Expr
This expression will match if the expression `Expr` is of type map, otherwise
it will fail with an exception `badmatch`.
The grammar for the matching syntax is similar to that of construction.
#### Matching syntax: Example with literals in function heads ####
Matching of literals as keys are allowed in function heads.
%% only start if not_started
handle_call(start, From, #{ state := not_started } = S) ->
...
{reply, ok, S#{ state := start }};
%% only change if started
handle_call(change, From, #{ state := start } = S) ->
...
{reply, ok, S#{ state := changed }};
#### Matching syntax: Frequency example ####
More matching syntax, calculating frequency of terms in a list.
freq(Is) -> freq(Is, #{}).
freq([I|Is], #{I := C} = M) -> freq(Is, M#{ I := C + 1});
freq([I|Is], M) -> freq(Is, M#{ I => 1 });
freq([], M) -> maps:to_list(M).
Equivalent code with `gb_trees` for comparison:
freq(Is) -> freq(Is, gb_trees:empty()).
freq([I|Is], T) ->
case gb_trees:lookup(I, T) of
none -> freq(Is, gb_trees:enter(I, 1), T);
{value, V} -> freq(Is, gb_trees:enter(I, V + 1, T))
end;
freq([], T) -> gb_trees:to_list(T).
#### Matching syntax: File information example ####
Old API's could be refined to use map syntax:
1> {ok, #{ type := Type, mtime := Mtime }} = file:read_file_info(File).
2> io:format("type: ~p, mtime: ~p~n", [Type, Mtime]).
type: regular, mtime: {{2012,7,18},{19,59,18}}
ok
3>
### Map comprehension syntax ###
Map comprehension declaration:
M1 = #{ E0 => E1 || K := V <- M0 }
where `M0` is any Map, `E0` and `E1` are any erlang expression, `K` and `V`
constitutes the pattern to be matched by each association in `M0`.
For each sequence in the generator an association is created from the evaluated
expression `E0` to the evaluated expression `E1`.
If `M0` is not a Map, then a runtime exception of type `{bad_generator, M0}`
will be generated.
A simple BNF grammar for map comprehension follows:
::= '#{' '||' '}'
::=
| ','
::=
|
::= '<-'
::=
::= '=>'
::= ':='
::=
Each association from all generators, which satisfies the filters, has an
environment that consist of the initial environment and the environment
for the association.
Examples:
M0 = #{ K => V*2 || K := V <- map() },
M1 = #{ I => f(I) || I <- list() },
M2 = #{ K => V || <> <= binary() }.
Map generators may also be used in binary and list comprehensions.
Examples:
B1 = << <> || _ := V <- map() >>,
L1 = [ {K,V} || K := V <- map() ].
### Dialyzer and Type specification ###
Keys known before hand can be specified directly and uniquely for a map.
-spec func(Opt, M) -> #{ 'status' => S, 'c' => integer() } when
Opt :: 'inc' | 'dec',
M :: #{ 'status' => S, 'c' => integer() },
S :: 'update' | 'keep'.
func(inc, #{ status := update, c := C} = M) -> M#{ c := C + 1};
func(dec, #{ status := update, c := C} = M) -> M#{ c := C - 1};
func(_, #{ status := keep } = M) -> M.
It could also be specified by type only.
-spec plist_to_map(Ls) -> #{ binary() => integer() } when
Ls :: [{binary(), integer()}].
plist_to_map([], M) ->
M;
plist_to_map([{K,V}|Ls], M) when is_binary(K), is_integer(V) ->
plist_to_map(Ls, M#{ K => V });
plist_to_map([_|Ls], M) ->
plist_to_map(Ls, M).
It can similarly be specified as a type.
-type map1() :: #{ binary() => integer() }.
-type map2() :: #{ <<"list1">> | <<"list2">> => [numbers()] }.
Functions and Semantics
-----------------------
*The module implementing the functions is currently named in plural, `maps` in the
same spirit as `lists`, `gb_trees`, `sets` etc but the singular `map` is shorter
and may be more desirable.*
Functions and semantics for maps. Originally much inspiration was from
Richard O'Keefes frames proposal.
### Erlang module extension ###
##### `erlang:is_map(M :: term()) -> boolean().`
This function is a guard function.
Syntax equivalence: `try #{} = M, true catch error:{badmatch,_} -> false end`.
The function returns `true` if M is a map otherwise `false`.
##### `erlang:map_size(M :: map()) -> non_neg_integer().`
This function is a guard function.
Syntax equivalence: *none*.
The function returns the number of key-value pairs in the map.
Same as, `maps:size(M)` and `length(maps:to_list(M))`.
### maps module ###
##### `maps:remove(K0 :: term(), M0 :: map()) -> M1 :: map().`
Syntax equivalence: `#{ K => V || K := V <- M0, K =/= K0 }`.
*Only with comprehensions*
The function removes the key `K0`, if it exists, and its associated value from
map `M0` and returns a new map `M1` without key `K0`.
Same as, `maps:from_list([{K,V}||{K,V} <- maps:to_list(M0), K =/= K0])`
##### `maps:get(K :: term(), M :: map()) -> V :: term().`
Syntax equivalence: `M#{ K }`.
Returns the value `V` associated with key `K` if map `M` contains a key
that *equals* `K`. If no value is associated with key `K` then the call will
fail with an exception.
Same as,`begin {value,{_,V}} = lists:keysearch(K, 1, maps:to_list(M)), V end`,
except for errors.
##### `maps:keys(M :: map()) -> [K1, .., Kn].`
Syntax equivalence: `[K || K := _ <- M]`.
Returns a complete list of Keys, in sorted order, which resides within
map `M`.
Same as, `[K || {K,_} <- maps:to_list(M)]`.
##### `maps:find(K :: term(), M :: map()) -> {ok, V :: term()} | error.`
Syntax equivalence: `try #{ K := V } = M, V catch error:{badmatch,_} -> error end`.
Returns a tuple `{ok, V}` with value `V` associated with key `K` if map `M`
contains key `K`. If no value is associated with key `K` then the function
will return `error`.
Same as, `case lists:keysearch(K, 1, maps:to_list(M)) of {value,{_,V}} -> {ok,V}; false -> error end`.
##### `maps:foldl(F :: fun((K :: term(), V :: term(), In :: term()) -> Out :: term()), A :: term(), M :: map()) -> Result :: term().`
Syntax equivalence: *none*
Calls `F(K, V, In)` for every key `K` to value `V` association in map `M` in
ascending key order. The function fun `F/3` must return a new accumulator
which is passed to the next successive call. `maps:foldl/3` returns the final
value of the accumulator. The initial accumulator value `A` is returned if
the map is empty.
Same as, `lists:foldl(fun({K,V}, In) -> F(K,V,In) end, A, maps:to_list(M))`.
##### `maps:foldr(F :: fun((K :: term(), V :: term(), In :: term()) -> Out :: term()), A :: term(), M :: map()) -> Result :: term().`
Syntax equivalence: *none*
Calls `F(K, V, In)` for every key `K` to value `V` association in map `M` in
descending key order. The function fun `F/3` must return a new accumulator
which is passed to the next successive call. `maps:foldr/3` returns the final
value of the accumulator. The initial accumulator value `A` is returned if
the map is empty.
Same as, `lists:foldr(fun({K,V}, In) -> F(K,V,In) end, A, maps:to_list(M))`.
##### `maps:from_list([{K1,V1}, .., {Kn,Vn}]) -> M :: map().`
Syntax equivalence: `#{ K1 => V1, .., Kn => Vn }`
The function takes a list of key-value tuples elements and builds a
map. The associations may be in any order and both keys and values in the
association may be of any term. If the same key appears more than once,
the latter (rightmost) value is used and the previous values are ignored.
##### `maps:is_key(K :: term(), M :: map()) -> bool().`
Syntax equivalence: `try #{ K := _ } = M, true catch error:{badmatch, _} -> false end`.
Returns `true` if map `M` contains key `K`, otherwise it returns `false`.
Same as, `lists:keymember(K, 1, maps:to_list(M))`.
##### `maps:map(F :: function(), M0 :: map()) -> M1 :: map().`
Syntax equivalence: `#{ K => F(K,V) || K := V <- M }`.
*Only with comprehensions*
The function produces a new map `M1` by calling the function fun `F(K, V)` for
every key `K` to value `V` association in map `M0` in arbitrary order.
The function fun `F/2` must return the value to be associated with key `K` for
the new map `M1`.
Same as, `maps:from_list(lists:map(fun({K,V}) -> {K, F(K,V)} end, maps:to_list(M)))`.
##### `maps:new() -> M :: map().`
Syntax equivalence: `#{}`.
Returns a new empty map.
Same as, `maps:from_list([])`.
##### `maps:size(M :: map()) -> Size :: non_neg_integer().`
Syntax equivalence: *none*.
The function returns the number of key-value associations in the map.
Same as, `erlang:map_size(M)` and `length(maps:to_list(M))`.
##### `maps:put(K :: term(), V :: term(), M0 :: map()) -> M1 :: map().`
Syntax equivalence: `M0#{ K => V }`.
Associates key `K` with value `V` and inserts the association into map `M0`.
If a key exists that *equals* `K`, the old associated value is
replaced by value `V`. If key `K` does not *match* but *equals* the
old key is also replaced by key `K`.
The function returns a new map `M1` containing the new association.
Same as, `maps:from_list(maps:to_list(M0) ++ [{K,V}])`.
##### `maps:to_list(M :: map()) -> [{K1,V1}, ..., {Kn,Vn}].`
Syntax equivalence: `[{K, V} || K := V <- M]`.
Where the pairs, `[{K1,V1}, ..., {Kn,Vn}]`, are returned in a sorted order.
The Map ordering is described later in this document.
##### `maps:update(K :: term(), V :: term, M0 :: map()) -> M1 :: map()`
Syntax equivalence: `M0#{ K := V }`.
If a key exists that *matches* `K`, the old associated value is
replaced by value `V`. The function returns a new map `M1` containing
the new associated value.
Same as, `maps:from_list(maps:to_list(M0) ++ [{K,V}])`.
##### `maps:values(M :: map()) -> [V1, .., Vn].`
Syntax equivalence: `[V || _ := V <- M]`.
Returns a complete list of values, in key sorted order, contained in map `M`.
Same as, `[V || {_,V} <- maps:to_list(M)]`.
##### `maps:without([K1, .., Kn] = Ks, M0 :: map()) -> M1 :: map().`
Syntax equivalence: `#{ K => V || K := V <- M0, not lists:member(K, Ks) }`.
*Only with comprehensions*
Removes keys `K1` through `Kn`, and their associated values, from map `M0` and
returns a new map `M1`.
Same as, `maps:from_list([{K,V}||{K,V} <- maps:to_list(M0), not lists:member(K, Ks)])`.
##### `maps:merge(M0 :: map(), M1 :: map()) -> M2 :: map().`
Syntax equivalence: *none*
Merges two maps into a single map. If two *equal* keys exists in both maps the
value in map `M0` will be superseded by the value in map `M1`.
Same as, `maps:from_list(lists:keymerge(1,maps:to_list(M0),maps:to_list(M1)))`.
Equality and Ordering
---------------------
### Equality ###
In the case of term `A` and term `B` both are maps,
* If `A` and `B` have different sizes, then `A` and `B` are not equal.
* Otherwise, if all corresponding keys of `A` and `B` are pair-
wise equal with their corresponding values, then `A` and `B` are equal.
* Otherwise, `A` and `B` are not equal.
It follows that two maps are equal if, and only if, they are of
the same, *type*, *size* and all corresponding key-value associations are
pairwise equal.
### Ordering ###
The term order is defined in [Erlang specification 4.7][erlspec] and quoted
below:
> * The terms are primarily ordered according to their type, in the following order:
>
> numbers < atoms < refs < ports < PIDs < tuples < empty list < conses < binary
The specification is incomplete here, the actual term order is:
numbers < atoms < refs < funs < ports < pids < tuples < empty list < conses < binaries
The Maps data-type are ordered next after tuples:
numbers < atoms < refs < funs < ports < pids < tuples < maps < empty list < conses < binaries
----
Maps are then ordered first by their size and then according to their
respective keys and lastly by the associated values in Erlang term order.
Given two maps, `M1` and `M2`, with the same size, they are compared
so that each key, in Erlang term order of the keys, in `M1`
is compared to the corresponding key of `M2`. All keys are
compared first, then the values, until a difference is found. If a key
or value differs, the order of the respective terms, in Erlang term
order, is the order of the maps. If no key-value pairs differ, the
maps are considered equal.
Example:
> #{ b => 2 } > #{ a => 2 }. % b > a
true
> #{ b => 2 } > #{ a => 1, b => 2 }. % size 1 < size 2
false
> #{ b => 1 } > #{ b => 2}. % 1 < 2
false
> #{ b => 2, c => 3 } > #{ b => 1, d => 3}. % c > d, compared before 2 and 1
false
> #{ b => 1 } > #{ 1 => 1}. % b > 1
true
> #{ 1.0 => a } == #{ 1 => a }. % 1.0 == 1
true
> #{ 1.0 => a } =:= #{ 1 => a }. % 1.0 =:= 1
false
Maps are printed with keys in Erlang term order.
Example:
> #{ c => 3, b => 2, a => 1 }.
#{ a => 1, b => 2, c => 3 }
Operator Precedence
-------------------
Map association operator and set-value operator is ordered last,
after match-operator and `catch`.
:
#
Unary + - bnot not
/ * div rem band and Left associative
+ - bor bxor bsl bsr or xor Left associative
++ -- Right associative
== /= =< < >= > =:= =/=
andalso
orelse
= ! Right associative
catch
=> :=
It follows that the map expression:
#{ key => C = 1 + 2 }
will evaluate in the following order:
#{ key => ( C = ( 1 + 2 ) ) }
Pattern matching
----------------
Pattern matching is very powerful Erlang tool. Maps introduces a couple of new
features with its pattern matching.
### Pattern matching: Basics ###
We will exemplify using the match operator.
Pattern matching with maps is similar to records on the surface. Keys requested
in a LHS pattern will be bound with the values which is found in the
RHS association map.
1> #{ a := V } = #{ a => 1 }.
#{ a => 1 }
2> 1 = V.
1
Keys requested in a LHS pattern which is not found in the RHS map will produce
an exception, `exception error: no match of right hand side value ...`.
1> #{ not_in_map := V } = #{ a => 1 }.
** exception error: no match of right hand side value #{ a => 1 }
Similarly, if a value for a requested key in the LHS pattern
is not equal to the keys associated value in the RHS map
the match will produce an exception.
1> #{ a := 10 } = #{ a => 1 }.
** exception error: no match of right hand side value #{ a => 1 }
Only the keys requested will bind to associations. Any unrequested keys which
resides within the map being matched will be ignored.
1> #{ a := V1, b := V2 } = #{ a => 1, b => 2, c => 3}.
#{ a => 1, b => 2, c => 3 }
2> 1 = V1.
1
3> 2 = V2.
2
The order of keys requested has no significance when pattern matching a map.
1> #{ a := "1", b := "2" } = #{ a => "1", b => "2" }.
#{ a => "1", b => "2" }
2> #{ b := "2", a := "1" } = #{ a => "1", b => "2" }.
#{ a => "1", b => "2" }
### Pattern matching: Continued ###
The example below is a constructed example to illustrate the power of map
pattern matching.
A match expression is evaluated so that variables used as keys in the expression
are bound before they are evaluated (if possible).
As an example, keys can be bound by other key-value associations.
1> #{ K := V, id := K } = M = #{ id => b, a => 1, b => 2, c => 3}.
#{ id => b, a => 1, b => 2, c => 3}
2> b = K.
b
3> 2 = V.
2
In this case, the bound key `id` is evaluated first and looked up in
M, binding the variable `K`. The `K` bound to `b` can then be used to
bind `V` to 2.
Binding variables used as keys requires that there is a possible order of
binding without cycles. The reordering extends to all terms in a matching
expression, so that:
1> { #{ X := Y }, X } = { #{ 5 => 10 }, 5 }.
with `X` and `Y` unbound, results in a successful match binding `X` to 5 and
`Y` to 10.
This is particular useful when updating specifics in map associations:
%% Function declared in module map_example
update_values([{K, V1}|Ls], #{ K := V0 } = M) -> update_values(Ls, M#{ K := V0 + V1 });
update_values([_|Ls], M) -> update_values(Ls, M);
update_values([], M) -> M.
The first function clause is important here. Key `K` is bound in the tuple and
will be used to request value `V0` from map `M`. The map `M` is then updated to
associate key `K` with the new value `V0 + V1`.
%% In the Erlang shell
1> M = #{ "a" => 1, "b" => 2, "c" => 3 }.
#{ "a" => 1, "b" => 2, "c" => 3 }
2> map_example:update_values([{"b", 10}, {"c", 20}, {"d", 30 }], M).
#{ "a" => 1, "b" => 12, "c" => 23 }
Note that since key `"d"` does not reside in map `M` it fails to match the first
clause and does not update the map with the association `"d" => 40`.
An expression where the dependencies in the LHS of the match are cyclic, like:
1> #{X := Y, Y := X} = #{5 => 10, 10 => 5}.
will result in an evaluator error (variable is unbound) or a compilation error.
External Term Format
--------------------
There are 255 tags that can be used to encode terms for external binary
distribution. All tags are defined in `external.h`. The encoding starts by a
magic byte `131`.
The encoding tags used in R15B01 are the following:
SMALL_INTEGER_EXT 'a' 97
INTEGER_EXT 'b' 98
FLOAT_EXT 'c' 99
ATOM_EXT 'd' 100
SMALL_ATOM_EXT 's' 115
REFERENCE_EXT 'e' 101
NEW_REFERENCE_EXT 'r' 114
PORT_EXT 'f' 102
NEW_FLOAT_EXT 'F' 70
PID_EXT 'g' 103
SMALL_TUPLE_EXT 'h' 104
LARGE_TUPLE_EXT 'i' 105
NIL_EXT 'j' 106
STRING_EXT 'k' 107
LIST_EXT 'l' 108
BINARY_EXT 'm' 109
BIT_BINARY_EXT 'M' 77
SMALL_BIG_EXT 'n' 110
LARGE_BIG_EXT 'o' 111
NEW_FUN_EXT 'p' 112
EXPORT_EXT 'q' 113
FUN_EXT 'u' 117
DIST_HEADER 'D' 68
ATOM_CACHE_REF 'R' 82
ATOM_INTERNAL_REF2 'I' 73
ATOM_INTERNAL_REF3 'K' 75
BINARY_INTERNAL_REF 'J' 74
BIT_BINARY_INTERNAL_REF 'L' 76
COMPRESSED 'P' 80
For Maps we define tag `MAP_EXT` to 116 (`t`).
Data layout:
MAP_EXT
|-----------------------------------
| 1 | 4 | | |
|-----------------------------------
| 116 | Size | Keys | Values |
|-----------------------------------
The `Size` specifies the number of keys and values that follows the size
descriptor.
An open questions, optimize for:
1. encoding/decoding speed?
2. ease of access?
3. memory size?
Memory size should be a priority since we send this data over the wire. We
should promote ease of access so other languages can integrate towards the
format.
This leads to a flat and simple structure. It follows that encoding/decoding
takes a performance hit.
Motivation
==========
Why would we need maps when we have *records*, *dicts*, *gb_trees*, *ets*
and *proplists*?
Maps are envisioned to be an easy to use, lightweight yet powerful key-value
association store.
Maps utilizes one of Erlang's major strengths, pattern matching, to enrich user
experience and provide a powerful tool to simplify code development. Pattern
matching gives Maps a clear edge over dicts, gb_trees or proplists in usability.
Maps provides the possibility to associate arbitrary terms as keys, not only
atoms, with arbitrary terms as values in a matching capable data-type.
Maps does not claim to be an replacement to records as the frames proposal does.
Instead maps targets a larger usage domain and wishes to be a complement to
records and supersede them where suitable.
### Maps - Two approaches ###
1. Maps as an association array with pattern matching and syntax for
constructing, accessing or updating them.
2. Maps as a record replacement.
Maps were not envisioned as a record replacement at first, it was a hopeful
requirement added later. A record replacement approach does not necessarily
restrain any semantics but it may put some constraint on the implementation
and underlying structure.
##### Records #####
Records are powerful under the right circumstances:
* fast lookups, O(1), due to compile time indexing of keys, and fast stores for
small record sizes (~50 values),
* no memory overhead to store keys, only values and a name: 2 + N words
consumption,
* ease of use in function head matching.
However some of the drawbacks are:
* compile-time dependency and forces header file inclusions for inter-module usage,
* only atoms as keys,
* keys are not accessible in runtime,
* no dynamic access of values, i.e. we cannot use variables to access values,
* it is not a data-type and cannot be distinguished from tuples.
When comparing maps with records the drawbacks are easily remedied by Maps,
however the positive effects is not as easy to replicate in a built-in data-type
where values are determined at runtime instead of at compile time.
* Being faster than direct-indexing array, where indices and possibly the
resulting value are determined at compile time, is hard.
In fact it is impossible.
* A memory model for Maps where the efficiency was near that of records
could be achieved by essentially using two tuples, one for keys and one for
values as demonstrated in Frames. This would be impact performance of
updates on Maps with a large number of entries and thus constrain the
capability of a dictionary approach.
* Maps would be as easy, or even easier, to use with matching in function heads.
### Protocol Construction ###
Arguments for a simpler a JSON representation using frames or
maps has been raised. Using frames and thereby atoms for dynamic creation of keys
would be a serious drawback. Using maps would grant the possibility of string
binaries to represent keys and would not put the global atom pool in disarray.
#### Pattern Matching Example: The JSON Files ####
Changing json-decoding. Mochiwebs mochijson decodes Json dictionaries the as
the following:
{"key": "value"} -> {struct, [{"key", "value"}]}
This could instead be:
{"key": "value"} -> #{ "key" => "value"}
Consider the following JSON examples, from [json.org][json].
{"menu": {
"id": "file",
"value": "File",
"popup": {
"menuitem": [
{"value": "New", "onclick": "CreateNewDoc()"},
{"value": "Open", "onclick": "OpenDoc()"},
{"value": "Close", "onclick": "CloseDoc()"}
]
}
}}
`mochijson:decode/1` will currently look like:
{struct, [
{"menu", {struct, [
{"id","file"},
{"value","File"},
{"popup", {struct, [
{"menuitem", {array, [
{struct, [{"value","New"},{"onclick","CreateNewDoc()"}]},
{struct, [{"value","Open"},{"onclick","OpenDoc()"}]},
{struct, [{"value","Close"}, {"onclick","CloseDoc()"}]}
]}}
]}}
]}}
]}
`mochijson:decode/1` could look like:
#{ "menu" => #{
"id" => "file",
"value" => "File",
"popup" => #{
"menuitem" => [
#{ "value" => "New", "onclick" => "CreateNewDoc()"},
#{ "value" => "Open", "onclick" => "OpenDoc()"},
#{ "value" => "Close", "onclick" => "CloseDoc()"}
]
}
}}
Let us find `"menu"` -> `"popup"` -> `"menuitem"`.
Traversing the first structure is a bit awkward. We would have to do
the following:
Decoded = mochijson:decode(Json),
{struct, Menu} = proplists:get_value("menu", Decoded),
{struct, PopUp} = proplists:get_value("popup", Menu),
{struct, Items} = proplists:get_value("menuitem", PopUp),
With maps it could look like the following:
#{ "menu" := Menu } = mochijson:decode(Json),
#{ "popup" := PopUp } = Menu,
#{ "menuitem" := Items } = PopUp,
or even:
Decoded = mochijson:decode(Json),
#{ "menu" := #{ "popup" := #{ "menuitem" := Items } } } = Decoded,
With maps, and single value access, it could look really simple:
Decoded = mochijson:decode(Json),
Items = Decoded#{ "menu" }#{ "popup" }#{ "menuitem "}.
### Open Questions ###
We have some usage scenarios that are still open for debate. A proposed answer
is given for each question and stems from discussions in this proposal.
1. What type of keys will we need to store in our Map? Will atoms suffice?
* It is the authors view that we should refrain from any key restrictions
unless there is overwhelmingly evidence that we can gain something from
such a restriction.
* A non atom key restriction satisfies our goal of a powerful Map mechanism.
* Proposal: *Any term as keys.*
2. How many key-value associations will we store in our map?
* This question has less to do with syntax and semantics than it has with
the choice of the underlying implementation.
* If we enable the user to add key-value pairs dynamically surely he will
use it. Since it is a powerful mechanism not afforded to us by records
the usage pattern will also be different. This will in all likelihood
produce larger Maps than records are today. This implies that we cannot
compare records sizes with that of maps sizes since the usage scenario
would be different.
* Proposal: *Arbitrary number of keys-value pairs.*
3. How many Map instances will we create for each Map with a specific set of keys?
* This question is closely related to how we use records and if Maps should
emulate this behavior and this should have no impact on semantics, only
implementation.
* The significant difference is the memory overhead in the storing structure.
Since memory overhead for keys and values has the same behavior as in any
compounded term or abstract data-type, i.e. dict or gb_trees, the main
difference occurs when comparing maps to records and frames. To ensure a
logarithmic worst-case performance in update or retrieval some sort tree
structure would likely be used for maps. Maps would then stores keys
together with its values whereas frames stores keys outside its value
structure and records generates key indexes at compile-time. This would
indicate a memory overhead for Maps over Frames and records for each
instance.
* Proposal: *Two tier approach, similar to binaries. Use flat compact,
key-sharing approach for few associations (~50 associations). Use sorted
tree approach and store keys with values beyond first tier limit.
The rationale being it is more likely to have multiple instance where
we have few keys.*
4. Only allow updates of already defined keys within a Map in syntax?
* The question stems from a record replacement approach and the argument for
it is to mitigate typos, i.e. trying to update key `behavior` where key
`behaviour` was actually intended. Instead of getting two different keys,
a runtime exception occurs at this point.
* This approach will *deny* any dictionary like behavior, for instance
storing spawned processes as keys in the map using syntax.
* Proposal: *Allow for any key to be stored by default syntax,
existing or not, and use a special syntax for setting of values of
existing keys only.*
The answers from these questions are instrumental to how we should design and
implement Maps. What we also should keep in the back of our minds is that we
will never get rid of records completely and some of the frames arguments might
thus be superfluous.
Rationale
=========
What should we expect from our syntax?
--------------------------------------
As stated earlier, the current syntax is not set in stone but what should we
expect from it?
1. First and foremost it has to be unambiguous, meaning the syntax must produce
single clearly defined behavior that cannot be misinterpreted by humans
nor machines.
1.1 Here we also include the notion that similar syntax should have similar
behavior, or at least not completely different behavior.
For example records, `#record{ key = Value }`, have O(1) performance and
2 + N words memory overhead. If we use a similar syntax,
i.e. `#{ key = Value }`, we should also expect similar behavior both in
regard to semantics and performance for any and all sizes of the map.
2. The syntax must be as short as possible. It must not use more characters than
necessary to describe a certain behavior as long as it
does not violate rule 1.
2.2 We want to avoid verbosity in the language. Verbosity pushes away
information from our field of vision and obfuscates it.
This needs to be avoided.
Syntax choice for Maps
----------------------
*The author argues for:* `=>` *as delimiter in 'set-or-update' and*
`:=` *in 'set-existing' and 'matching' syntax.*
In the examples below we use `#{ Key => Value }` to describe map semantics and
use-cases, but this is only one suggestion out of many.
Several syntax proposals exists, frames proposes `<{ key ~ Value }>` syntax and
another syntax suggestion is very similar to record syntax `#{ key = Value }`.
The current variable and atom definitions puts restrictions on what we can use
as delimiters and leaves us `~` and `=` as the only sane *single* character
delimiters we can use. The case is very well argued in Richard O'Keefes
[No more need for records (fifth draft)][ROKframe].
### Delimiter discussion ###
Arguments against a `=` delimiter are:
* It lacks distinction from match, consider `#{ A = B = v }`,
does `A` match `B` or does `B` match `v`?
Which `=` is a match operation and which `=` delimits the key-value pair?
* It might be interpreted as `Key` 'equal' `Value`,
and hence `=` is in violation of rule #1 from *What do we expect from our syntax?*.
The interpretation of this syntax is ambiguous.
Arguments against a `~` delimiter are:
* it might be interpreted as `Key` 'NOT' `Value`
as `~` is the bitwise NOT operator in C,
* it might be interpreted as `Key` 'about equal' `Value`
as `~` is similar to mathematics `≃`,
* it lacks typographical distinction,
i.e. it lacks distinction from `-` in certain fonts,
ex. `K ~ V` might be interpreted as `K - V`, consider `#{ K - 1 ~ 2 - V }`,
and hence this is in violation of rule #1 from *What do we expect from our syntax?*.
The interpretation of this syntax is ambiguous.
Two two-character delimiter suggestions are `#{ Key := Value }` and
`#{ Key => Value}`, where `:=` is a common denominator for assignment and `=>`
should be read as *maps to*. A two-character delimiter should be avoided if at
all possible since it increases the syntax footprint of the source code.
The assignment delimiter reads well for just assignment but suffers from the
same reversed logic flaw as records when it comes to pattern matching. The match
`#{ key := Value } = M` reads *match M to the map pattern where Value is equal
to key*. That does not read well unless we call the assignment delimiter `:=`
for something that its not meant to be.
However, `:=` is also similar to `=:=` which means "is exactly equal",
i.e. matches. This is a valuable meaning since we have a difference
between `==` and `=:=` when dealing with numbers and thus `:=` could be a more
correct delimiter for matching syntax.
The delimiter `->` would be suitable choice if it weren't for the fact that it
would overload the function clause meaning.
Both `->` and `=>` might be confusing when dealing with binaries.
Consider `#{ <<"key">> -> <<"value">> }` and `#{ <<"key">> => <<"value">> }`,
where `=>` appears to be slightly more confusing than `->`.
Listing of delimiters from above perceived desirability:
1. `#{ K => V }` - No ambiguity, no overloading, reads as an association
2. `#{ K := V }` - No ambiguity, no overloading, reads as an assignment or exact match
3. `#{ K ~ V }` - Typographical ambiguity, no overloading, no clear meaning
4. `#{ K -> V }` - Overloads function clause head and body separator, reads as an association
5. `#{ K = V }` - Overloads match operator, reads as a match or an assignment
Using `:=` assignment for existing keys seems as a good choice. The choice for
set-or-update is between `=>` and `~`.
### The case for two set-or-update semantics and its syntax ###
A case for two different ways to update values in a Map is proposed.
One syntax if, and only if, we want to update a value for an already *existing*
key and another if we want to update the Map with any key.
* Use `M#{ K => V }` to declare new key value pairs *or* update already existing keys
* Use `M#{ K := V }` to update already existing keys.
* Use ` #{ K := V } = M` to match maps.
Example 1:
foo() ->
M = #{ key1 => 1, key2 => 2 }, % M is declared with keys 'key1' and 'key2'
bar(M).
bar(M) ->
M#{
key1 := "1", %% 'key1' will be set to "1"
key2 := "2", %% 'key2' will be set to "2"
key3 := "3" %% this causes an exception since 'key3' does not exist in M
}.
> foo().
** exception error: no match of 'key3' in map
Example 2:
foo() ->
M = #{ key1 => 1, key2 => 2 }, % M is declared with keys 'key1' and 'key2'
bar(M).
bar(M) ->
M#{
key1 => "1", %% 'key1' will be set to "1"
key2 => "2", %% 'key2' will be set to "2"
key3 => "3" %% 'key3' will be set to "3"
}.
> foo().
#{ key1 => 1, key2 => "2", key3 => "3" }
Impact of syntax footprint
--------------------------
We must lessen the syntax footprint impact on the source code and the language.
Currently the two normal ways of sending options to a functions are either via
records or property lists. Both have some drawbacks. Records are compile time
dependent and syntactic sugar for tuples. Property lists are generic but
produces a lot of texts when defining them and operating on them.
Consider this example when parsing a list of arguments:
args(Args) ->
args(Args, [{analyze, false}, {suppression, false}, {target, none}]).
args(["-r" | Args], Opts) ->
args(Args, [{analyze, true} | proplists:delete(analyze, Opts)]);
args(["-s="++File | Args], Opts) ->
args(Args, [{suppression, File} | proplists:delete(suppression, Opts)]);
args([Target], Opts) ->
[{target, Target} | proplists:delete(target, Opts)].
The textual impact, the number of characters, is quite heavy when operating on
property lists.
If we instead use some kind of map with syntax, how would that look?
args(Args) ->
args(Args, #{ analyze => false, suppression => false, target => none}).
args(["-r" | Args], Opts) -> args(Args, Opts#{ analyze := true });
args(["-s="++File | Args], Opts) -> args(Args, Opts#{ suppression := File});
args([Target], Opts) -> Opts#{ target := Target }.
This looks cleaner in my opinion but that is a very subjective view. To use some
data we can count the characters, and we see that the property lists example has
390 characters versus the map examples 306. Property lists uses almost 30% more
characters in this example.
Semantics and API-functions
---------------------------
### List conversions ###
Perhaps the most sane `maps:from_list/1` semantics would be to have the key-value
significance order in left to right, meaning the first association is used and
the latter values with equal keys are ignored.
This differs from the `dict:from_list/1` behavior.
Consider the following `dict` example:
[{a,2}] = dict:to_list(dict:from_list([{a,1}, {a,2}])).
By letting the leftmost be the most significant key we could simplify conversion
from and to lists.
Current suggestion has the following semantics:
Ls = [{a,old}],
#{ a := old } = maps:from_list([{a,new}|Ls]).
The reversal would be:
Ls = [{a,old}],
#{ a := new } = maps:from_list([{a,new}|Ls]).
Equality and Ordering
---------------------
A restriction set on the implementation by the Erlang specification is that
order is total, i.e. satisfies *antisymmetry*, *transitivity* and *totality*.
- If `M1 =< M2` and `M2 =< M1` then `M1 == M2`,
- If `M1 =< M2` and `M2 =< M3` then `M1 =< M3`,
- If `M1 =< M2` or `M2 =< M1` (always comparable)
where `M1`, `M2` and `M3` are any Map term.
This only holds true in Erlang if we treat floats and integers as union of types,
namely numbers. In the case of a Maps, `true = #{ 1.0 => V } == #{ 1 => V}`.
- The need for order arises in a few cases.
- comparison, for example sorting, `lists:sort([M1, .., Mn])`
- introspection, for example when printed.
- Ordered maps impose restrictions on the underlying implementation and a
hashing approach will be nearly impossible.
- The underlying structure does not need to be sorted, an order could be
produced when needed,
- `M1` < `M2`, would result in an internal sort but would cost
O( *N1* * lg *N1* + *N2* * lg *N2* ), where
`N1 = maps:size(M1) and N2 = maps:size(M2)`
The discussion and results on equality versus matching for keys stems from ets
and ordered set.
Consider the following with ets.
1> ets:new(foo, [named_table, ordered_set]).
foo
2> ets:insert(foo, {1,v1}).
true
3> ets:insert(foo, {1.0,v2}).
true
4> ets:lookup(foo,1).
[{1.0,v2}]
5> ets:match(foo, ets:fun2ms(fun({1,V}) -> V end)).
[]
Accessing a single value
------------------------
Do we need to have single access or is matching sufficient?
Consider the following,
V = M#{ K }
is shorter than
#{ K := V } = M
It also allows for easy access of associated values in deep structures.
The syntax for single value access is the least developed (and contemplated)
feature in this proposal and certainly could use some input.
More over, the dot syntax must be abolished. Currently it is used for records
but it will not be used for maps. Dot represents end of expression list in last
clause, or end of attribute.
It cannot be used to distinguish between floats or associations.
Example:
1> M = #{ 1.1 => a, 1 => #{ 1 => b } }.
#{ 1 => #{ 1 => b }, 1.1 => a }.
2> #M.1.1.
a | b ?
Backwards Compatibility
=======================
Erlang code written with Maps will only be parseable, loadable and executable
on Erlang/OTP R17A and later releases of Erlang/OTP but not on previous
releases.
Erlang code written before Erlang/OTP R17A will be perfectly compatible, i.e.
parseable, loadable and executable with these Maps changes.
Distribution will not be backwards compatible.
[perl-hashes]: http://perldoc.perl.org/perldata.html
"perldata - perldoc.perl.org"
[ruby-hashes]: http://ruby-doc.org/core-1.9.3/Hash.html
"Class: Hash (Ruby 1.9.3)"
[python-dict]: http://docs.python.org/tutorial/datastructures.html#dictionaries
"5. Data Structures - Python v2.7.3 documentation"
[scala-maps]: http://docs.scala-lang.org/overviews/collections/maps.html
"Collections - Maps - Scala Documentation"
[dict-module]: http://www.Erlang.org/doc/man/dict.html
"Erlang -- dict"
[ROKframe]: http://www.cs.otago.ac.nz/staffpriv/ok/frames.pdf
"No more need for records"
[erlspec]: http://www.Erlang.org/download/erl_spec47.ps.gz
"Erlang specification 4.7"
[json]: http://json.org/example.html
"JSON Example"
Copyright
=========
This document has been placed in the public domain.