This EEP proposes the addition of zip generators with a syntax of &&
for
comprehensions in Erlang. The idea and syntax of zip generators (comprehension
multigenerators) was first brought up by EEP-19. Even if the syntax and
usages of zip generators proposed by this EEP is mostly the same with EEP-19,
the comprehension language of Erlang has undergone many changes since EEP-19
was accepted. With an implementation that is compatible with all existing
comprehensions, this EEP defines the behavior of zip generators with more
clarification on the compiler’s part.
List comprehension is a way to create a new list from existing list(s). Lists are traversed in a dependent (nested) way. In the following example, the resulting list has length 4 when the two input lists both have length 2.
[{X,Y} || X <- [1,2], Y <- [3,4]] = [{1,3}, {1,4}, {2,3}, {2,4}].
In contrast, parallel list comprehension (also known as zip comprehension)
evaluates qualifiers (a generalization of lists) in parallel. Qualifiers are
first “zipped” together, and then evaluated. Many functional languages
(Haskell, Racket, etc.) and non-functional languages (Python etc.)
support this variation. Suppose the two lists in the example above are
evaluated as a zip comprehension, the result would be [{1,3}, {2,4}]
.
Zip comprehensions allow the user to conveniently iterate over several lists
at once. Without it, the standard way to accomplish the same task in Erlang
is to use lists:zip
to zip two lists into two-tuples, or to use lists:zip3
to zip three lists into three-tuples. The list module does not provide a
function to zip more than three lists. Functions like lists:zip
always
create intermediate data structures when compiled. The compiler does not
perform deforestation to eliminate the unwanted tuples.
Zip generators is a generalization of zip comprehensions. Every set of zipped generators is treated as one generator. Instead of constraining a comprehension to be either zipped or non-zipped, any generator can be either a zip generator (containing at least two generators zipped together), or a non-zip generator (all existing generators are non-zip generator). Therefore, zip generators can be mixed freely with all existing generators and filters. Zip comprehension then becomes a special case of comprehension where only zip generators are used.
Within the OTP codebase, there are many uses of lists:zip
within comprehensions.
All of them can be simplified by zip generators using &&
syntax. For example,
The yecc.erl
in parsetools contains the following comprehension (external
function calls and irrelevant fields redacted for readability):
PartDataL = [#part_data{name = Nm, eq_state = Eqs, actions = P, states = S}
|| {{Nm,P}, {Nm,S}, {Nm,EqS}} <-
lists:zip3(PartNameL, PartInStates, PartStates)].
When using zip generators, the comprehension is rewritten to:
PartDataL = [#part_data{name = Nm, eq_state = Eqs, actions = P, states = S}
|| {Nm,P} <- PartNameL && {Nm,S} <- PartInStates && {Nm,EqS} <- PartStates].
By using zip generators, the compiler avoids the need to build the intermediate
list of tuples. Variable bindings and pattern matching within a zip generator
works as expected, as Nm
is supposed to bind to the same value in {Nm,P}
and {Nm,S}
. If the binding fails, then one element from each of the 3
generators is skipped. (If a strict generator is used, then the comprehension
fails with an exception, as specified in Error Behaviors.)
In summary, zip generators remove the user’s need to call the zip function within comprehensions and allows for any number of lists to be zipped at once. It can be used in list, binary, and map comprehensions, and mixed freely with all existing generators and filters. Internally, the compiler does not create any intermediate data structure, therefore also removing the need of deforestation.
Currently, Erlang supports three kinds of comprehensions, list comprehension, binary comprehension, and map comprehension. Their names refer to the result of the comprehension. List comprehension produces a list; binary comprehension produces a binary, etc.
[Expression || Qualifier(s)] %% List Comprehension
<<Expression || Qualifier(s)>> %% Binary Comprehension
#{Expression || Qualifier(s)} %% Map Comprehension
Qualifiers can have the following kind: filter, list generator, bitstring
generator, and map generator. Except for filters, the other three kinds of
qualifiers are generators. Their names refer to the type on the right hand
side of <-
or <=
. Generators have the following form:
Pattern <- List %% List Generator
Pattern <= Bitstring %% Bitstring Generator
Pattern_1 := Pattern_2 <- Map %% Map Generator
All qualifiers and filters can be freely used and mixed in all 3 kinds of comprehensions. The following example shows a list comprehension with a list generator and a bitstring generator.
[{X,Y} || X <- [1,2,3], <<Y>> <= <<4,5,6>>].
This EEP proposes the addition of zip generators. A zip generator is two or
more generators connected by &&
. Zip generators is constructed to connect
any number of the 3 kinds of generators above. Zip generators can be used
in list, binary, or map comprehensions in the same way.
For example, if the two generators in the above example is combined together as a zip generator, the comprehension would look like:
[{X,Y} || X <- [1,2,3] && <<Y>> <= <<4,5,6>>].
For every zip generator of the form
G1 && ... && Gn
, it is evaluated to have the same result as zip/n
where
zip([H1|T1], ..., [Hn|Tn]) ->
[{H1,...,Hn} | zip(T1, ..., Tn)];
zip([], ..., []) ->
[].
Therefore, the above comprehension evaluates to [{1,4}, {2,5}, {3,6}]
, which
is the same as if using lists:zip/2
.
Zip generator can also be used when a comprehension contains other non-zip
generators and/or filters. The &&
symbol has a higher precedence than ,
.
The following example evaluates to [{b,4}, {c,6}]
. The element {a,2}
is
filtered out from the resulting list.
[{X, Y} || X <- [a, b, c] && <<Y>> <= <<2, 4, 6>>, Y =/= 2].
The skipping behavior of individual generators within a zip generator is always respected. When strict generators are present in a zip generator, during every round of evaluation, all strict generators will be evaluated even if another relaxed generator already causes the result to be skipped.
For example, the following comprehension has a zip generator containing a strict generator and a relaxed generator.
[{X, Y} || {a, X} <:- L1 && {b, Y} <- L2]
It will badarg
if pattern matching for {a, X}
fails, for example, when
L1 = [{a, 1}, {bad, 2}, {a, 3}]
. It will simply skip 1 item from L1
and
1 item from L2
if pattern matching for {b, Y}
fails, for example, when
L2 = [{b, 1}, {bad, 2}, {b, 3}]
. The attempt of matching {a, X}
will
be made every round, regardless of generators ordering and whether other
pattern matchings succeed.
Comparing to using helper functions, there is one advantage of using a zip generator: The Erlang compiler does not generate any tuple when a zip generator is translated into core Erlang. The generated code reflects the programmer’s intent, which is to collect one element from every list at a time without creating a list of tuples.
One would expect that when errors happen, a zip generator behaves the same
as lists:zip/2
, lists:zip3/3
, and also the zip/n
function above when
more than 3 lists are zipped together. The design and implementation of
zip generators aim to achieve that both for compiled code and for comprehensions
evaluated in Erlang shell.
lists:zip/2
and lists:zip3/3
will fail if the given lists are not of the
same length, where zip/n
will also crash. Therefore, a zip generator raises a
bad generators
error when it discovers that the given generators are of
different lengths.
When a zip generator crashes because the containing generators are of
different lengths, the internal error message is a tuple, where the first
element is the atom bad_generators
, and the second element is a tuple that
contains the remaining data from all generators. The user-facing error message
is bad generators:
, followed by the tuple containing remaining data from
all generators.
For example, this comprehension will crash at runtime.
[{X,Y} || X <- [1,2,3] && Y <- [1,2,3,4]].
The resulting error tuple is {bad_generators,{[],[4]}}
. This is because
when the comprehension crashes, the first list in the zip generator has
only the empty list []
left, while the second list in the zip generator
has [4]
left.
On the compiler’s side, it is rather difficult to return the original zip generator in the error message, or to point out which generator is of different length comparing to others. The proposed error message aims to gives the most helpful information without imposing extra burden on the compiler or runtime.
When a zip generator crashes because at least one strict generators contained
in it fails, the resulting error tuple is of the same format as when generators
are of different lengths. Its first element is the atom bad_generators
, and
the second element is a tuple containing remaining data from all generators.
For example, this comprehension will crash at runtime, because bad
cannot
match the pattern {ok,A}
.
[A + B || {ok,A} <:- [bad, {ok,1}] && B <- [2,3]].
The resulting error tuple is {bad_generators,{[bad, {ok,1}],[2,3]}}
. Although
strict generators alone fail with exception badmatch
, as specified in
EEP-70, it is not plausible to use the same exception in zip generators,
due to difficulty in distinguishing between badmatch
and bad_generators
errors.
In the following example, the comprehension will crash at runtime, either for the failed strict generator, or for two generators of different lengths.
[A + B || {ok,A} <:- [bad] && B <- []].
The emitted error message is {bad_generators,{[bad],[]}}
. We do not
distinguish between the two errors, and instead always output all remaining
data in the generators. The user can examine the remaining data and see that
the first generator fails matching, and the second generator is empty.
As the idea of zipping only makes sense for generators, a zip generator cannot contain filters or any expression that is not a generator. Whenever it is possible to catch such an error at compile-time, this error is caught by the Erlang linter.
For example, the zip generator in the following comprehension contains a filter.
zip() -> [{X,Y} || X <- [1,2,3] && Y <- [1,2,3] && X > 0].
When the function is compiled, the linter points out that only generators are allowed in a zip generator, together with the position of the non-generator.
t.erl:6:55: only generators are allowed in a zip generator.
% 6| zip() -> [{X,Y} || X <- [1,2,3] && Y <- [1,2,3] && X > 0].
% | ^
The operator &&
is not used in Erlang. No existing code is affected by
this addition.
PR-8926 contains the implementation for zip generators.
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.