This EEP proposes an extension to matching of binaries to allow the size of a segment to be a guard expression and to the matching of maps to allow the key to be a guard expression.
We propose that in binary matching the size of a binary segment can be a guard expression. Here is an example:
example1(<<Size:8,Payload:((Size-1)*8)/binary,Rest/binary>>) ->
{Payload,Rest}.
The same expressions as in guards are allowed, except that the
old-style type tests (such a list/1
or tuple/1
) are not
allowed. Unless the expression consists of a single number or single
variable is must be enclosed in parentheses. Any variables used in the
expression must have been previously bound, or become bound in the
same binary pattern as the expression. That is, the following example
is illegal:
illegal_example2(N, <<X:N,T/binary>>) ->
{X,T}.
A binary pattern will fail to match, if size expression in any of its segments does not evaluate successfully or evaluates to a non-integer value. For example:
example3(<<X:(1/0)>>) -> X;
example3(<<X:not_integer>>) -> X;
example3(_) -> no_match.
The first clause will not match because the evaluation of 1/0
fails.
The second clause will not match because the size evaluates to an atom.
In the current map matching syntax, the keys in a map pattern must be a single value or a literal. That leads to unnatural code if the keys in a map are complex terms. For example:
example4(M, X) ->
Key = {tag,X},
#{Key := Value} = M,
Value.
We propose that the key in a map pattern can be a guard expression. That will allow the previous example to be written like this:
example5(M, X) ->
#{{tag,X} := Value} = M,
Value.
All variables used in a key expression must be previously bound. Thus, the following example is illegal:
illegal_example6(Key, #{Key := Value}) -> Value.
The current limitations of map keys are surprising. A literal tuple
such as {a,b}
is allowed as a key, while a tuple with a variable
such as {a,Var}
is not.
In binary matching, it has always been possible to multiply a matched
out number by a small constant using the unit:
modifier. The proposed
extension makes is possible in more circumstances to match both header
and payload in the same binary pattern.
We did consider only allowing term construction and expressions using arithmetic operators. There are two reasons we went with guard expression instead:
It is easy to explain and understand exactly what expression are allowed as segment sizes and as map keys, because the same type of expressions are allowed in guards.
When calculating the size of binary segment, a subset of guard BIFs
could conceivable be useful in practice. For example: ceiling/1
,
round/1
, byte_size/1
, bit_size/1
, and map_get/2
. We did not
want an arbitrary list of allowed BIFs in size expressions, so the
only logical thing is to allow full guard expressions.
Size expressions that obviously never evaluates to an integer will not cause a compilation error (but may cause a warning). For example:
example6(Bin, V) ->
<<X:(is_list(V))>> = Bin,
X.
The reason is that rules for what is a legal Erlang program should be simple and unambiguous, to help both people and tools that generate Erlang programs.
For the same reason that they are required when constructing binaries,
namely that language grammar would be ambiguous without the
parentheses, since binary patterns use the characters :
, /
, and
-
with a different meaning than in the rest of the language.
Using the extended expressions segment size and map keys would cause a compilation error in OTP 22 and previous releases. Therefore, no existing source code can be affected.
However, there are changes to the semantics of Core Erlang that may make it necessary to update languages compilers or tools that generate Core Erlang code.
There are two major changes:
A binary pattern in Core Erlang is no longer allowed to bind and use a variable in the same binary pattern.
To fully support binary matching in receives, receives must be lowered to more primitive operations.
In Erlang, a variable can be bound in a binary pattern and used later in the same pattern as the size of a segment:
foo(<<Sz:16,X:Sz>>) -> X.
In OTP 22 and previous releases, the translation to Core Erlang is straightforward:
'foo'/1 =
fun (_0) ->
case _0 of
<#{#<Sz>(16,1,'integer',['unsigned'|['big']]),
#<X>(Sz,1,'integer',['unsigned'|['big']])}#> when 'true' ->
X
<_1> when 'true' ->
%% Raise function_clause exception.
.
.
.
end
While the translation is straightforward, all Core Erlang passes would need to handle binding and using a variable in the same scope. That would become even more complicated if we were to allow expressions as segment sizes.
In OTP 23, all variables used in a segment size expression must be already bound in the enclosing environment. The previous example must be rewritten like this using nested cases:
'foo'/1 =
fun (_0) ->
case _0 of
<#{#<Sz>(16,1,'integer',['unsigned'|['big']]),
#<_2>('all',1,'binary',['unsigned'|['big']])}#> when 'true' ->
case _2 of
<#{#<X>(Sz,1,'integer',['unsigned'|['big']])}#> when 'true' ->
X
<_3> when 'true' ->
%% Raise function_clause exception.
.
.
.
end
<_4> when 'true' ->
%% Raise function_clause exception.
.
.
.
end
However, as can be seen from the example, the code for raising the function_clause
exception has been duplicated. The code duplication is no big deal in this simple
example, but it would be in a function where the binary matching clause was followed
by many other clauses. To avoid the code duplication, we must use letrec
with
the letrec_goto
annotation:
'foo'/1 =
fun (_0) ->
( letrec
'label^0'/0 =
fun () ->
case _0 of
<_1> when 'true' ->
%% Raise function_clause exception.
.
.
.
end
in case _0 of
<#{#<Sz>(16,1,'integer',['unsigned'|['big']]),
#<_2>('all',1,'binary',['unsigned'|['big']])}#> when 'true' ->
case _2 of
<#{#<X>(Sz,1,'integer',['unsigned'|['big']])}#> when 'true' ->
X
<_3> when 'true' ->
apply 'label^0'/0()
end
<_4> when 'true' ->
apply 'label^0'/0()
end
-| ['letrec_goto'] )
When a letrec
has been given the annotation letrec_goto
, it will be
specially translated. The apply
operations will be translated to a goto
instead of a call to a local function.
Consider this example:
bar(Timeout) ->
receive
{tag,Msg} -> Msg
after
Timeout ->
no_message
end.
In a OTP 22 and earlier, the translation to Core Erlang was straightforward:
'bar'/1 =
fun (Timeout) ->
receive
<{'tag',Msg}> when 'true' ->
Msg
after Timeout ->
'no_message'
In order to fully support binary matching in OTP 23, a receive
in Erlang
has now been lowered to more primitive operations in Core Erlang:
'foo'/1 =
fun (Timeout) ->
( letrec
'recv$^0'/0 =
fun () ->
let <PeekSucceeded,Message> =
primop 'recv_peek_message'()
in case PeekSucceeded of
<'true'> when 'true' ->
case Message of
<{'tag',Msg}> when 'true' ->
do primop 'remove_message'()
Msg
<Other> when 'true' ->
do primop 'recv_next'()
apply 'recv$^0'/0()
end
<'false'> when 'true' ->
let <TimedOut> =
primop 'recv_wait_timeout'(Timeout)
in case TimedOut of
<'true'> when 'true' ->
do primop 'timeout'()
'no_message'
<'false'> when 'true' ->
apply 'recv$^0'/0()
end
end
in apply 'recv$^0'/0()
-| ['letrec_goto'] )
When compiling from Core Erlang code in OTP 23, the compiler will
accept Core Erlang code that uses the receive
construct and
automatically lower it to the more primitive operations. That is,
for the example above, the Core Erlang translation from OTP 22 will
be accepted as input to the compiler in OTP 23.
Here is another example where the Core Erlang code from OTP 22 will not be accepted. Here is the Erlang code:
foobar() ->
receive
<<Sz:16,X:Sz>> -> X
end.
In OTP 22, this would be translated to Core Erlang code like this:
'foobar'/0 =
fun () ->
receive
<#{#<Sz>(16,1,'integer',['unsigned'|['big']]),
#<X>(Sz,1,'integer',['unsigned'|['big']])}#> when 'true' ->
X
after 'infinity' ->
'true'
That translation will not be accepted by the compiler in OTP 23. The receive
must be lowered to more primitive operations, and the binary matching must be
rewritten using nested cases:
'foobar'/0 =
fun () ->
( letrec
'recv$^0'/0 =
fun () ->
let <_5,_0> =
primop 'recv_peek_message'()
in case _5 of
<'true'> when 'true' ->
( letrec
'label^0'/0 =
fun () ->
do primop 'recv_next'()
apply 'recv$^0'/0()
in case _0 of
<#{#<Sz>(16,1,'integer',['unsigned'|['big']]),
#<_1>('all',1,'binary',['unsigned'|['big']])}#> when 'true' ->
case _1 of
<#{#<X>(Sz,1,'integer',['unsigned'|['big']])}#> when 'true' ->
do primop 'remove_message'()
X
<_2> when 'true' ->
apply 'label^0'/0()
end
<_3> when 'true' ->
apply 'label^0'/0()
end
-| ['letrec_goto'] )
<'false'> when 'true' ->
let <_4> =
primop 'recv_wait_timeout'
('infinity')
in case _4 of
<'true'> when 'true' ->
do primop 'timeout'()
'true'
<'false'> when 'true' ->
apply 'recv$^0'/0()
end
end
in apply 'recv$^0'/0() )
-| ['letrec_goto']
The implementation can be found in PR #2521.
This document has been placed in the public domain.