Core Erlang Optimizations
This blog post continues the exploration of Core Erlang by
looking at some optimizations done by the sys_core_fold
compiler pass. The Core Erlang language was introduced in
the previous blog post.
To prepare the examples in this blog post I used two commands.
$ erlc +time +dcore core_fold_example.erl
Compiling "core_fold_example"
parse_module : 0.000 s 9.4 kB
transform_module : 0.000 s 9.4 kB
lint_module : 0.005 s 9.4 kB
expand_records : 0.000 s 9.4 kB
core : 0.000 s 59.3 kB
listing : 0.003 s 59.3 kB
The dcore
option produces the file core_fold_example.core
containing a listing of the Core Erlang code produced by the core
parse (implemented by the module v3_core
).
$ erlc +time +dcopt core_fold_example.erl
Compiling "core_fold_example"
parse_module : 0.000 s 9.4 kB
transform_module : 0.000 s 9.4 kB
lint_module : 0.002 s 9.4 kB
expand_records : 0.000 s 9.4 kB
core : 0.000 s 59.3 kB
sys_core_fold : 0.000 s 25.3 kB
core_transforms : 0.000 s 25.3 kB
listing : 0.002 s 25.3 kB
The dcopt
option produces the file core_fold_example.copt
containing a listing of the Core Erlang code as it looks
after optimization by the sys_core_fold
pass.
As was mentioned in my first blog post about the compiler,
compile:options()
will print most of the hidden options for
the compiler.
The most basic optimization #
The most basic optimization done by sys_core_fold
is constant propagation.
Consider this Erlang function:
a() ->
A = 42,
{ok,A}.
It can be translated to Core Erlang like this:
'a'/0 =
fun () ->
let <A> = 42
in {'ok',A}
The variable A
is bound to a constant (as opposed to an expression such
as function call). We can replace all occurrences of the variable A
with
the constant value 42
and eliminate the let
:
'a'/0 =
fun () ->
{'ok',42}
Optimizing case
expressions #
Actually, the first version of a/0
that I showed was already
slightly optimized by me.
Here is the actual Core Erlang code (only slightly edited to remove annotations and unnecessary line breaks):
'a'/0 =
fun () ->
case <> of
<> when 'true' ->
let <A> = 42
in {'ok',A}
<> when 'true' ->
primop 'match_fail'({'function_clause'})
end
The let
has been wrapped in a useless outer case
. The
case
would serve some purpose if there had been some function
arguments, but why complicate the code generator if sys_core_fold
is
perfectly capable of simplifying this code?
sys_core_fold
will simplify the code in several steps.
First it will look at each clause. If a clause can’t possibly
be executed (for example, it its guard is false
) it will be
dropped. If a clause will always match, all clauses following
the clause will be dropped.
In this case, the first clause will always match, because the
pattern is a list of no variables that can’t fail to match, and
the guard is true
. Thus the second clause is unreachable and
is dropped:
'a'/0 =
fun () ->
case <> of
<> when 'true' ->
let <A> = 42
in {'ok',A}
end
The next step is to see if there is only one clause remaining.
If it is, the body of the clause can be kept and the case
eliminated:
'a'/0 =
fun () ->
let <A> = 42
in {'ok',A}
Another case example #
Let’s see how a more complicated function can be optimized following the steps just described. Consider this Erlang function:
aa() ->
case {a,tuple} of
[List] -> List;
{A,B} -> {tuple,A,B};
_ -> something_else
end.
Translated to Core Erlang code (with the outer case
and
annotations removed) it will look this:
'aa'/0 =
fun () ->
case {'a','tuple'} of
<[List|[]]> when 'true' ->
List
<{A,B}> when 'true' ->
{'tuple',A,B}
<_@c1> when 'true' ->
'something_else'
<_@c0> when 'true' ->
primop 'match_fail'({'case_clause',_@c0})
end
Let’s go through the clauses one by one:
-
The first clause will only match a list with exactly one element. The
case
expression is a tuple, so the first clause can’t possibly match. It will be dropped. -
The second clause will match a tuple with (any) two elements. The case expression is a tuple with two elements, so this clause will always match.
-
There is no need to look at the remaining clauses, since the second clause will always match. The remaining clauses are dropped.
We now have:
'aa'/0 =
fun () ->
case {'a','tuple'} of
<{A,B}> when 'true' ->
{'tuple',A,B}
end
This is a case
with just one clause, so we can keep
the body of the clause and remove the case
. But there is
a problem if we do that naively:
'aa'/0 =
fun () ->
{'tuple',A,B}
The variables A
and B
are used, but they don’t have
any values bound to them. We must use a let
to bind
the variables before they can be used:
'aa'/0 =
fun () ->
let <A,B> = <'a','tuple'>
in {'tuple',A,B}
Propagating constants, the final code is:
'aa'/0 =
fun () ->
{'tuple','a','tuple'}
Avoiding tuple building #
Here is an example of a common pattern of matching several expressions in parallel:
b(A, B) ->
case {A,B} of
{true,false} -> ok;
{false,true} -> not_ok;
{_,_} -> error
end.
The unoptimized Core Erlang code looks like this:
'b'/2 =
fun (_@c1,_@c0) ->
case <_@c1,_@c0> of
<A,B> when 'true' ->
case {A,B} of
<{'true','false'}> when 'true' ->
'ok'
<{'false','true'}> when 'true' ->
'not_ok'
<{_@c5,_@c6}> when 'true' ->
'error'
<_@c2> when 'true' ->
primop 'match_fail'({'case_clause',_@c2})
end
end
The case
expression is {A,B}
. When executing the case
a tuple will built, and then almost immediately discarded.
That is wasteful. Therefore sys_core_fold
rewrites the
code to eliminate the tuple building:
'b'/2 =
fun (_@c1,_@c0) ->
case <_@c1,_@c0> of
<'true','false'> when 'true' ->
'ok'
<'false','true'> when 'true' ->
'not_ok'
<_@c5,_@c6> when 'true' ->
'error'
end
Here a value list is used instead of a tuple. (See previous blog post for several examples of value lists.)
Another common pattern where tuples are built and immediately discarded is shown in this example:
c(X) ->
{A,B} = case X of
a1 -> {10,1};
b2 -> {20,2};
_ -> {100,42}
end,
A+B.
The unoptimized Core Erlang code looks like this:
'c'/1 =
fun (_@c0) ->
case _@c0 of
<X> when 'true' ->
let <_@c2> =
case X of
<'a1'> when 'true' ->
{10,1}
<'b2'> when 'true' ->
{20,2}
<_@c5> when 'true' ->
{100,42}
<_@c1> when 'true' ->
primop 'match_fail'({'case_clause',_@c1})
end
in
case _@c2 of
<{A,B}> when 'true' ->
call 'erlang':'+'(A, B)
<_@c3> when 'true' ->
primop 'match_fail'({'badmatch',_@c3})
end
<_@c4> when 'true' ->
primop 'match_fail'({'function_clause',_@c4})
end
Here a tuple is built and assigned to _@c2
. It is then matched
in a case
.
First the code is optimized like this to eliminate the tuple building
in each clause of the first case
:
'c'/1 =
fun (_@c0) ->
let <_@f4,_@f5> =
case _@c0 of
<'a1'> when 'true' ->
<10,1>
<'b2'> when 'true' ->
<20,2>
<_@c5> when 'true' ->
<100,42>
end
in
let <_@c2> = {_@f4,_@f5}
in
case _@c2 of
<{A,B}> when 'true' ->
call 'erlang':'+'(A, B)
<_@c3> when 'true' ->
primop 'match_fail'({'badmatch',_@c3})
end
end
Applying all of the optimizations previously described, the remaining tuple building and matching can be eliminated:
'c'/1 =
fun (_@c0) ->
let <_@f4,_@f5> =
case _@c0 of
<'a1'> when 'true' ->
<10,1>
<'b2'> when 'true' ->
<20,2>
<_@c5> when 'true' ->
<100,42>
end
in
call 'erlang':'+'(_@f4, _@f5)
Conclusion #
That was a quick look at some of the optimizations done by
sys_core_fold
.
Some of the optimizations are very simple. The power of the
sys_core_fold
pass comes from the combination of optimizations. One
optimization gives opportunities for other optimizations, as could be
seen in the examples.
Points to Ponder #
Why is the optimization pass called sys_core_fold
?
A hint can be found in the title of this Wikipedia article: Constant folding.