4 Constructing and Matching Binaries
In R12B, the most natural way to construct and match binaries is significantly faster than in earlier releases.
To construct a binary, you can simply write as follows:
DO (in R12B) / REALLY DO NOT (in earlier releases)
my_list_to_binary(List) ->
    my_list_to_binary(List, <<>>).
my_list_to_binary([H|T], Acc) ->
    my_list_to_binary(T, <<Acc/binary,H>>);
my_list_to_binary([], Acc) ->
    Acc.In releases before R12B, Acc is copied in every iteration. In R12B, Acc is copied only in the first iteration and extra space is allocated at the end of the copied binary. In the next iteration, H is written into the extra space. When the extra space runs out, the binary is reallocated with more extra space. The extra space allocated (or reallocated) is twice the size of the existing binary data, or 256, whichever is larger.
The most natural way to match binaries is now the fastest:
DO (in R12B)
my_binary_to_list(<<H,T/binary>>) ->
    [H|my_binary_to_list(T)];
my_binary_to_list(<<>>) -> [].4.1 How Binaries are Implemented
Internally, binaries and bitstrings are implemented in the same way. In this section, they are called binaries because that is what they are called in the emulator source code.
Four types of binary objects are available internally:
- 
Two are containers for binary data and are called: - Refc binaries (short for reference-counted binaries)
- Heap binaries
 
- 
Two are merely references to a part of a binary and are called: - sub binaries
- match contexts
 
Refc Binaries
Refc binaries consist of two parts:
- An object stored on the process heap, called a ProcBin
- The binary object itself, stored outside all process heaps
The binary object can be referenced by any number of ProcBins from any number of processes. The object contains a reference counter to keep track of the number of references, so that it can be removed when the last reference disappears.
All ProcBin objects in a process are part of a linked list, so that the garbage collector can keep track of them and decrement the reference counters in the binary when a ProcBin disappears.
Heap Binaries
Heap binaries are small binaries, up to 64 bytes, and are stored directly on the process heap. They are copied when the process is garbage-collected and when they are sent as a message. They do not require any special handling by the garbage collector.
Sub Binaries
The reference objects sub binaries and match contexts can reference part of a refc binary or heap binary.
A sub binary is created by split_binary/2 and when a binary is matched out in a binary pattern. A sub binary is a reference into a part of another binary (refc or heap binary, but never into another sub binary). Therefore, matching out a binary is relatively cheap because the actual binary data is never copied.
Match Context
A match context is similar to a sub binary, but is optimized for binary matching. For example, it contains a direct pointer to the binary data. For each field that is matched out of a binary, the position in the match context is incremented.
In R11B, a match context was only used during a binary matching operation.
In R12B, the compiler tries to avoid generating code that creates a sub binary, only to shortly afterwards create a new match context and discard the sub binary. Instead of creating a sub binary, the match context is kept.
The compiler can only do this optimization if it knows that the match context will not be shared. If it would be shared, the functional properties (also called referential transparency) of Erlang would break.
4.2 Constructing Binaries
In R12B, appending to a binary or bitstring is specially optimized by the runtime system:
<<Binary/binary, ...>> <<Binary/bitstring, ...>>
As the runtime system handles the optimization (instead of the compiler), there are very few circumstances in which the optimization does not work.
To explain how it works, let us examine the following code line by line:
Bin0 = <<0>>,                    %% 1
Bin1 = <<Bin0/binary,1,2,3>>,    %% 2
Bin2 = <<Bin1/binary,4,5,6>>,    %% 3
Bin3 = <<Bin2/binary,7,8,9>>,    %% 4
Bin4 = <<Bin1/binary,17>>,       %% 5 !!!
{Bin4,Bin3}                      %% 6- Line 1 (marked with the %% 1 comment), assigns a heap binary to the Bin0 variable.
- Line 2 is an append operation. As Bin0 has not been involved in an append operation, a new refc binary is created and the contents of Bin0 is copied into it. The ProcBin part of the refc binary has its size set to the size of the data stored in the binary, while the binary object has extra space allocated. The size of the binary object is either twice the size of Bin1 or 256, whichever is larger. In this case it is 256.
- Line 3 is more interesting. Bin1 has been used in an append operation, and it has 252 bytes of unused storage at the end, so the 3 new bytes are stored there.
- Line 4. The same applies here. There are 249 bytes left, so there is no problem storing another 3 bytes.
- Line 5. Here, something interesting happens. Notice that the result is not appended to the previous result in Bin3, but to Bin1. It is expected that Bin4 will be assigned the value <<0,1,2,3,17>>. It is also expected that Bin3 will retain its value (<<0,1,2,3,4,5,6,7,8,9>>). Clearly, the runtime system cannot write byte 17 into the binary, because that would change the value of Bin3 to <<0,1,2,3,4,17,6,7,8,9>>.
The runtime system sees that Bin1 is the result from a previous append operation (not from the latest append operation), so it copies the contents of Bin1 to a new binary, reserve extra storage, and so on. (Here is not explained how the runtime system can know that it is not allowed to write into Bin1; it is left as an exercise to the curious reader to figure out how it is done by reading the emulator sources, primarily erl_bits.c.)
Circumstances That Force Copying
The optimization of the binary append operation requires that there is a single ProcBin and a single reference to the ProcBin for the binary. The reason is that the binary object can be moved (reallocated) during an append operation, and when that happens, the pointer in the ProcBin must be updated. If there would be more than one ProcBin pointing to the binary object, it would not be possible to find and update all of them.
Therefore, certain operations on a binary mark it so that any future append operation will be forced to copy the binary. In most cases, the binary object will be shrunk at the same time to reclaim the extra space allocated for growing.
When appending to a binary as follows, only the binary returned from the latest append operation will support further cheap append operations:
Bin = <<Bin0,...>>
In the code fragment in the beginning of this section, appending to Bin will be cheap, while appending to Bin0 will force the creation of a new binary and copying of the contents of Bin0.
If a binary is sent as a message to a process or port, the binary will be shrunk and any further append operation will copy the binary data into a new binary. For example, in the following code fragment Bin1 will be copied in the third line:
Bin1 = <<Bin0,...>>, PortOrPid ! Bin1, Bin = <<Bin1,...>> %% Bin1 will be COPIED
The same happens if you insert a binary into an Ets table, send it to a port using erlang:port_command/2, or pass it to enif_inspect_binary in a NIF.
Matching a binary will also cause it to shrink and the next append operation will copy the binary data:
Bin1 = <<Bin0,...>>, <<X,Y,Z,T/binary>> = Bin1, Bin = <<Bin1,...>> %% Bin1 will be COPIED
The reason is that a match context contains a direct pointer to the binary data.
If a process simply keeps binaries (either in "loop data" or in the process dictionary), the garbage collector can eventually shrink the binaries. If only one such binary is kept, it will not be shrunk. If the process later appends to a binary that has been shrunk, the binary object will be reallocated to make place for the data to be appended.
4.3 Matching Binaries
Let us revisit the example in the beginning of the previous section:
DO (in R12B)
my_binary_to_list(<<H,T/binary>>) ->
    [H|my_binary_to_list(T)];
my_binary_to_list(<<>>) -> [].The first time my_binary_to_list/1 is called, a match context is created. The match context points to the first byte of the binary. 1 byte is matched out and the match context is updated to point to the second byte in the binary.
In R11B, at this point a sub binary would be created. In R12B, the compiler sees that there is no point in creating a sub binary, because there will soon be a call to a function (in this case, to my_binary_to_list/1 itself) that immediately will create a new match context and discard the sub binary.
Therefore, in R12B, my_binary_to_list/1 calls itself with the match context instead of with a sub binary. The instruction that initializes the matching operation basically does nothing when it sees that it was passed a match context instead of a binary.
When the end of the binary is reached and the second clause matches, the match context will simply be discarded (removed in the next garbage collection, as there is no longer any reference to it).
To summarize, my_binary_to_list/1 in R12B only needs to create one match context and no sub binaries. In R11B, if the binary contains N bytes, N+1 match contexts and N sub binaries are created.
In R11B, the fastest way to match binaries is as follows:
DO NOT (in R12B)
my_complicated_binary_to_list(Bin) ->
    my_complicated_binary_to_list(Bin, 0).
my_complicated_binary_to_list(Bin, Skip) ->
    case Bin of
	<<_:Skip/binary,Byte,_/binary>> ->
	    [Byte|my_complicated_binary_to_list(Bin, Skip+1)];
	<<_:Skip/binary>> ->
	    []
    end.This function cleverly avoids building sub binaries, but it cannot avoid building a match context in each recursion step. Therefore, in both R11B and R12B, my_complicated_binary_to_list/1 builds N+1 match contexts. (In a future Erlang/OTP release, the compiler might be able to generate code that reuses the match context.)
Returning to my_binary_to_list/1, notice that the match context was discarded when the entire binary had been traversed. What happens if the iteration stops before it has reached the end of the binary? Will the optimization still work?
after_zero(<<0,T/binary>>) ->
    T;
after_zero(<<_,T/binary>>) ->
    after_zero(T);
after_zero(<<>>) ->
    <<>>.
  Yes, it will. The compiler will remove the building of the sub binary in the second clause:
...
after_zero(<<_,T/binary>>) ->
    after_zero(T);
...But it will generate code that builds a sub binary in the first clause:
after_zero(<<0,T/binary>>) ->
    T;
...Therefore, after_zero/1 builds one match context and one sub binary (assuming it is passed a binary that contains a zero byte).
Code like the following will also be optimized:
all_but_zeroes_to_list(Buffer, Acc, 0) ->
    {lists:reverse(Acc),Buffer};
all_but_zeroes_to_list(<<0,T/binary>>, Acc, Remaining) ->
    all_but_zeroes_to_list(T, Acc, Remaining-1);
all_but_zeroes_to_list(<<Byte,T/binary>>, Acc, Remaining) ->
    all_but_zeroes_to_list(T, [Byte|Acc], Remaining-1).The compiler removes building of sub binaries in the second and third clauses, and it adds an instruction to the first clause that converts Buffer from a match context to a sub binary (or do nothing if Buffer is a binary already).
Before you begin to think that the compiler can optimize any binary patterns, the following function cannot be optimized by the compiler (currently, at least):
non_opt_eq([H|T1], <<H,T2/binary>>) ->
    non_opt_eq(T1, T2);
non_opt_eq([_|_], <<_,_/binary>>) ->
    false;
non_opt_eq([], <<>>) ->
    true.It was mentioned earlier that the compiler can only delay creation of sub binaries if it knows that the binary will not be shared. In this case, the compiler cannot know.
Soon it is shown how to rewrite non_opt_eq/2 so that the delayed sub binary optimization can be applied, and more importantly, it is shown how you can find out whether your code can be optimized.
Option bin_opt_info
Use the bin_opt_info option to have the compiler print a lot of information about binary optimizations. It can be given either to the compiler or erlc:
erlc +bin_opt_info Mod.erl
or passed through an environment variable:
export ERL_COMPILER_OPTIONS=bin_opt_info
Notice that the bin_opt_info is not meant to be a permanent option added to your Makefiles, because all messages that it generates cannot be eliminated. Therefore, passing the option through the environment is in most cases the most practical approach.
The warnings look as follows:
./efficiency_guide.erl:60: Warning: NOT OPTIMIZED: sub binary is used or returned ./efficiency_guide.erl:62: Warning: OPTIMIZED: creation of sub binary delayed
To make it clearer exactly what code the warnings refer to, the warnings in the following examples are inserted as comments after the clause they refer to, for example:
after_zero(<<0,T/binary>>) ->
         %% NOT OPTIMIZED: sub binary is used or returned
    T;
after_zero(<<_,T/binary>>) ->
         %% OPTIMIZED: creation of sub binary delayed
    after_zero(T);
after_zero(<<>>) ->
    <<>>.The warning for the first clause says that the creation of a sub binary cannot be delayed, because it will be returned. The warning for the second clause says that a sub binary will not be created (yet).
Let us revisit the earlier example of the code that could not be optimized and find out why:
non_opt_eq([H|T1], <<H,T2/binary>>) ->
        %% INFO: matching anything else but a plain variable to
	%%    the left of binary pattern will prevent delayed 
	%%    sub binary optimization;
	%%    SUGGEST changing argument order
        %% NOT OPTIMIZED: called function non_opt_eq/2 does not
	%%    begin with a suitable binary matching instruction
    non_opt_eq(T1, T2);
non_opt_eq([_|_], <<_,_/binary>>) ->
    false;
non_opt_eq([], <<>>) ->
    true.The compiler emitted two warnings. The INFO warning refers to the function non_opt_eq/2 as a callee, indicating that any function that call non_opt_eq/2 cannot make delayed sub binary optimization. There is also a suggestion to change argument order. The second warning (that happens to refer to the same line) refers to the construction of the sub binary itself.
Soon another example will show the difference between the INFO and NOT OPTIMIZED warnings somewhat clearer, but let us first follow the suggestion to change argument order:
opt_eq(<<H,T1/binary>>, [H|T2]) ->
        %% OPTIMIZED: creation of sub binary delayed
    opt_eq(T1, T2);
opt_eq(<<_,_/binary>>, [_|_]) ->
    false;
opt_eq(<<>>, []) ->
    true.The compiler gives a warning for the following code fragment:
match_body([0|_], <<H,_/binary>>) ->
        %% INFO: matching anything else but a plain variable to
	%%    the left of binary pattern will prevent delayed 
	%%    sub binary optimization;
	%%    SUGGEST changing argument order
    done;
...The warning means that if there is a call to match_body/2 (from another clause in match_body/2 or another function), the delayed sub binary optimization will not be possible. More warnings will occur for any place where a sub binary is matched out at the end of and passed as the second argument to match_body/2, for example:
match_head(List, <<_:10,Data/binary>>) ->
        %% NOT OPTIMIZED: called function match_body/2 does not
	%%     begin with a suitable binary matching instruction
    match_body(List, Data).Unused Variables
The compiler figures out if a variable is unused. The same code is generated for each of the following functions:
count1(<<_,T/binary>>, Count) -> count1(T, Count+1); count1(<<>>, Count) -> Count. count2(<<H,T/binary>>, Count) -> count2(T, Count+1); count2(<<>>, Count) -> Count. count3(<<_H,T/binary>>, Count) -> count3(T, Count+1); count3(<<>>, Count) -> Count.
In each iteration, the first 8 bits in the binary will be skipped, not matched out.
