Efficiency Guide

User's Guide

Version 12.0.2


8 Processes

8.1  Creating an Erlang Process

An Erlang process is lightweight compared to threads and processes in operating systems.

A newly spawned Erlang process uses 326 words of memory. The size can be found as follows:

Erlang/OTP 24 [erts-12.0] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Eshell V12.0  (abort with ^G)
1> Fun = fun() -> receive after infinity -> ok end end.
#Fun<...>
2> {_,Bytes} = process_info(spawn(Fun), memory).
{memory,2608}
3> Bytes div erlang:system_info(wordsize).
326

The size includes 233 words for the heap area (which includes the stack). The garbage collector increases the heap as needed.

The main (outer) loop for a process must be tail-recursive. Otherwise, the stack grows until the process terminates.

DO NOT

loop() ->
    receive
        {sys, Msg} ->
            handle_sys_msg(Msg),
            loop();
        {From, Msg} ->
            Reply = handle_msg(Msg),
            From ! Reply,
            loop()
    end,
    io:format("Message is processed~n", []).

The call to io:format/2 will never be executed, but a return address will still be pushed to the stack each time loop/0 is called recursively. The correct tail-recursive version of the function looks as follows:

DO

loop() ->
    receive
        {sys, Msg} ->
            handle_sys_msg(Msg),
            loop();
        {From, Msg} ->
            Reply = handle_msg(Msg),
            From ! Reply,
            loop()
    end.
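The stack growth described above can be observed directly. The following is a minimal sketch, assuming a hypothetical module stack_demo (saved as stack_demo.erl) that runs both loop shapes in separate server processes and reads their stack size in words with process_info/2. The message format and helper names are illustrative only.

```erlang
-module(stack_demo).
-export([start/1, ping_n/2, stack_words/1]).

%% Start a server process running either the tail-recursive loop
%% (tail) or the body-recursive loop from the DO NOT example (body).
start(tail) -> spawn(fun tail_loop/0);
start(body) -> spawn(fun body_loop/0).

%% Send N synchronous pings to Pid, waiting for each reply.
ping_n(_Pid, 0) ->
    ok;
ping_n(Pid, N) ->
    Ref = make_ref(),
    Pid ! {self(), Ref},
    receive
        {Ref, pong} -> ping_n(Pid, N - 1)
    end.

%% Current stack size of Pid, in words.
stack_words(Pid) ->
    {stack_size, Words} = process_info(Pid, stack_size),
    Words.

tail_loop() ->
    receive
        {From, Ref} ->
            From ! {Ref, pong},
            tail_loop()
    end.

body_loop() ->
    receive
        {From, Ref} ->
            From ! {Ref, pong},
            body_loop()
    end,
    %% Never reached, but forces a new stack frame on every iteration.
    io:format("Message is processed~n", []).
```

After, for example, 10000 calls through ping_n/2, the body-recursive server is expected to report a stack of thousands of words, while the tail-recursive one stays at a handful; the exact figures depend on the emulator.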

Initial Heap Size

The default initial heap size of 233 words is quite conservative to support Erlang systems with hundreds of thousands or even millions of processes. The garbage collector grows and shrinks the heap as needed.

In a system that uses comparatively few processes, performance might be improved by increasing the minimum heap size, either for all processes using the +h option for erl, or on a per-process basis using the min_heap_size option for spawn_opt/4.

The gain is twofold:

  • Although the garbage collector grows the heap, it grows it step-by-step, which is more costly than directly establishing a larger heap when the process is spawned.
  • The garbage collector can also shrink the heap if it is much larger than the amount of data stored on it; setting the minimum heap size prevents that.
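As a minimal sketch, the effect of the min_heap_size option can be checked by spawning an idle process and asking for its heap size. The module name heap_demo is hypothetical, and spawn_opt/2 with a fun is used instead of spawn_opt/4 only to keep the example short. The runtime rounds the requested size up to one of its internal heap sizes, so only a lower bound is checked.

```erlang
-module(heap_demo).
-export([heap_words/1]).

%% Spawn an idle process with the given spawn options, read its
%% current heap size in words, then tell it to stop.
heap_words(SpawnOpts) ->
    Pid = spawn_opt(fun() -> receive stop -> ok end end, SpawnOpts),
    {heap_size, Words} = process_info(Pid, heap_size),
    Pid ! stop,
    Words.
```

Comparing heap_demo:heap_words([]) with heap_demo:heap_words([{min_heap_size, 4096}]) shows the default heap next to the enlarged one.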
Warning

The emulator probably uses more memory, and because garbage collections occur less frequently, huge binaries can be kept much longer.

In systems with many processes, computation tasks that run for a short time can be spawned off into a new process with a higher minimum heap size. When the process is done, it sends the result of the computation to another process and terminates. If the minimum heap size is calculated properly, the process might not have to do any garbage collections at all. This optimization is not to be attempted without proper measurements.
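The pattern of spawning a short-lived computation with a larger minimum heap can be sketched as follows. The module worker_demo and the MinHeapWords parameter are hypothetical; the value passed should come from measurements, as noted above. The monitor option makes spawn_opt/2 return {Pid, MRef}, so a crash in the worker is reported instead of leaving the caller waiting.

```erlang
-module(worker_demo).
-export([run/2]).

%% Run Fun in a short-lived process spawned with a minimum heap of
%% MinHeapWords words, and return {ok, Result} or {error, Reason}.
run(Fun, MinHeapWords) ->
    Parent = self(),
    {Pid, MRef} = spawn_opt(fun() -> Parent ! {self(), Fun()} end,
                            [monitor, {min_heap_size, MinHeapWords}]),
    receive
        {Pid, Result} ->
            %% The worker terminates right after replying; drop its
            %% 'DOWN' message if it has already arrived.
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, Reason}
    end.
```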

8.2  Process Messages

All data in messages between Erlang processes is copied, except for refc binaries on the same Erlang node.

When a message is sent to a process on another Erlang node, it is first encoded to the Erlang External Format before being sent through a TCP/IP socket. The receiving Erlang node decodes the message and distributes it to the correct process.
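The encoding step can be tried locally with term_to_binary/1 and binary_to_term/1, which use the same External Term Format. This is only an approximation of what goes over the wire: the distribution layer can add its own header and an atom cache, so the bytes actually sent may differ. The module name ext_format_demo is hypothetical.

```erlang
-module(ext_format_demo).
-export([roundtrip/1]).

%% Encode Term in the External Term Format, check the version tag
%% (131) that starts every encoded term, decode it again, and return
%% the encoded size in bytes together with the decoded term.
roundtrip(Term) ->
    Bin = term_to_binary(Term),
    <<131, _/binary>> = Bin,
    {byte_size(Bin), binary_to_term(Bin)}.
```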

Receiving messages

The cost of receiving messages depends on how complicated the receive expression is. Simple expressions that match any message are very cheap:

DO

    receive
        Message -> handle_msg(Message)
    end.

This is not always convenient, however: a message may arrive that we do not know how to handle at this point, so it is common to match only the messages we expect:

    receive
        {Tag, Message} -> handle_msg(Message)
    end.

While this is convenient, it means that the message queue must be searched until a matching message is found. This is very expensive for processes with long message queues, so we have added an optimization for the common case of sending a request and waiting for a response shortly after:

DO

    MRef = monitor(process, Process),
    Process ! {self(), MRef, Request},
    receive
        {MRef, Reply} ->
            erlang:demonitor(MRef, [flush]),
            handle_reply(Reply);
        {'DOWN', MRef, _, _, Reason} ->
            handle_error(Reason)
    end.

The compiler knows that the reference created by monitor/2 cannot exist before the call (it is a globally unique identifier), and that the receive only matches messages that contain that reference. It therefore tells the emulator to search only the messages that arrived after the call to monitor/2.
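A self-contained version of the request/reply pattern, assuming a hypothetical echo server in a module named req_reply, can be sketched as follows. call/2 uses the same monitor-then-receive shape as the example above; handle_reply/1 and handle_error/1 are replaced by returning {ok, Reply} or {error, Reason}.

```erlang
-module(req_reply).
-export([start_echo/0, call/2]).

%% A trivial server that echoes every request back to the caller,
%% tagged with the reference the caller supplied.
start_echo() ->
    spawn(fun echo_loop/0).

echo_loop() ->
    receive
        {From, Ref, Request} ->
            From ! {Ref, {echo, Request}},
            echo_loop()
    end.

%% Send Request and wait for either the reply or the server's death.
call(Server, Request) ->
    MRef = monitor(process, Server),
    Server ! {self(), MRef, Request},
    receive
        {MRef, Reply} ->
            erlang:demonitor(MRef, [flush]),
            {ok, Reply};
        {'DOWN', MRef, _, _, Reason} ->
            {error, Reason}
    end.
```

If the server is already dead when call/2 runs, the monitor delivers a 'DOWN' message with reason noproc, so the caller still gets {error, Reason} rather than blocking forever.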

The above is a simple example where the optimization is guaranteed to take effect, but what about more complicated code?

Option recv_opt_info

Use the recv_opt_info option to have the compiler print information about receive optimizations. It can be given either to the compiler or to erlc:

erlc +recv_opt_info Mod.erl

or passed through an environment variable:

export ERL_COMPILER_OPTIONS=recv_opt_info

Notice that recv_opt_info is not meant to be a permanent option added to your Makefiles, because not all messages that it generates can be eliminated. Therefore, passing the option through the environment is in most cases the most practical approach.

The warnings look as follows:

efficiency_guide.erl:194: Warning: INFO: receive matches any message, this is always fast
efficiency_guide.erl:200: Warning: NOT OPTIMIZED: all clauses do not match a suitable reference
efficiency_guide.erl:206: Warning: OPTIMIZED: reference used to mark a message queue position
efficiency_guide.erl:208: Warning: OPTIMIZED: all clauses match reference created by monitor/2 at efficiency_guide.erl:206
efficiency_guide.erl:219: Warning: INFO: passing reference created by make_ref/0 at efficiency_guide.erl:218
efficiency_guide.erl:222: Warning: OPTIMIZED: all clauses match reference in function parameter 1

To make it clearer exactly what code the warnings refer to, the warnings in the following examples are inserted as comments after the clause they refer to, for example:

%% DO
simple_receive() ->
    %% efficiency_guide.erl:194: Warning: INFO: receive matches any message, this is always fast
    receive
        Message -> handle_msg(Message)
    end.

%% DO NOT, unless Tag is known to be a suitable reference: see
%% cross_function_receive/0 further down.
selective_receive(Tag, Message) ->
    %% efficiency_guide.erl:200: Warning: NOT OPTIMIZED: all clauses do not match a suitable reference
    receive
        {Tag, Message} -> handle_msg(Message)
    end.

%% DO
optimized_receive(Process, Request) ->
    %% efficiency_guide.erl:206: Warning: OPTIMIZED: reference used to mark a message queue position
    MRef = monitor(process, Process),
    Process ! {self(), MRef, Request},
    %% efficiency_guide.erl:208: Warning: OPTIMIZED: all clauses match reference created by monitor/2 at efficiency_guide.erl:206
    receive
        {MRef, Reply} ->
            erlang:demonitor(MRef, [flush]),
            handle_reply(Reply);
        {'DOWN', MRef, _, _, Reason} ->
            handle_error(Reason)
    end.

%% DO
cross_function_receive() ->
    %% efficiency_guide.erl:218: Warning: OPTIMIZED: reference used to mark a message queue position
    Ref = make_ref(),
    %% efficiency_guide.erl:219: Warning: INFO: passing reference created by make_ref/0 at efficiency_guide.erl:218
    cross_function_receive(Ref).

cross_function_receive(Ref) ->
    %% efficiency_guide.erl:222: Warning: OPTIMIZED: all clauses match reference in function parameter 1
    receive
        {Ref, Message} -> handle_msg(Message)
    end.

Constant Pool

Constant Erlang terms (also called literals) are kept in constant pools; each loaded module has its own pool. The following function does not build the tuple every time it is called (only to have it discarded the next time the garbage collector runs); instead, the tuple is located in the module's constant pool:

DO

days_in_month(M) ->
    element(M, {31,28,31,30,31,30,31,31,30,31,30,31}).

But if a constant is sent to another process (or stored in an Ets table), it is copied. The reason is that the runtime system must be able to keep track of all references to constants to unload code containing constants properly. (When the code is unloaded, the constants are copied to the heap of the processes that refer to them.) The copying of constants might be eliminated in a future Erlang/OTP release.

Loss of Sharing

Shared subterms are not preserved in the following cases:

  • When a term is sent to another process
  • When a term is passed as the initial process arguments in the spawn call
  • When a term is stored in an Ets table

Not preserving sharing is an optimization: most applications do not send messages with shared subterms.

The following example shows how a shared subterm can be created:

kilo_byte() ->
    kilo_byte(10, [42]).

kilo_byte(0, Acc) ->
    Acc;
kilo_byte(N, Acc) ->
    kilo_byte(N-1, [Acc|Acc]).

kilo_byte/0 creates a deep list. If list_to_binary/1 is called, the deep list can be converted to a binary of 1024 bytes:

1> byte_size(list_to_binary(efficiency_guide:kilo_byte())).
1024

Using the erts_debug:size/1 BIF, it can be seen that the deep list only requires 22 words of heap space:

2> erts_debug:size(efficiency_guide:kilo_byte()).
22

Using the erts_debug:flat_size/1 BIF, the size of the deep list can be calculated if sharing is ignored. It becomes the size of the list when it has been sent to another process or stored in an Ets table:

3> erts_debug:flat_size(efficiency_guide:kilo_byte()).
4094

It can be verified that sharing will be lost if the data is inserted into an Ets table:

4> T = ets:new(tab, []).
#Ref<0.1662103692.2407923716.214181>
5> ets:insert(T, {key,efficiency_guide:kilo_byte()}).
true
6> erts_debug:size(element(2, hd(ets:lookup(T, key)))).
4094
7> erts_debug:flat_size(element(2, hd(ets:lookup(T, key)))).
4094

When the data has passed through an Ets table, erts_debug:size/1 and erts_debug:flat_size/1 return the same value. Sharing has been lost.
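The same loss of sharing can be checked for message passing. This is a minimal sketch, assuming a hypothetical module sharing_demo that repeats kilo_byte/0 from above and lets a freshly spawned process measure the copy it receives. It relies on erts_debug, which is an internal debugging module, and on a runtime system built without sharing-preserving copying, which is the default.

```erlang
-module(sharing_demo).
-export([kilo_byte/0, size_after_send/1]).

%% Same deep list as in the text: 1024 bytes, built with sharing.
kilo_byte() ->
    kilo_byte(10, [42]).

kilo_byte(0, Acc) ->
    Acc;
kilo_byte(N, Acc) ->
    kilo_byte(N-1, [Acc|Acc]).

%% Send Term to a new process, let that process measure the copy it
%% received with erts_debug:size/1, and return the measured size.
size_after_send(Term) ->
    Parent = self(),
    Ref = make_ref(),
    Pid = spawn(fun() ->
                        receive
                            {Ref, Copy} -> Parent ! {Ref, erts_debug:size(Copy)}
                        end
                end),
    Pid ! {Ref, Term},
    receive
        {Ref, Size} -> Size
    end.
```

Under those assumptions, erts_debug:size(sharing_demo:kilo_byte()) gives 22 words in the sending process, while the received copy measures 4094 words, the same as the flat size.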

A way to (optionally) preserve sharing might be implemented in a future Erlang/OTP release.

8.3  SMP Emulator

The emulator takes advantage of a multi-core or multi-CPU computer by running several Erlang scheduler threads (typically, the same as the number of cores).

To gain performance from a multi-core computer, your application must have more than one runnable Erlang process most of the time. Otherwise, the Erlang emulator can still only run one Erlang process at a time.
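One common way to keep several processes runnable is to spread independent pieces of work over separate processes. The following parallel map is a minimal sketch, assuming a hypothetical module pmap_demo; it spawns one process per element, which only pays off when each call to F does enough work to outweigh the cost of spawning and messaging. The number of scheduler threads available can be read with erlang:system_info(schedulers_online).

```erlang
-module(pmap_demo).
-export([pmap/2]).

%% Apply F to every element of List, each in its own process, and
%% collect the results in the original order. spawn_link/1 is used so
%% that a crash in a worker also takes down the caller.
pmap(F, List) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, F(X)} end),
                Ref
            end || X <- List],
    [receive {R, Result} -> Result end || R <- Refs].
```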

Benchmarks that appear to be concurrent are often sequential. The estone benchmark, for example, is entirely sequential. So is the most common implementation of the "ring benchmark"; usually one process is active, while the others wait in a receive statement.