
4 The Abstract Format

This document describes the standard representation of parse trees for Erlang programs as Erlang terms. This representation is known as the abstract format. Functions dealing with such parse trees are compile:forms/[1,2] and functions in the modules epp, erl_eval, erl_lint, erl_pp, erl_parse, and io. Parse trees in the abstract format are also the input and output of parse transforms (see the module compile).

We use the function Rep to denote the mapping from an Erlang source construct C to its abstract format representation R, and write R = Rep(C).

The word LINE below represents an integer, and denotes the number of the line in the source file where the construct occurred. Several instances of LINE in the same construct may denote different lines.

Since operators are not terms in their own right, when operators are mentioned below, the representation of an operator should be taken to be the atom with a printname consisting of the same characters as the operator.
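As a minimal sketch of the mapping Rep, source text can be scanned with erl_scan:string/1 and parsed with erl_parse:parse_form/1 or erl_parse:parse_exprs/1. The module name rep_demo and its function names are hypothetical, chosen only for illustration; the sketch assumes the default line-only locations produced by erl_scan:string/1.

```erlang
-module(rep_demo).
-export([rep_form/1, rep_exprs/1]).

%% Return the abstract format of a single form given as source text.
rep_form(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.

%% Return the abstract format of a comma-separated sequence of
%% expressions terminated by a full stop.
rep_exprs(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    Exprs.
```

With these helpers, rep_demo:rep_form("f(X) ->\n    X + 1.") should yield

{function,1,f,1,[{clause,1,[{var,1,'X'}],[],[{op,2,'+',{var,2,'X'},{integer,2,1}}]}]}

Note that the operator + is represented by the atom '+', and that the LINE values inside one construct differ (1 for the clause head, 2 for its body).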

4.1 Module declarations and forms

A module declaration consists of a sequence of forms that are either function declarations or attributes.

4.1.1 Record fields

Each field in a record declaration may have an optional explicit default initializer expression.
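The following sketch shows how fields with and without a default initializer can be inspected. The module record_demo and the function record_fields/1 are hypothetical names, and the sketch assumes a record declaration without type annotations.

```erlang
-module(record_demo).
-export([record_fields/1]).

%% Parse a -record(...) attribute given as source text and return the
%% abstract format of its field list.
record_fields(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, {attribute, _Line, record, {_Name, Fields}}} =
        erl_parse:parse_form(Tokens),
    Fields.
```

For example, record_demo:record_fields("-record(point, {x, y = 0}).") should return

[{record_field,1,{atom,1,x}},{record_field,1,{atom,1,y},{integer,1,0}}]

where the field y carries the representation of its initializer as an extra element.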

4.1.2 Representation of parse errors and end of file

In addition to the representations of forms, the list that represents a module declaration (as returned by functions in erl_parse and epp) may contain tuples {error,E} and {warning,W}, denoting syntactically incorrect forms and warnings, and {eof,LINE}, denoting an end of stream encountered before a complete form had been parsed.
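A minimal sketch of how such a list can be obtained with epp:parse_file/3 follows. The module name forms_demo, the function parse_source/1, and the scratch file name are assumptions made for this illustration; the file is written to the current directory and deleted afterwards.

```erlang
-module(forms_demo).
-export([parse_source/1]).

%% Write Source to a scratch file, parse it with epp, and return the
%% list of forms (including any {error,E} and {eof,LINE} tuples).
parse_source(Source) ->
    File = "abstract_format_scratch.erl",   % hypothetical file name
    ok = file:write_file(File, Source),
    try
        {ok, Forms} = epp:parse_file(File, [], []),
        Forms
    after
        file:delete(File)
    end.
```

Calling forms_demo:parse_source("-module(m).\nf() -> ok.\ng( -> oops.\n") should return a list that contains the module attribute, the representation of f/0, an {error,E} tuple for the malformed third form, and a final {eof,LINE} tuple.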

4.2 Atomic literals

There are five kinds of atomic literals, which are represented in the same way in patterns, expressions and guards:

Note that negative integer and float literals do not occur as such; they are parsed as an application of the unary negation operator.
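The representation of individual literals can be checked with a small helper. This is a sketch only; literal_demo and rep/1 are hypothetical names, and line-only locations are assumed.

```erlang
-module(literal_demo).
-export([rep/1]).

%% Return the abstract format of a single expression given as source
%% text terminated by a full stop.
rep(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.
```

Under these assumptions:

literal_demo:rep("42.")      returns {integer,1,42}
literal_demo:rep("1.5.")     returns {float,1,1.5}
literal_demo:rep("foo.")     returns {atom,1,foo}
literal_demo:rep("\"ab\".")  returns {string,1,"ab"}
literal_demo:rep("$a.")      returns {char,1,97}
literal_demo:rep("-1.")      returns {op,1,'-',{integer,1,1}}

The last line shows a negative literal parsed as an application of the unary negation operator.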

4.3 Patterns

If Ps is a sequence of patterns P_1, ..., P_k, then Rep(Ps) = [Rep(P_1), ..., Rep(P_k)]. Such sequences occur as the list of arguments to a function or fun.

Individual patterns are represented as follows:

Note that every pattern has the same source form as some expression, and is represented the same way as the corresponding expression.
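The correspondence between patterns and expressions can be checked by parsing the same text in both positions. The sketch below uses hypothetical names (pattern_demo, clause_patterns/1, expr/1) and assumes line-only locations, so that both occurrences carry the same LINE.

```erlang
-module(pattern_demo).
-export([clause_patterns/1, expr/1]).

%% Return Rep(Ps) for the argument patterns of the first clause of a
%% function declaration given as source text.
clause_patterns(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, {function, _Line, _Name, _Arity, [Clause | _]}} =
        erl_parse:parse_form(Tokens),
    {clause, _ClauseLine, Patterns, _Guards, _Body} = Clause,
    Patterns.

%% Return the abstract format of a single expression.
expr(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.
```

For instance, pattern_demo:clause_patterns("f({X, [H|T]}, 1) -> ok.") should return a two-element list whose first element equals pattern_demo:expr("{X, [H|T]}."), namely

{tuple,1,[{var,1,'X'},{cons,1,{var,1,'H'},{var,1,'T'}}]}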

4.4 Expressions

A body B is a sequence of expressions E_1, ..., E_k, and Rep(B) = [Rep(E_1), ..., Rep(E_k)].

An expression E is one of the following alternatives:
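As an illustration rather than a complete enumeration, the sketch below obtains Rep(B) for a small body and evaluates it with erl_eval:exprs/2. The names body_demo, rep/1 and eval/1 are hypothetical.

```erlang
-module(body_demo).
-export([rep/1, eval/1]).

%% Return Rep(B) for a body given as source text terminated by a
%% full stop.
rep(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    Exprs.

%% Evaluate the body, starting from empty bindings, and return the
%% value of its last expression.
eval(Source) ->
    {value, Value, _Bindings} =
        erl_eval:exprs(rep(Source), erl_eval:new_bindings()),
    Value.
```

Here body_demo:rep("X = 1, X + 2.") should return

[{match,1,{var,1,'X'},{integer,1,1}},{op,1,'+',{var,1,'X'},{integer,1,2}}]

and body_demo:eval("X = 1, X + 2.") should return 3.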

4.4.1 Generators and filters

When W is a generator or a filter (in the body of a list or binary comprehension), then:
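The following sketch extracts the qualifier list of a list comprehension so that generators and filters can be compared side by side. The names comprehension_demo and qualifiers/1 are hypothetical.

```erlang
-module(comprehension_demo).
-export([qualifiers/1]).

%% Return the representations of the generators and filters of a
%% list comprehension given as source text.
qualifiers(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [{lc, _Line, _Template, Qualifiers}]} =
        erl_parse:parse_exprs(Tokens),
    Qualifiers.
```

For example, comprehension_demo:qualifiers("[X * 2 || X <- L, X > 1].") should return

[{generate,1,{var,1,'X'},{var,1,'L'}},{op,1,'>',{var,1,'X'},{integer,1,1}}]

The generator is wrapped in a generate tuple, whereas the filter is represented as the plain expression. A binary generator such as <<X>> <= B is tagged b_generate instead.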

4.4.2 Binary element type specifiers

A type specifier list TSL for a binary element is a sequence of type specifiers TS_1 - ... - TS_k. Rep(TSL) = [Rep(TS_1), ..., Rep(TS_k)].

When TS is a type specifier for a binary element, then:
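A short sketch of how the size and type specifier list of each binary element appear in the parse tree. The names bin_demo and element_specs/1 are hypothetical.

```erlang
-module(bin_demo).
-export([element_specs/1]).

%% For a binary constructor given as source text, return one
%% {Size, TSL} pair per element, as found in the parse tree.
element_specs(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [{bin, _Line, Elements}]} = erl_parse:parse_exprs(Tokens),
    [{Size, TSL} || {bin_element, _, _Value, Size, TSL} <- Elements].
```

For example, bin_demo:element_specs("<<X:8/integer-unsigned, Y/binary-unit:8, Z>>.") should return

[{{integer,1,8},[integer,unsigned]},{default,[binary,{unit,8}]},{default,default}]

An omitted size or an omitted type specifier list is represented by the atom default, and a specifier with a value, such as unit:8, by a pair.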

4.5 Clauses

There are function clauses, if clauses, case clauses and catch clauses.

A clause C is one of the following alternatives:
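As an illustration rather than a complete enumeration, the sketch below returns the clause list of a case or if expression. The names clause_demo and clauses/1 are hypothetical.

```erlang
-module(clause_demo).
-export([clauses/1]).

%% Return the clause representations of a case or if expression
%% given as source text.
clauses(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    case Expr of
        {'case', _Line, _Argument, Clauses} -> Clauses;
        {'if', _Line, Clauses} -> Clauses
    end.
```

For example, clause_demo:clauses("case X of 1 -> a; _ when X > 1 -> b end.") should return

[{clause,1,[{integer,1,1}],[],[{atom,1,a}]},
 {clause,1,[{var,1,'_'}],[[{op,1,'>',{var,1,'X'},{integer,1,1}}]],[{atom,1,b}]}]

Each clause holds a pattern list, a guard sequence, and a body. For an if clause the pattern list is empty.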

4.6 Guards

A guard sequence Gs is a sequence of guards G_1; ...; G_k, and Rep(Gs) = [Rep(G_1), ..., Rep(G_k)]. If the guard sequence is empty, Rep(Gs) = [].

A guard G is a nonempty sequence of guard tests Gt_1, ..., Gt_k, and Rep(G) = [Rep(Gt_1), ..., Rep(Gt_k)].

A guard test Gt is one of the following alternatives:

Note that every guard test has the same source form as some expression, and is represented the same way as the corresponding expression.
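The nesting of guard sequences, guards, and guard tests can be seen in the sketch below. The names guard_demo and guard_seq/1 are hypothetical.

```erlang
-module(guard_demo).
-export([guard_seq/1]).

%% Return Rep(Gs) for the first clause of a function declaration
%% given as source text.
guard_seq(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, {function, _Line, _Name, _Arity, [Clause | _]}} =
        erl_parse:parse_form(Tokens),
    {clause, _ClauseLine, _Patterns, Guards, _Body} = Clause,
    Guards.
```

For example, guard_demo:guard_seq("f(X) when is_integer(X), X > 0; is_float(X) -> ok.") should return a list of two guards, the first holding two guard tests:

[[{call,1,{atom,1,is_integer},[{var,1,'X'}]},{op,1,'>',{var,1,'X'},{integer,1,0}}],
 [{call,1,{atom,1,is_float},[{var,1,'X'}]}]]

A clause without a guard, as in guard_demo:guard_seq("f(X) -> ok."), yields [].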

4.7 The abstract format after preprocessing

The compilation option debug_info can be given to the compiler to have the abstract code stored in the abstract_code chunk in the BEAM file (for debugging purposes).

In OTP R9C and later, the abstract_code chunk contains the tuple

{raw_abstract_v1,AbstractCode}

where AbstractCode is the abstract code as described in this document.

In releases of OTP prior to R9C, the abstract code after some more processing was stored in the BEAM file. The first element of the tuple would be either abstract_v1 (R7B) or abstract_v2 (R8B).
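For releases that store raw_abstract_v1, the chunk can be read back with beam_lib:chunks/2. The sketch below compiles a few forms in memory with the debug_info option and extracts the chunk from the resulting binary. The names chunk_demo and abstract_code/1 are hypothetical, and the input is assumed to be a list of strings, each holding one complete form.

```erlang
-module(chunk_demo).
-export([abstract_code/1]).

%% Compile the given forms (one source string per form) with
%% debug_info and return the contents of the abstract_code chunk.
abstract_code(FormStrings) ->
    Forms = [parse_form(S) || S <- FormStrings],
    {ok, Module, Beam} = compile:forms(Forms, [debug_info]),
    {ok, {Module, [{abstract_code, Chunk}]}} =
        beam_lib:chunks(Beam, [abstract_code]),
    Chunk.

%% Parse a single form from source text.
parse_form(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.
```

Calling chunk_demo:abstract_code(["-module(m).", "-export([f/0]).", "f() -> ok."]) should return a tuple {raw_abstract_v1,AbstractCode} in which AbstractCode is a list of forms. Without debug_info, beam_lib reports no_abstract_code for the chunk instead.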


erts 5.7.2
Copyright © 1991-2009 Ericsson AB