The external term format is mainly used in the distribution mechanism of Erlang.
Since Erlang has a fixed number of types, there is no need for a programmer to define a specification for the external format used within some application. All Erlang terms have an external representation, and the interpretation of the different terms is application specific.
In Erlang the BIF term_to_binary/1,2 is used to convert a term into the external format. To convert binary data encoding a term back into a term, the BIF binary_to_term/1 is used.
The distribution does this implicitly when sending messages across node boundaries.
The overall layout of the term format is:
1 | 1 | N |
131 | Tag | Data |
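As a rough illustration of the layout above, the following Python sketch splits an uncompressed external term into its tag and payload. The function name split_term and the constant VERSION_MAGIC are made up for this example; only the leading byte 131 and the one-byte tag come from the layout.

```python
VERSION_MAGIC = 131  # first byte of every external term, per the layout above


def split_term(data: bytes):
    """Return (tag, payload) for an uncompressed external term."""
    if len(data) < 2:
        raise ValueError("need at least a version byte and a tag byte")
    if data[0] != VERSION_MAGIC:
        raise ValueError("unexpected version byte: %d" % data[0])
    return data[1], data[2:]


# Hand-built sample: 131, tag 97, one data byte with value 42.
tag, payload = split_term(bytes([131, 97, 42]))
```

The payload is left undecoded here; how it is read depends on the tag.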
A compressed term looks like this:
1 | 1 | 4 | N |
131 | 80 | UncompressedSize | Zlib-compressedData |
Uncompressed Size (unsigned 32 bit integer in big-endian byte order) is the size of the data before it was compressed. The compressed data has the following format when it has been expanded:
1 | Uncompressed Size |
Tag | Data |
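A minimal sketch of expanding a compressed term, assuming the compressed part is a standard zlib stream that Python's zlib.decompress can read. The helper name expand_compressed is hypothetical, and the sample input is built by hand rather than taken from a real node.

```python
import struct
import zlib


def expand_compressed(data: bytes) -> bytes:
    """Turn 131, 80, UncompressedSize, ZlibData into the plain 131, Tag, Data form."""
    if len(data) < 6 or data[0] != 131 or data[1] != 80:
        raise ValueError("not a compressed external term")
    # UncompressedSize: unsigned 32 bit integer, big-endian.
    (uncompressed_size,) = struct.unpack(">I", data[2:6])
    body = zlib.decompress(data[6:])
    if len(body) != uncompressed_size:
        raise ValueError("size mismatch after decompression")
    return bytes([131]) + body


# Hand-built sample: the inner Tag | Data is a tag 107 string of 300 bytes.
inner = bytes([107]) + struct.pack(">H", 300) + b"a" * 300
packed = bytes([131, 80]) + struct.pack(">I", len(inner)) + zlib.compress(inner)
plain = expand_compressed(packed)
```

Checking the expanded length against UncompressedSize is a design choice of this sketch; it catches truncated or corrupted input early.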
1 | 1 |
97 | Int |
Unsigned 8 bit integer.
1 | 4 |
98 | Int |
Signed 32 bit integer in big-endian format (i.e. MSB first).
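The two integer layouts above (tags 97 and 98) could be decoded along these lines. This is a sketch only; decode_integer is an illustrative name, and it returns the value together with the number of bytes consumed.

```python
import struct


def decode_integer(data: bytes):
    """Decode a tag 97 or tag 98 integer; return (value, bytes_consumed)."""
    tag = data[0]
    if tag == 97:
        # One unsigned byte follows the tag.
        return data[1], 2
    if tag == 98:
        # Four bytes follow the tag: signed, big-endian.
        (value,) = struct.unpack(">i", data[1:5])
        return value, 5
    raise ValueError("not an integer tag: %d" % tag)


small = decode_integer(bytes([97, 255]))
negative = decode_integer(bytes([98, 0xFF, 0xFF, 0xFF, 0xFE]))
```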
1 | 31 |
99 | Float String |
A float is stored in string format. The format used in sprintf to format the float is "%.20e" (there are more bytes allocated than necessary). To unpack the float, use sscanf with format "%lf".
This term is used in minor version 0 of the external format; it has been superseded by NEW_FLOAT_EXT.
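The string form of a float can be sketched in Python as follows. The sketch assumes that the unused bytes of the 31-byte field are filled with NUL bytes, which the text above does not state; the function names are illustrative.

```python
def encode_float_ext(value: float) -> bytes:
    """Encode a float as tag 99 followed by a 31-byte text field."""
    text = ("%.20e" % value).encode("ascii")
    # Assumption: the remainder of the field is padded with NUL bytes.
    return bytes([99]) + text.ljust(31, b"\x00")


def decode_float_ext(data: bytes) -> float:
    """Decode tag 99 by parsing the text up to the first NUL byte."""
    if data[0] != 99:
        raise ValueError("not a string-format float")
    text = data[1:32].split(b"\x00", 1)[0]
    return float(text.decode("ascii"))


encoded = encode_float_ext(1.5)
```

Python's float() plays the role of sscanf with "%lf" here.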
1 | 2 | Len |
100 | Len | AtomName |
An atom is stored with a 2 byte unsigned length in big-endian order, followed by Len 8 bit characters that form the AtomName. Note: The maximum allowed value for Len is 255.
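A sketch of reading an atom in this layout. Decoding the 8 bit characters as Latin-1 is an assumption of the example, and decode_atom_ext is an illustrative name.

```python
import struct


def decode_atom_ext(data: bytes):
    """Decode a tag 100 atom; return (name, bytes_consumed)."""
    if data[0] != 100:
        raise ValueError("not an atom")
    (length,) = struct.unpack(">H", data[1:3])
    if length > 255:
        raise ValueError("atom length exceeds the allowed maximum of 255")
    name = data[3:3 + length]
    if len(name) != length:
        raise ValueError("truncated atom name")
    # Assumption: one byte per character, interpreted as Latin-1.
    return name.decode("latin-1"), 3 + length


atom = decode_atom_ext(bytes([100, 0, 2]) + b"ok")
```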
1 | N | 4 | 1 |
101 | Node | ID | Creation |
Encode a reference object (an object generated with make_ref/0). The Node term is an encoded atom, i.e. ATOM_EXT, NEW_CACHE or CACHED_ATOM. The ID field contains a big-endian unsigned integer, but should be regarded as uninterpreted data since this field is node specific. Creation is a byte containing a node serial number that makes it possible to separate old (crashed) nodes from a new one.
In ID, only 18 bits are significant; the rest should be 0. In Creation, only 2 bits are significant; the rest should be 0. See NEW_REFERENCE_EXT.
1 | N | 4 | 1 |
102 | Node | ID | Creation |
Encode a port object (obtained from open_port/2). The ID is a node specific identifier for a local port. Port operations are not allowed across node boundaries. The Creation works just like in REFERENCE_EXT.
1 | N | 4 | 4 | 1 |
103 | Node | ID | Serial | Creation |
Encode a process identifier object (obtained from spawn/3 or friends). The ID and Creation fields work just like in REFERENCE_EXT, while the Serial field is used to improve safety. In ID, only 15 bits are significant; the rest should be 0.
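The process identifier layout can be read roughly as below. For brevity the sketch only accepts a Node encoded as a plain atom (tag 100) and masks ID and Creation down to their significant bits; both choices, and all names, are assumptions of the example.

```python
import struct


def _read_atom(data: bytes, offset: int):
    """Read a tag 100 atom at offset; return (name, next_offset)."""
    if data[offset] != 100:
        raise ValueError("sketch only handles a plain atom as Node")
    (length,) = struct.unpack_from(">H", data, offset + 1)
    start = offset + 3
    return data[start:start + length].decode("latin-1"), start + length


def decode_pid_ext(data: bytes) -> dict:
    """Decode tag 103: Node, ID (4 bytes), Serial (4 bytes), Creation (1 byte)."""
    if data[0] != 103:
        raise ValueError("not a process identifier")
    node, pos = _read_atom(data, 1)
    pid_id, serial, creation = struct.unpack_from(">IIB", data, pos)
    return {
        "node": node,
        "id": pid_id & 0x7FFF,         # only 15 bits are significant
        "serial": serial,
        "creation": creation & 0x03,   # only 2 bits are significant
    }


sample = bytes([103, 100, 0, 3]) + b"a@b" + struct.pack(">IIB", 38, 0, 1)
pid = decode_pid_ext(sample)
```

The reference and port layouts differ only in the tag and in the absence of the Serial field, so the same approach would apply to them.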
1 | 1 | N |
104 | Arity | Elements |
SMALL_TUPLE_EXT encodes a tuple. The Arity field is an unsigned byte that determines how many elements follow in the Elements section.
1 | 4 | N |
105 | Arity | Elements |
Same as SMALL_TUPLE_EXT with the exception that Arity is an unsigned 4 byte integer in big endian format.
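Because tuple elements are themselves encoded terms, decoding is naturally recursive. The sketch below handles only the two tuple tags and one-byte integers (tag 97) as elements; decode_term is an illustrative name, not an existing API.

```python
import struct


def decode_term(data: bytes, pos: int = 0):
    """Decode a small subset of terms; return (value, next_position)."""
    tag = data[pos]
    if tag == 97:
        return data[pos + 1], pos + 2
    if tag == 104:
        arity = data[pos + 1]                                # unsigned byte
        pos += 2
    elif tag == 105:
        (arity,) = struct.unpack_from(">I", data, pos + 1)   # 4 bytes, big-endian
        pos += 5
    else:
        raise ValueError("tag %d is not handled by this sketch" % tag)
    elements = []
    for _ in range(arity):
        element, pos = decode_term(data, pos)
        elements.append(element)
    return tuple(elements), pos


nested = decode_term(bytes([104, 2, 97, 1, 104, 0]))
large = decode_term(bytes([105, 0, 0, 0, 1, 97, 7]))
```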
1 |
106 |
The representation for an empty list, i.e. the Erlang syntax [].
1 | 2 | Len |
107 | Length | Characters |
String does NOT have a corresponding Erlang representation, but is an optimization for sending lists of bytes (integers in the range 0-255) more efficiently over the distribution. Since the Length field is an unsigned 2 byte integer (big endian), implementations must make sure that lists longer than 65535 elements are encoded as LIST_EXT.
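The length rule for byte lists might be applied on the encoding side as in this sketch. It accepts only integers in the range 0-255, and the general list layout used for the long case (tag 108, a 4 byte length, the elements, then an empty-list tail) is the LIST_EXT form this document describes. The name encode_byte_list is hypothetical.

```python
import struct


def encode_byte_list(values) -> bytes:
    """Encode a list of integers 0-255, choosing the layout by length."""
    values = list(values)
    if any(not isinstance(v, int) or not 0 <= v <= 255 for v in values):
        raise ValueError("sketch only handles integers in the range 0-255")
    if not values:
        return bytes([106])                      # empty list
    if len(values) <= 65535:
        # Compact form: tag 107, 2 byte length, raw bytes.
        return bytes([107]) + struct.pack(">H", len(values)) + bytes(values)
    # Too long for a 2 byte length: fall back to the general list form.
    out = bytearray([108]) + struct.pack(">I", len(values))
    for v in values:
        out += bytes([97, v])                    # each element as a one-byte integer
    out.append(106)                              # tail of a proper list
    return bytes(out)


short = encode_byte_list([1, 2])
long_form = encode_byte_list([0] * 65536)
```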
1 | 4 | | |
108 | Length | Elements | Tail |
Length is the number of elements that follow in the Elements section. Tail is the final tail of the list; it is NIL_EXT for a proper list, but may be of any type if the list is improper (for instance [a|b]).
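A decoding sketch for the list-related layouts (tags 106, 107 and 108), again restricted to one-byte integers (tag 97) as elements. Representing an improper list as the tuple ("improper", elements, tail) is purely a choice made for this example.

```python
import struct


def decode_term(data: bytes, pos: int = 0):
    """Decode integers (97) and the list forms; return (value, next_position)."""
    tag = data[pos]
    if tag == 97:
        return data[pos + 1], pos + 2
    if tag == 106:
        return [], pos + 1
    if tag == 107:
        (length,) = struct.unpack_from(">H", data, pos + 1)
        start = pos + 3
        return list(data[start:start + length]), start + length
    if tag == 108:
        (length,) = struct.unpack_from(">I", data, pos + 1)
        pos += 5
        elements = []
        for _ in range(length):
            element, pos = decode_term(data, pos)
            elements.append(element)
        tail_is_nil = data[pos] == 106
        tail, pos = decode_term(data, pos)
        if tail_is_nil:
            return elements, pos
        return ("improper", elements, tail), pos
    raise ValueError("tag %d is not handled by this sketch" % tag)


as_string = decode_term(bytes([107, 0, 2, 104, 105]))
proper = decode_term(bytes([108, 0, 0, 0, 1, 97, 1, 106]))
improper = decode_term(bytes([108, 0, 0, 0, 1, 97, 1, 97, 2]))
```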
1 | 4 | Len |
109 | Len | Data |
Binaries are generated with a bit syntax expression or with list_to_binary/1, term_to_binary/1, or as input from binary ports. The Len length field is an unsigned 4 byte integer (big endian).
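The binary layout is a simple length-prefixed blob, which a sketch like the following can write and read. Both function names are illustrative.

```python
import struct


def encode_binary_ext(payload: bytes) -> bytes:
    """Tag 109, 4 byte big-endian length, then the raw bytes."""
    return bytes([109]) + struct.pack(">I", len(payload)) + payload


def decode_binary_ext(data: bytes) -> bytes:
    """Return the raw bytes of a tag 109 binary."""
    if data[0] != 109:
        raise ValueError("not a binary")
    (length,) = struct.unpack_from(">I", data, 1)
    payload = data[5:5 + length]
    if len(payload) != length:
        raise ValueError("truncated binary")
    return payload


wire = encode_binary_ext(b"abc")
```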
1 | 1 | 1 | n |
110 | n | Sign | d(0) ... d(n-1) |
Bignums are stored in sign-magnitude form with a Sign byte
that is 0 if the bignum is positive and 1 if it is negative. The
digits are stored with the least significant byte first. To
calculate the integer, the following formula can be used:
B = 256
(d0*B^0 + d1*B^1 + d2*B^2 + ... + d(n-1)*B^(n-1))
1 | 4 | 1 | n |
111 | n | Sign | d(0) ... d(n-1) |
Same as SMALL_BIG_EXT with the difference that the length field is an unsigned 4 byte integer.
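The bignum formula can be applied directly, as in this sketch covering both length variants (tags 110 and 111). The name decode_big is made up for the example.

```python
import struct


def decode_big(data: bytes) -> int:
    """Decode a tag 110 or tag 111 bignum into a Python int."""
    tag = data[0]
    if tag == 110:
        n = data[1]                                  # 1 byte digit count
        pos = 2
    elif tag == 111:
        (n,) = struct.unpack_from(">I", data, 1)     # 4 byte digit count
        pos = 5
    else:
        raise ValueError("not a bignum")
    sign = data[pos]
    digits = data[pos + 1:pos + 1 + n]
    if len(digits) != n:
        raise ValueError("truncated bignum")
    # d0*B^0 + d1*B^1 + ... with B = 256, least significant digit first.
    magnitude = sum(d * 256 ** i for i, d in enumerate(digits))
    return -magnitude if sign == 1 else magnitude


positive = decode_big(bytes([110, 2, 0, 0, 1]))
negative = decode_big(bytes([110, 1, 1, 5]))
```

In Python the sum is equivalent to int.from_bytes(digits, "little"); the explicit form is kept to mirror the formula.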
1 | 1 | 2 | Len |
78 | index | Len | Atom name |
NEW_CACHE works just like ATOM_EXT, but it must also cache the atom in the atom cache in the location given by index. The atom cache is currently only used between real Erlang nodes (not between Erlang nodes and C or Java nodes).
1 | 1 |
67 | index |
When the atom cache is in use, index is the slot number in which the atom MUST be located.
1 | 2 | N | 1 | N' |
114 | Len | Node | Creation | ID ... |
Node and Creation are as in REFERENCE_EXT.
ID contains a sequence of big-endian unsigned integers (4 bytes each, so N' is a multiple of 4), but should be regarded as uninterpreted data.
N' = 4 * Len.
In the first word (four bytes) of ID, only 18 bits are significant, the rest should be 0. In Creation, only 2 bits are significant, the rest should be 0.
NEW_REFERENCE_EXT was introduced with distribution version 4. In version 4, N' should be at most 12.
See REFERENCE_EXT.
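A sketch of reading this reference layout, assuming for brevity that Node is encoded as a plain atom (tag 100). The ID words are returned as a list of unsigned integers and are otherwise left uninterpreted; the function name is illustrative.

```python
import struct


def decode_new_reference_ext(data: bytes):
    """Decode tag 114; return (node, creation, id_words)."""
    if data[0] != 114:
        raise ValueError("not a new-style reference")
    (length,) = struct.unpack_from(">H", data, 1)    # number of 4 byte ID words
    if data[3] != 100:
        raise ValueError("sketch only handles a plain atom as Node")
    (atom_len,) = struct.unpack_from(">H", data, 4)
    node = data[6:6 + atom_len].decode("latin-1")
    pos = 6 + atom_len
    creation = data[pos] & 0x03                      # only 2 bits are significant
    # N' = 4 * Len bytes of ID follow, as big-endian unsigned words.
    id_words = list(struct.unpack_from(">%dI" % length, data, pos + 1))
    return node, creation, id_words


sample = bytes([114, 0, 2, 100, 0, 1]) + b"n" + bytes([1]) + struct.pack(">II", 5, 9)
ref = decode_new_reference_ext(sample)
```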
1 | 4 | N1 | N2 | N3 | N4 | N5 |
117 | NumFree | Pid | Module | Index | Uniq | Free vars ... |
1 | 4 | 1 | 16 | 4 | 4 | N1 | N2 | N3 | N4 | N5 |
112 | Size | Arity | Uniq | Index | NumFree | Module | OldIndex | OldUniq | Pid | Free Vars |
This is the new encoding of internal funs: fun F/A and fun(Arg1,..) -> ... end.
1 | N1 | N2 | N3 |
113 | Module | Function | Arity |
This term is the encoding for external funs: fun M:F/A.
Module and Function are atoms (encoded using ATOM_EXT, NEW_CACHE or CACHED_ATOM).
Arity is an integer encoded using SMALL_INTEGER_EXT.
1 | 4 | 1 | Len |
77 | Len | Bits | Data |
This term represents a bitstring whose length in bits is not a multiple of 8 (created using the bit syntax in R12B and later). The Len field is an unsigned 4 byte integer (big endian). The Bits field is the number of bits that are used in the last byte in the data field, counting from the most significant bit towards the least significant.
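To illustrate how Len and Bits combine, this sketch decodes the layout into a string of "0" and "1" characters. The return type and the function name are choices made for the example only.

```python
import struct


def decode_bit_binary_ext(data: bytes) -> str:
    """Decode tag 77 into a string of bits, dropping the unused trailing bits."""
    if data[0] != 77:
        raise ValueError("not a bit binary")
    length, bits = struct.unpack_from(">IB", data, 1)
    payload = data[6:6 + length]
    if len(payload) != length:
        raise ValueError("truncated bit binary")
    # All bytes but the last contribute 8 bits; the last contributes `bits`,
    # counted from the most significant bit.
    bit_count = (length - 1) * 8 + bits if length else 0
    all_bits = "".join(format(byte, "08b") for byte in payload)
    return all_bits[:bit_count]


# One data byte 0b10100000 with 3 used bits represents the bits 1, 0, 1.
three_bits = decode_bit_binary_ext(bytes([77, 0, 0, 0, 1, 3, 0b10100000]))
```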
1 | 8 |
70 | IEEE float |
A float is stored as 8 bytes in big-endian IEEE format.
This term is used in minor version 1 of the external format.
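The 8 byte form maps directly onto a big-endian double, which Python's struct module exposes as the ">d" format. A minimal sketch with illustrative function names:

```python
import struct


def encode_new_float_ext(value: float) -> bytes:
    """Tag 70 followed by the 8 byte big-endian IEEE representation."""
    return bytes([70]) + struct.pack(">d", value)


def decode_new_float_ext(data: bytes) -> float:
    """Decode tag 70 back into a float."""
    if data[0] != 70:
        raise ValueError("not an IEEE-format float")
    (value,) = struct.unpack(">d", data[1:9])
    return value


wire = encode_new_float_ext(1.5)
```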