1 ASN.1

1.1 Introduction

Abstract Syntax Notation One (ASN.1) is widely used for the specification of ITU-T and ISO communication protocols. The purpose of ASN.1 is to provide a standardized, platform-independent language for expressing types, together with a standardized set of rules for transforming values of a defined type into a stream of bytes. This stream of bytes can then be sent on a communication channel set up by the lower layers in the stack of communication protocols, e.g. over TCP/IP or encapsulated within UDP packets. In this way two applications written in completely different programming languages, running on different computers with different internal representations of data, can exchange instances of structured data types instead of exchanging bytes or, even worse, bits. This frees the programmer from a great deal of work, since no code has to be written to process the transport format of the data.

To write a network application which processes ASN.1 encoded messages it is convenient, or even necessary, to have a set of off-line development tools such as an ASN.1 compiler which can generate the encode and decode logic for the specific ASN.1 datatypes. It is also necessary to combine this with some general runtime support for ASN.1 encoding and decoding. Every ASN.1 compiler must be directed towards a target language or a set of closely related languages. This manual describes a compiler which is directed towards the declarative language Erlang. In order to use this compiler, familiarity with the language Erlang is essential. The runtime support for ASN.1 is of course also closely related to the language Erlang and consists of a number of functions that the code generated by the compiler makes use of. The types in ASN.1 and how to represent values of those types in Erlang are described in this manual. It is assumed that the reader is familiar with the ASN.1 notation as documented in the standard definition [ITU-T X.680].

1.2 The ASN.1 types

In this section we shall go through all the ASN.1 types: how they are used, what their purpose is and how to assign Erlang values to them.

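The types described in this section are: BOOLEAN, INTEGER, REAL, NULL, ENUMERATED, BIT STRING, OCTET STRING, the Character Strings, OBJECT IDENTIFIER, Object Descriptor, the TIME types, SEQUENCE, SET, CHOICE, SET OF and SEQUENCE OF.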

The general and preferred Erlang representation of most ASN.1 types is just the value, without a type name. It is also possible to use a tuple notation with type and value, like this: {Typename,Value}. The representations of SEQUENCE and SET are exceptions to the rule above. Below follows a description of how values of each type can be represented in Erlang.

1.2.1 BOOLEAN

Booleans in ASN.1 are used to express values that can be either TRUE or FALSE. Whatever meaning is assigned to TRUE or FALSE is beyond the scope of this text.
In ASN.1 we could have:

Operational ::= BOOLEAN
      

To assign a value to the type Operational in Erlang we could have the following Erlang code:

Myvar1 = true,
Myvar2 = {'Operational',false}
      

1.2.2 INTEGER

ASN.1 itself specifies indefinitely large integers. Depending on the actual implementation of Erlang, this may or may not be supported. Erlang systems of version 4.3 and higher support (very) large integers.

The concept of sub-typing can be applied to integers as well as to other ASN.1 types. The details of sub-typing are not explained here; see [STEED]. A variety of syntaxes are allowed when defining a type to be integer:

T1 ::= INTEGER
T2 ::= INTEGER (-2..7)
T3 ::= INTEGER (0..MAX)
T4 ::= INTEGER (0<..MAX)
T5 ::= INTEGER (MIN<..-99)
T6 ::= INTEGER {red(0),blue(1),white(2)}
      

Now if we were executing some Erlang code and wanted to assign values to the types above, we could write:

T1value = 0,
I2 = {'T2',6},
I3 = {'T6',blue},
I4 = {'T6',0},
T6value = white
      

The variables above are now bound to valid instances of ASN.1 defined types. Values of this kind can be passed to the encoder for transformation into a series of bytes. Note the use of the type T6, where it is allowed to assign a symbolic value as well as an integer value.
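
To illustrate how a symbolic value such as blue relates to its integer, here is a minimal sketch. It is not the code the compiler generates. The module name named_int_sketch, the function to_integer/2 and the {Name,Integer} list layout are assumptions made for this example only.

```erlang
-module(named_int_sketch).
-export([to_integer/2]).

%% Named is a list of {Name,Integer} pairs copied by hand from the
%% type definition, e.g. [{red,0},{blue,1},{white,2}] for T6.

%% A plain integer is always accepted and passed through unchanged.
to_integer(Value, _Named) when is_integer(Value) ->
    Value;
%% A symbolic value is looked up in the list of named numbers.
to_integer(Name, Named) when is_atom(Name) ->
    case lists:keyfind(Name, 1, Named) of
        {Name, Int} -> Int;
        false -> erlang:error({unknown_named_number, Name})
    end.
```

With the named numbers of T6, both blue and 1 map to the integer 1, which is why either form may be used for a value of that type.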

1.2.3 REAL

In this version reals are not implemented. When they are, the following ASN.1 type

R1 ::= REAL
      

can be assigned a value in Erlang as:

R1value1 = 2.14,
R1value2 = {256,10,-2},
V1 = {'R1',2.56},
V2 = {'R1',{256,10,-2}}

      

In the last case the tuple {256,10,-2} is the real number 2.56 in a special notation, which will encode faster than simply stating the number as 2.56. The arity three tuple is {Mantissa,Base,Exponent}, i.e. Mantissa * Base^Exponent.
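
The arithmetic behind the tuple notation can be checked with a small sketch. The module real_sketch and the function to_float/1 are hypothetical names used only for illustration; they are not part of the ASN.1 application.

```erlang
-module(real_sketch).
-export([to_float/1]).

%% {Mantissa,Base,Exponent} denotes Mantissa * Base^Exponent.
%% math:pow/2 always returns a float, so the result is a float too.
to_float({Mantissa, Base, Exponent})
  when is_integer(Mantissa), is_integer(Base), is_integer(Exponent) ->
    Mantissa * math:pow(Base, Exponent).
```

For {256,10,-2} this gives 256 * 10^-2, that is 2.56 up to floating point rounding.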

1.2.4 NULL

This type can be useful in some context where we wish to supply a value but not have any significance attached to it. Usually we can declare a component of a structured datatype to be optional instead.

Notype ::= NULL
      

A value of this type can be assigned like this in Erlang:

N1 = 'NULL',
N2 = {'Notype','NULL'},
      

Where the actual value is the quoted atom 'NULL'.

1.2.5 ENUMERATED

The enumerated type can be used when the value we wish to describe can take one and only one of a set of predefined values.

DaysOfTheWeek ::= ENUMERATED { sunday(1), monday(2),tuesday(3),wednesday(4),
                               thursday(5),friday(6),saturday(7) }
      

To assign a weekday value in Erlang:

Day1 = {'DaysOfTheWeek',saturday},
      

The enumerated type is very similar to the integer type when the latter is defined with a set of predefined values. An enumerated type may however only have the specified values, whereas an integer may have any other value as well.

1.2.6 BIT STRING

The BIT STRING type can be used to model information which is made up of arbitrary length series of bits. It is intended to be used for a selection of flags, not for binary files.

Bits1 ::= BIT STRING
Bits2 ::= BIT STRING {foo(0),bar(1),gnu(2),gnome(3),punk(14)}

There are three different notations available for the representation of BIT STRINGs in Erlang and as input to the encode functions: a list of bits, an integer, and a list of symbolic names.

Bits1Val = [0,1,0,1,1],
B1 = {'Bits1',[0,1,0,1,1]},
B2 = {'Bits1',16#1A},

where Bits1Val, B1 and B2 denote the same value. Or

Bits2Val1 = [gnu,punk],
Bits2Val2 = 2#1110,
B3 = {'Bits2',[bar,gnu,gnome]},
B4 = {'Bits2',[0,1,1,1]}
      

where Bits2Val2, B3 and B4 all denote the same value.

Bits2Val1 is assigned symbolic values. The assignment means that the bits corresponding to gnu and punk, i.e. bits 2 and 14, are set to 1 and the rest are set to 0. The symbolic values appear as a list of values, and if a value appears which is not specified in the type definition it will lead to a run-time error.
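
The relation between the three notations can be sketched as follows. This is an illustration only, written under the assumption (consistent with the examples above) that bit 0 is the first element of the list and the least significant bit of the integer. The module bits_sketch and its functions are hypothetical and not part of the ASN.1 application.

```erlang
-module(bits_sketch).
-export([list_to_int/1, names_to_list/2]).

%% Convert a list of bits to the integer notation.
%% Bit 0 is the head of the list and the least significant bit.
list_to_int(Bits) ->
    list_to_int(Bits, 0, 0).

list_to_int([], _Pos, Acc) ->
    Acc;
list_to_int([B | Rest], Pos, Acc) when B =:= 0; B =:= 1 ->
    list_to_int(Rest, Pos + 1, Acc bor (B bsl Pos)).

%% Convert a list of symbolic names to a list of bits.
%% Table is a list of {Name,BitNumber} pairs copied from the type
%% definition. An unknown name raises an error.
names_to_list([], _Table) ->
    [];
names_to_list(Names, Table) ->
    Set = [bit_number(N, Table) || N <- Names],
    [bool_to_bit(lists:member(P, Set)) || P <- lists:seq(0, lists:max(Set))].

bit_number(Name, Table) ->
    case lists:keyfind(Name, 1, Table) of
        {Name, Pos} -> Pos;
        false -> erlang:error({unknown_named_bit, Name})
    end.

bool_to_bit(true) -> 1;
bool_to_bit(false) -> 0.
```

With the named bits of Bits2, the list [bar,gnu,gnome] becomes [0,1,1,1], which in turn corresponds to the integer 2#1110. In the same way [0,1,0,1,1] corresponds to 16#1A.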

BIT STRINGS may also be sub-typed with for example a SIZE specification:

Bits3 ::= BIT STRING (SIZE(0..31))

This means that the string holds at most 31 bits, so no bit numbered higher than 30 can ever be set.

1.2.7 OCTET STRING

The OCTET STRING is the simplest type of them all in ASN.1. It just says: move the bytes. It can be used to transfer e.g. binary files or other unstructured information that consists of octets, when one does not wish the encoder to perform any encoding at all.

We could have the following ASN.1 type definitions:

O1 ::= OCTET STRING
O2 ::= OCTET STRING (SIZE(28))

With the following example assignments in Erlang:

O1Val = [17,13,19,20],
O2Val = "must be exactly 28 chars....",
Str1 = {'O1',[0,0,0,1,1,1,255,254]},
Str2 = {'O2',"some twentyeight asciioctets"},

Where Str1 is assigned a series of numbers between 0 and 255 i.e. octets. Str2 is assigned using the string notation.

1.2.8 Character Strings

ASN.1 supports a wide variety of character sets. The main difference between OCTET STRINGS and the Character strings is that when using OCTET STRINGS no semantics at all is imposed upon the bytes delivered. However when using for instance the IA5String (which closely resembles ASCII as we know it) the byte 65 (in decimal notation) means the character 'A'.

If we have defined a type to be a VideotexString and we receive e.g. an octet with the unsigned integer value X, then the octet should be interpreted as specified in the standards ITU-T T.100 and T.101. The ASN.1 to Erlang compiler does not care what the correct interpretation is for each and every octet value with the different Character strings. We leave it up to the application to do the interpretation of the octets. We just deliver the octets as they come, hence there is not much difference between OCTET STRINGS and the Character strings from the ASN.1 to Erlang compiler's point of view when it comes to BER. When PER is used there is definitely a big difference between OCTET STRINGS and other strings. In particular, the constraints are very important for PER but are not taken into account for BER.

However, all the Character strings are supported and we could have the following ASN.1 type definitions:

Digs ::= NumericString (SIZE(1..3))
TextFile ::= IA5String (SIZE(0..64000))

and the following Erlang assignments:

DigsVal = "456",
D = {'Digs',"123"},
TextFileVal = "abc...xyz...",
T = {'TextFile',[88,76,55,44,99,121 .......... a lot of characters here  

1.2.9 OBJECT IDENTIFIER

The OBJECT IDENTIFIER is used whenever there is a need to uniquely identify something. An ASN.1 module, a transfer syntax, etc. are identified with an OBJECT IDENTIFIER. If we have:

Oid ::= OBJECT IDENTIFIER

Then

OidVal = {1,2,55},
Module = {'Oid',{0,0,1,2,3}}

are valid Erlang instances of the type 'Oid'. The OBJECT IDENTIFIER value is simply a tuple with the consecutive values. The symbolic syntax of OBJECT IDENTIFIERs is not yet supported.

The first value is limited to the values 0, 1 or 2 and the second value must be in the range 0..39 when the first value is 0 or 1.
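
The rule for the first two values can be expressed as a small check. This is a sketch under stated assumptions and not the validation performed by the encoder: the module oid_sketch and the function valid/1 are hypothetical, and requiring at least two values in the tuple is a choice made for this example.

```erlang
-module(oid_sketch).
-export([valid/1]).

%% An OBJECT IDENTIFIER value is a tuple of non-negative integers.
%% This sketch assumes the tuple holds at least two values.
valid(Oid) when is_tuple(Oid), tuple_size(Oid) >= 2 ->
    [First, Second | _] = Arcs = tuple_to_list(Oid),
    lists:all(fun(A) -> is_integer(A) andalso A >= 0 end, Arcs)
        andalso first_two_ok(First, Second);
valid(_) ->
    false.

%% First value 0 or 1: the second value must be in the range 0..39.
first_two_ok(First, Second) when First =:= 0; First =:= 1 ->
    Second =< 39;
%% First value 2: no upper limit on the second value.
first_two_ok(2, _Second) ->
    true;
%% Any other first value is not allowed.
first_two_ok(_First, _Second) ->
    false.
```

Both {1,2,55} and {0,0,1,2,3} pass this check, whereas {1,40} and {3,1} do not.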

The OBJECT IDENTIFIER is a very important type and it is widely used within different standards to uniquely identify various objects. In [ROSE2] there is an easy-to-understand description of the usage of OBJECT IDENTIFIER.

1.2.10 Object Descriptor

Values of this type can be assigned a value as an ordinary string i.e.
"This is the value of an Object descriptor"

1.2.11 The TIME types

Two different time types are defined within ASN.1, namely GeneralizedTime and UTCTime. Both are assigned a value as an ordinary string within double quotes, e.g. "19820102070533.8".

1.2.12 SEQUENCE

The structured types of ASN.1 are constructed from other types in a manner similar to the concepts of array and struct in C. A SEQUENCE in ASN.1 is comparable with a struct in C and a record in Erlang. A SEQUENCE may be defined as:

Pdu ::= SEQUENCE {
   a INTEGER,
   b REAL,
   c OBJECT IDENTIFIER,
   d NULL }

This is a 4-component structure called 'Pdu'. The major format for representation of SEQUENCE in Erlang is the record format. For each SEQUENCE and SET in an ASN.1 module an Erlang record declaration is generated. For Pdu above a record like this is defined:

-record('Pdu',{a, b, c, d}).

The record declarations for a module M are placed in a separate M.hrl file.

Values can be assigned in Erlang like this:

MyPdu = #'Pdu'{a=22,b=77.99,c={0,1,2,3,4},d='NULL'}.

The decode functions will return a record as result when decoding a SEQUENCE or SET.

1.2.13 SET

The SET type is very much like the SEQUENCE type, with the difference that the order of the components is not significant. Because the components may arrive in any order, the tags of all components of a SET must be distinct, so that every component can be identified both when a value of the SET type is encoded and when it is decoded.

A SET may be defined as:

Pdu2 ::= SET {
    a INTEGER,
    b BOOLEAN,
    c ENUMERATED {on(0),off(1)} }

The major format for representation of SET in Erlang is the record format. For each SEQUENCE and SET in an ASN.1 module an Erlang record declaration is generated. For Pdu2 above a record like this is defined:

-record('Pdu2',{a, b, c}).

The record declarations for a module M are placed in a separate M.hrl file.

Values can be assigned in Erlang like this:

V = #'Pdu2'{a=44,b=false,c=off}.

The decode functions will return a record as result when decoding a SET.

The difference between SET and SEQUENCE is that the order of the components (in the BER encoded format) is undefined for SET, whereas for SEQUENCE it is defined as the lexical order from the ASN.1 definition. The ASN.1 compiler for Erlang will always encode a SET in the lexical order. The decode routines can handle SET components encoded in any order but will always return the result as a record. Since all components of the SET must be distinguishable in both the encoding phase and the decoding phase, the following type is not allowed in a module with EXPLICIT or IMPLICIT as tag default:

Bad ::= SET {i INTEGER,
             j INTEGER }

The ASN.1 to Erlang compiler rejects the above type. We shall not explain the concept of tag further here, we refer to [STEED].

Further, we find the concept of SET to be a very strange construct and we cannot think of one single application where the SET type is really necessary. (Imagine if someone "invented" the shuffled array in C.) People tend to think that 'SET' sounds nicer and more mathematical than 'SEQUENCE' and hence use it when a 'SEQUENCE' would have been more appropriate. It is also most inefficient, since every correct implementation of SET must always be prepared to accept any component at all times.

1.2.14 CHOICE

The CHOICE type is a space saver and is similar to the concept of a 'union' in the C language. As with the previous SET type, the tags of all components of a CHOICE need to be distinct. If AUTOMATIC TAGS is defined for the module (which is preferable) the tags can be omitted completely in the ASN.1 specification of a CHOICE.

If we have:

T ::= CHOICE {
        x [0] REAL,
        y [1] INTEGER,
        z [2] OBJECT IDENTIFIER }

Values can be assigned as:

TVal = {y,17},
Val1 = {'T',{z,{0,1,2}}},

Note that a CHOICE value is always represented as the tuple {ChoiceAlternative, Val}, where ChoiceAlternative is an atom denoting the selected choice alternative.

It is also allowed to have a CHOICE type tagged as follows:

C ::= [PRIVATE 111] CHOICE {
        C1,
        C2 }

C1 ::= CHOICE { 
     a [0] INTEGER,
     b [1] BOOLEAN }

C2 ::= CHOICE {
     c [2] INTEGER,
     d [3] OCTET STRING }

In this case the top type C appears to have no tags at all in its components. However, both C1 and C2 are also defined as CHOICE types and they have distinct tags among themselves. Hence the above type C is both legal and allowed.

1.2.15 SET OF and SEQUENCE OF

The SET OF and SEQUENCE OF types correspond to the concept of an array found in several programming languages. The Erlang syntax for both of these types is straightforward. For example:

Arr1 ::= SET SIZE (5) OF INTEGER (4..9)
Arr2 ::= SEQUENCE OF OCTET STRING

We may have the following in Erlang

Arr1Val = [4,5,6,7,8],
A1 = {'Arr1',[4,5,6,7,8]},
Arr2Val = ["abc",[14,34,54],"Octets"],
A2 = {'Arr2',[[1,2,3,4,5],[255,254,253],"cyborg"]}

Please note that the definition of the SET OF type implies that the order of the components is undefined, but in practice there is no difference between SET OF and SEQUENCE OF. The ASN.1 compiler for Erlang does not randomize the order of the SET OF components before encoding.

1.2.16 Embedded named types

The structured types previously described may very well have other named types as their components. The general syntax to assign a value to the component C of a named ASN.1 type T in Erlang is the record syntax #'T'{'C'=Value}, where Value may be a value of yet another type T2, whose value in turn may be assigned as {T2,Value2}.

For example:

B ::= SEQUENCE {
        a Arr1,
        b [0] T }

can be assigned like this in Erlang:

V2 = #'B'{a=[4,5,6,7,8], b={x,7.77}}.
% or like this
V = #'B'{a={'Arr1',[4,5,6,7,8]}, b={'T',{x,7.77}}}.

There is really no reason to write the type name for every component even if it is allowed. The recommendation is to omit the type names for values of SET and SEQUENCE.

1.2.17 Embedded structured types

It is also perfectly allowed in ASN.1 to have components that themselves are structured types. For example we may have:

Emb ::= SEQUENCE {
    a SEQUENCE OF OCTET STRING,
    b SET {
       a [0] INTEGER,
       b [1] INTEGER DEFAULT 66},
    c CHOICE {
       a INTEGER,
       b FooType } }

FooType ::= [3] VisibleString

The following records are generated because of the type Emb:

-record('Emb',{a, b, c}).
-record('Emb_b',{a, b = asn1_DEFAULT}). % the embedded SET type
      

Values of the Emb type can be assigned like this:

V = #'Emb'{a=["qqqq",[1,2,255]], 
           b = #'Emb_b'{a=99}, 
           c ={b,"Can you see this"}}.
      

1.2.18 Recursive types

Types may refer to themselves. Suppose we have:

Rec ::= CHOICE {
     nothing [0] NULL,
     something SEQUENCE {
          a INTEGER,
          b OCTET STRING,
          c Rec }}

This type is recursive, i.e. it refers to itself. This is allowed in ASN.1 and the ASN.1 to Erlang compiler supports it. We may assign to this type as:

V = {'Rec',
      {something,#'Rec_something'{a = 77, 
                                  b = "some octets here", 
                                  c = {'Rec',{nothing,'NULL'}}}}}.

1.3 ASN.1 Values

Values can be assigned to an ASN.1 type within the ASN.1 code itself, as opposed to what we did in the previous section, where we assigned a value to an ASN.1 type in Erlang. The full value syntax of ASN.1 is supported, and [STEED] describes in detail how to assign values in ASN.1. Just one short example:

TT ::= SEQUENCE {
   a INTEGER,
   b SET OF OCTET STRING }

tt TT ::= {a 77,b {"kalle", "kula"}}

The value defined here could be used in several ways. Firstly it could be used as the value in some DEFAULT component.

SS ::= SET {
    s [0] OBJECT IDENTIFIER,
    val TT DEFAULT tt }

It could also be used from inside an Erlang program. If the above ASN.1 code was defined in ASN.1 module Mod, then the value of the ASN.1 variable tt can be reached from Erlang as a function call to Mod:tt().

1.4 Getting started

To run the Erlang ASN.1 compiler proceed as in the following example. First create a file called People.asn containing the following:

People DEFINITIONS IMPLICIT TAGS ::=

BEGIN
EXPORTS Person;

Person ::= [PRIVATE 19] SEQUENCE {
        name PrintableString,
        location INTEGER {home(0),field(1),roving(2)},
        age INTEGER OPTIONAL }
END

First this code has to be compiled in order to be used. The parser checks that the syntax is correct and that the text represents proper ASN.1 code and then it generates an abstract syntax tree which is saved in a small database. The code-generator then uses the information in the database in order to generate code. The generated Erlang files will be placed in the current directory or in the directory specified with the {outdir,Dir} option. The compiler can be called from the Erlang shell like this:

1>asn1ct:compile("People").
Erlang ASN.1 compiling "People.asn" 
--{generated,"People.asn1db"}--
--{generated,"People.hrl"}--
--{generated,"People.erl"}--
./People.erl:46: Warning: function dec_Person/2 not called
./People.erl:46: Warning: function enc_Person/1 not called
./People.erl:46: Warning: function encoding_rule/0 not called
ok
2>

The module People is now accepted and fed into the database, and the generated Erlang code is compiled with the Erlang compiler and loaded into the Erlang runtime system. Now assume we have a network application that receives instances of the ASN.1 defined type Person, modifies them and sends them back again.

receive
   {Port,{data,Bytes}} ->
       case asn1rt:decode('People','Person',Bytes) of
           {ok,P} ->
               {ok,Answer} = asn1rt:encode('People','Person',mk_answer(P)),
               Port ! {self(),{command,Answer}};
           {error,Reason} ->
               exit({error,Reason})
       end
end,

In the example above a series of bytes is received from an external source and the bytes are then decoded into a valid Erlang term. This was done with the call asn1rt:decode('People','Person',Bytes), which returned an Erlang value of the ASN.1 type Person. Then we constructed an answer with mk_answer(P) and encoded it with asn1rt:encode('People','Person',mk_answer(P)), which takes an instance of a defined ASN.1 type and transforms it into a possibly nested list of bytes according to the BER or PER encoding rules. The encoder and the decoder can also be run from the shell. The following dialogue with the shell illustrates how the generic encode/3 and decode/3 functions work (they are called through asn1ct here; asn1rt provides the same calls):

3> Rockstar = {'Person',"Some Name",roving,50}.
{'Person',"Some Name",roving,50}
4> {ok,Bytes} = asn1ct:encode('People','Person',Rockstar). 
{ok,["\363",[17],[19,"\t","Some Name"],[2,[1],2],[2,[1],50]]}
5> FlatBytes = lists:flatten(Bytes).
[243,17,19,9,83,111,109,101,32,78,97,109,101,2,1,2,2,1,50]
6> {ok,Person} = asn1ct:decode('People','Person',FlatBytes).
{ok,{'Person',"Some Name",roving,50}}
7>

Note here that the result from encode is a nested list which must be flattened before the call to decode. The reason for returning a nested list is that it is faster to produce and that the flatten operation is performed automatically when the list is sent via the Erlang port mechanism.
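
The flattening step can be tried on its own, without the ASN.1 application loaded. The following is a minimal sketch that uses the nested list from the dialogue above; the module flatten_sketch and its functions are hypothetical names chosen for this example.

```erlang
-module(flatten_sketch).
-export([flat/1, same_bytes/1]).

%% Turn a possibly nested list of bytes into a flat list of bytes.
flat(Nested) ->
    lists:flatten(Nested).

%% list_to_binary/1 accepts nested lists of bytes directly, so the
%% nested and the flattened form yield the same binary.
same_bytes(Nested) ->
    list_to_binary(Nested) =:= list_to_binary(flat(Nested)).
```

Calling flatten_sketch:flat/1 with the nested list returned by encode in the dialogue gives the same flat list as FlatBytes.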

1.5 The ASN.1 application User Interface

The ASN.1 application provides two separate user interfaces:

asn1ct
the compile-time functions, including the compiler.
asn1rt
the run-time functions.

The purpose of this division into a compile-time and a run-time part is that embedded systems need to load only the run-time part.

1.5.1 Compile-time functions

The ASN.1 compiler can be invoked directly from the command line by means of the erlc program. This is convenient when compiling many ASN.1 files from the command line or for use in Makefiles. Here are some examples of how the erlc command can be used to invoke the ASN.1 compiler:

erlc -bper Person.asn
erlc -bber ../Example.asn
erlc -o ../asnfiles -i ../asnfiles -i /usr/local/standards/asn1 Person.asn

The useful options for the ASN.1 compiler are:

-b[per|ber]
Choice of encoding rules, if omitted ber is the default.
-o OutDirectory
Where to put the generated files, default is the current directory.
-i IncludeDir
Where to search for .asn1db files with info about types and values imported from other modules. This option can be repeated many times if there are several places to search in. The compiler will always search the current directory first.

For a complete description of erlc see the erlc reference manual.

The compiler and other compile-time functions can also be invoked from the Erlang shell. Below follows a brief description of the major functions; for a complete description of each function see the reference manual for asn1ct.

The compiler is invoked with asn1ct:compile/1 with default options, or asn1ct:compile/2 if explicit options are given. Example:

asn1ct:compile("H323-MESSAGES.asn",[per]).

The generic encode and decode functions can be invoked like this:

asn1ct:encode('H323-MESSAGES','SomeChoiceType',{call,"octetstring"}).
asn1ct:decode('H323-MESSAGES','SomeChoiceType',Bytes).

1.5.2 Run-time functions

A brief description of the major functions is given here; for a complete description of each function see the reference manual for asn1rt.

The generic run-time encode and decode functions can be invoked like this:

asn1rt:encode('H323-MESSAGES','SomeChoiceType',{call,"octetstring"}).
asn1rt:decode('H323-MESSAGES','SomeChoiceType',Bytes).

1.5.3 Errors

During compile time, many errors may occur. Each detected error is written to the screen together with a line number indicating where in the source file the error was detected.

The run-time encoders and decoders (in the asn1rt module) do not execute within a catch. If this is desired the user may execute the call to the actual encoder or decoder within a catch of his/her own.

A second type of run-time error, also caused by unexpected input, is one that the compiled code does guard itself against. In those cases the process evaluating the call will crash with the value {error,{asn1,Description}}, where Description is an Erlang term describing the error.
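
Running a call within a catch of one's own, as mentioned above, could look like the following sketch. The module safe_sketch and the function safe/1 are hypothetical and not part of the ASN.1 application; the wrapper takes any fun of arity zero, so it can be tried without the application loaded.

```erlang
-module(safe_sketch).
-export([safe/1]).

%% Evaluate Fun within a catch. A crash is turned into {error,Reason};
%% any other result is returned unchanged.
safe(Fun) when is_function(Fun, 0) ->
    case catch Fun() of
        {'EXIT', Reason} -> {error, Reason};
        Result -> Result
    end.
```

A decoder call would then be wrapped as safe_sketch:safe(fun() -> asn1rt:decode('People','Person',Bytes) end), so that the calling process survives a crash in the decoder.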

1.6 Macros

Macros are not supported at all in this ASN.1 compiler. The main reason for that is that MACROS are no longer part of the ASN.1 standard.

1.7 Tags

Every ASN.1 type except CHOICE and ANY has a tag. This is a unique number that unambiguously identifies the type.

Had ASN.1 been more cleverly designed the normal ASN.1 user should not have had to be concerned about tags. However this is not the case, and hence all users of ASN.1 need to understand all the nitty gritty details of tags.

There are four different types of tags:

universal
for types whose meaning is the same in all applications. Such as integers, sequences and so on i.e. all the built in types.
application
for application specific types. E.g. the types in X.400 Message handling service have this sort of tag.
private
for your very own private types.
context
are used to distinguish otherwise indistinguishable types in a specific context. For example, if we have two components of a CHOICE type that are both integer, there is no way for the decoder to figure out which component was actually chosen, since both components will be tagged as integer. When this or similar situations occur one can choose to tag one or both of the components as context specific to resolve the ambiguity.

The tag in the case of the 'Apdu' type [PRIVATE 1] is encoded to an unambiguous sequence of bytes so that when the package of bytes arrives a decoder can look at the first bytes that arrive and decide that the rest of the bytes must be of the type associated with that particular sequence of bytes. This means that each tag must be uniquely associated with one and only one ASN.1 type.

Immediately following the tag comes a sequence of bytes informing the decoder of the length of the instance. This is sometimes referred to as TLV (Tag length value) encoding. Hence the structure of a BER encoded series of bytes is:

Tag Len Value
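
The Tag Len Value structure can be seen by splitting the bytes from the shell dialogue in the section Getting started. The following is a minimal sketch and not the decoder of the ASN.1 application: the module tlv_sketch and the function split/1 are hypothetical, and only single-octet tags (tag number below 31) and single-octet lengths (below 128) are handled.

```erlang
-module(tlv_sketch).
-export([split/1]).

%% Split a flat list of BER bytes into its first Tag Len Value part.
%% Returns {{Class,Form,Number},Len,Value,Tail}, where Tail holds the
%% bytes that follow the value.
split([TagOctet, Len | Rest])
  when TagOctet band 16#1F =/= 16#1F, Len < 128 ->
    %% The two most significant bits of the tag octet hold the class.
    Class = element((TagOctet bsr 6) + 1,
                    {universal, application, context, private}),
    %% The next bit tells whether the value is itself built from TLVs.
    Form = case TagOctet band 16#20 of
               0 -> primitive;
               _ -> constructed
           end,
    %% The five least significant bits hold the tag number.
    Number = TagOctet band 16#1F,
    {Value, Tail} = lists:split(Len, Rest),
    {{Class, Form, Number}, Len, Value, Tail}.
```

Applied to the flat list [243,17,19,9,...] from the dialogue, the first tag is {private,constructed,19}, which matches [PRIVATE 19] in the definition of Person, and the length is 17. Applying split/1 again to the value yields the embedded TLV for the name component.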

1.8 Encoding Rules

When the first recommendation on ASN.1 was released in 1988 it was accompanied by the Basic Encoding Rules, BER, as the only alternative for encoding. BER is a somewhat verbose protocol. It adopts a so-called TLV (type, length, value) approach to encoding, in which every element of the encoding carries some type information, some length information and then the value of that element. Where the element is itself structured, the Value part of the element is itself a series of embedded TLV components, to whatever depth is necessary. In summary, BER is not a compact encoding but is fairly fast and easy to produce.

A more compact encoding is achieved with the Packed Encoding Rules, PER, which were introduced together with the revised recommendation in 1994. PER takes a rather different approach from that taken by BER. The first difference is that the T (Type) part is omitted from the encodings, and any tags in the notation are completely ignored. The potential ambiguities are resolved as follows:

A second difference is that PER takes full account of the subtyping information while BER completely ignores it. PER uses the subtyping information to for example omit length fields whenever possible.
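
As a worked illustration of how subtyping information shrinks an encoding, consider T2 ::= INTEGER (-2..7) from the section on INTEGER. The sketch below assumes the unaligned variant of PER, where a constrained integer is sent as its offset from the lower bound in the smallest number of bits that covers the range. The module per_sketch and its functions are hypothetical, and the sketch leaves out everything else PER does.

```erlang
-module(per_sketch).
-export([bits_for_range/2, offset/3]).

%% Number of bits needed to tell apart all values in Lb..Ub.
%% A range with a single value needs no bits at all.
bits_for_range(Lb, Ub) when is_integer(Lb), is_integer(Ub), Lb =< Ub ->
    count_bits(Ub - Lb, 0).

%% The number actually placed in those bits: the distance from Lb.
offset(Value, Lb, Ub) when is_integer(Value), Value >= Lb, Value =< Ub ->
    Value - Lb.

%% Count how many bits are needed to write the non-negative integer X.
count_bits(0, N) -> N;
count_bits(X, N) -> count_bits(X bsr 1, N + 1).
```

The range -2..7 holds ten values, so four bits suffice, and the value 6 is sent as the offset 8. BER, which ignores the constraint, would send a tag octet, a length octet and a value octet for the same integer.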

There are two variants of PER, aligned and unaligned. In summary, PER results in compact encodings which require much more computation to produce than BER encodings.


Copyright © 1991-97 Ericsson Telecom AB