3 Building A Mnesia Database
This chapter details the basic steps involved when designing a Mnesia database and the programming constructs which make different solutions available to the programmer. The chapter includes the following sections:
- defining a schema
- the data model
- starting Mnesia
- creating new tables.
3.1 Defining a Schema
The configuration of a Mnesia system is described in the schema. The schema is a special table which contains information such as the table names and each table's storage type (that is, whether a table is stored in RAM, on disc, or possibly on both) as well as its location.
Unlike data tables, information contained in schema tables can only be accessed and modified by using the schema-related functions described in this section.
Mnesia has various functions for defining the database schema. It is possible to move tables, delete tables, or reconfigure the layout of tables.
An important aspect of these functions is that the system can access a table while it is being reconfigured. For example, it is possible to move a table and simultaneously perform write operations to the same table. This feature is essential for applications that require continuous service.
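The following section describes the functions available for schema management. Apart from mnesia:create_schema/1 and mnesia:delete_schema/1, which return ok or {error, Reason}, these functions return one of the following tuples: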
- {atomic, ok}; or,
- {aborted, Reason} if unsuccessful.
Schema Functions
- mnesia:create_schema(NodeList). This function is used to initialize a new, empty schema. This is a mandatory requirement before Mnesia can be started. Mnesia is a truly distributed DBMS and the schema is a system table that is replicated on all nodes in a Mnesia system. The function will fail if a schema is already present on any of the nodes in NodeList. This function requires Mnesia to be stopped on all db_nodes contained in the parameter NodeList. Applications call this function only once, since initializing a new database is usually a one-time activity.
- mnesia:delete_schema(DiscNodeList). This function erases any old schemas on the nodes in DiscNodeList. It also removes all old tables together with all data. This function requires Mnesia to be stopped on all db_nodes.
- mnesia:delete_table(Tab). This function permanently deletes all replicas of table Tab.
- mnesia:clear_table(Tab). This function permanently deletes all entries in table Tab.
- mnesia:move_table_copy(Tab, From, To). This function moves the copy of table Tab from node From to node To. The storage type of the table is preserved, so if a RAM table is moved from one node to another node, it remains a RAM table on the new node. It is still possible for other transactions to perform read and write operations on the table while it is being moved.
- mnesia:add_table_copy(Tab, Node, Type). This function creates a replica of the table Tab at node Node. The Type argument must be one of the atoms ram_copies, disc_copies, or disc_only_copies. If we add a copy of the system table schema to a node, this means that we want the Mnesia schema to reside there as well. This action then extends the set of nodes that comprise this particular Mnesia system.
- mnesia:del_table_copy(Tab, Node). This function deletes the replica of table Tab at node Node. When the last replica of a table is removed, the table is deleted.
-
mnesia:transform_table(Tab, Fun, NewAttributeList, NewRecordName). This function changes the format of all records in table Tab. It applies the argument Fun to all records in the table. Fun must be a function which takes a record of the old type and returns the record of the new type. The table key must not be changed.
-record(old, {key, val}).
-record(new, {key, val, extra}).

Transformer =
    fun(X) when is_record(X, old) ->
        #new{key = X#old.key, val = X#old.val, extra = 42}
    end,
{atomic, ok} = mnesia:transform_table(foo, Transformer,
                                      record_info(fields, new), new),
The Fun argument can also be the atom ignore, which indicates that only the meta data about the table is updated. Usage of ignore is not recommended, since it creates inconsistencies between the meta data and the actual data, but it is included as a possibility for users who want to do their own (off-line) transform.
- mnesia:change_table_copy_type(Tab, Node, ToType). This function changes the storage type of a table. For example, a RAM table is changed to a disc_copies table at the node specified as Node.
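As a minimal sketch of how the table-level schema functions above report their results, the following module creates, clears and deletes a table on a single node. The module name schema_demo and the table name scratch are hypothetical, and the sketch assumes a node where Mnesia can be started with a RAM-only schema (no mnesia:create_schema/1 call has been made).

```erlang
-module(schema_demo).
-export([run/0]).

%% Walks one table through create -> write -> clear -> delete and checks
%% the {atomic, ok} / {aborted, Reason} results along the way.
run() ->
    ok = mnesia:start(),
    %% 'scratch' is a hypothetical table name used only for this sketch.
    {atomic, ok} = mnesia:create_table(scratch, [{attributes, [key, val]}]),
    {atomic, ok} = mnesia:transaction(
                     fun() -> mnesia:write({scratch, 1, one}) end),
    {atomic, [{scratch, 1, one}]} =
        mnesia:transaction(fun() -> mnesia:read({scratch, 1}) end),
    %% clear_table/1 removes the entries but keeps the table itself.
    {atomic, ok} = mnesia:clear_table(scratch),
    {atomic, []} =
        mnesia:transaction(fun() -> mnesia:read({scratch, 1}) end),
    %% delete_table/1 removes the table; deleting it again is aborted.
    {atomic, ok} = mnesia:delete_table(scratch),
    {aborted, _Reason} = mnesia:delete_table(scratch),
    ok.
```

Calling schema_demo:run() is expected to return ok if every step behaves as described.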
3.2 The Data Model
The data model employed by Mnesia is an extended relational data model. Data is organized as a set of tables and relations between different data records can be modeled as additional tables describing the actual relationships. Each table contains instances of Erlang records and records are represented as Erlang tuples.
Object identifiers, also known as oids, are made up of a table name and a key. For example, an employee record represented by the tuple {employee, 104732, klacke, 7, male, 98108, {221, 015}} has an object identifier (Oid) which is the tuple {employee, 104732}.
Thus, each table is made up of records, where the first element is a record name and the second element of each record is a key which identifies the particular record in that table. The combination of the table name and a key is an arity two tuple {Tab, Key} called the Oid. See Chapter 4:Record Names Versus Table Names, for more information regarding the relationship between the record name and the table name.
What makes the Mnesia data model an extended relational model is the ability to store arbitrary Erlang terms in the attribute fields. One attribute value could for example be a whole tree of oids leading to other terms in other tables. This type of record is hard to model in traditional relational DBMSs.
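The relationship between a record, its tuple representation and its Oid can be illustrated with a small sketch. The module name oid_demo and the field names of the employee record are assumptions made for this illustration; only the tuple values come from the example above. The sketch also assumes the default case where the table name equals the record name.

```erlang
-module(oid_demo).
-export([oid/1, example/0]).

%% Field names are hypothetical; they are not given in this chapter.
-record(employee, {emp_no, name, salary, sex, phone, room_no}).

%% Builds the Oid {Tab, Key} from a record tuple, assuming the table is
%% named after the record (element 1) and the key is element 2.
oid(Record) when is_tuple(Record), tuple_size(Record) >= 2 ->
    {element(1, Record), element(2, Record)}.

%% Returns the tuple representation of a record together with its Oid.
example() ->
    E = #employee{emp_no = 104732, name = klacke, salary = 7,
                  sex = male, phone = 98108, room_no = {221, 015}},
    {E, oid(E)}.
```

Here oid_demo:example() returns the record as a plain tuple paired with {employee, 104732}. Note that Erlang reads the integer literal 015 as 15.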
3.3 Starting Mnesia
Before we can start Mnesia, we must initialize an empty schema on all the participating nodes.
- The Erlang system must be started.
- Nodes with disc database schema must be defined and implemented with the function create_schema(NodeList).
When running a distributed system with two or more participating nodes, the function mnesia:start() must be executed on each participating node. Typically this would be part of the boot script in an embedded environment. In a test environment or an interactive environment, mnesia:start() can also be called from the Erlang shell or from another program.
Initializing a Schema and Starting Mnesia
To use a known example, we illustrate how to run the Company database described in Chapter 2 on two separate nodes, which we call a@gin and b@skeppet. Each of these nodes must have a Mnesia directory as well as an initialized schema before Mnesia can be started. There are two ways to specify the Mnesia directory to be used:
-
Specify the Mnesia directory by providing an application parameter either when starting the Erlang shell or in the application script. Previously the following example was used to create the directory for our Company database:
%erl -mnesia dir '"/ldisc/scratch/Mnesia.Company"'
- If no command line flag is entered, then the Mnesia directory will be the current working directory on the node where the Erlang shell is started.
To start our Company database and get it running on the two specified nodes, we enter the following commands:
-
On the node called gin:
gin %erl -sname a -mnesia dir '"/ldisc/scratch/Mnesia.company"'
-
On the node called skeppet:
skeppet %erl -sname b -mnesia dir '"/ldisc/scratch/Mnesia.company"'
-
On one of the two nodes:
(a@gin)1> mnesia:create_schema([a@gin, b@skeppet]).
- The function mnesia:start() is called on both nodes.
-
To initialize the database, execute the following code on one of the two nodes.
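A minimal sketch of such initialization code is given here, assuming a single employee table. The module name company_init, the attribute names and the choice of ram_copies are assumptions made for illustration; the actual Company tables are those defined in Chapter 2.

```erlang
-module(company_init).
-export([dist_init/1]).

%% Hypothetical attribute names for the employee table.
-record(employee, {emp_no, name, salary, sex, phone, room_no}).

%% Creates the employee table with a RAM replica on every node in Nodes.
%% Mnesia must already be running on all of those nodes.
%% Returns {atomic, ok} or {aborted, Reason}.
dist_init(Nodes) ->
    mnesia:create_table(employee,
                        [{ram_copies, Nodes},
                         {attributes, record_info(fields, employee)}]).
```

With the two nodes of this example, the call would be company_init:dist_init([a@gin, b@skeppet]).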
As illustrated above, the two directories reside on different nodes, because the /ldisc/scratch (the "local" disc) exists on the two different nodes.
By executing these commands we have configured two Erlang nodes to run the Company database and thereby initialized the database. This is required only once, when setting up. The next time the system is started, mnesia:start() is called on both nodes to initialize the system from disc.
In a system of Mnesia nodes, every node is aware of the current location of all tables. In this example, data is replicated on both nodes and functions which manipulate the data in our tables can be executed on either of the two nodes. Code which manipulates Mnesia data behaves identically regardless of where the data resides.
The function mnesia:stop() stops Mnesia on the node where the function is executed. Both the start/0 and the stop/0 functions work on the "local" Mnesia system, and there are no functions which start or stop a set of nodes.
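Because start/0 and stop/0 only act on the local node, an application that wants to start or stop several nodes has to call them on each node itself. One possible approach, sketched here under the assumption that the nodes are already connected, uses rpc:multicall/4 from the standard library. The module name cluster_ctl and its functions are hypothetical and not part of Mnesia.

```erlang
-module(cluster_ctl).
-export([start_all/1, stop_all/1]).

%% Calls mnesia:start() on every node in Nodes; expects ok from each.
start_all(Nodes) ->
    call_all(Nodes, start, ok).

%% Calls mnesia:stop() on every node in Nodes; expects stopped from each.
stop_all(Nodes) ->
    call_all(Nodes, stop, stopped).

%% Runs mnesia:Function() on all nodes and collects unexpected replies
%% and unreachable nodes into a single error tuple.
call_all(Nodes, Function, Expected) ->
    {Replies, BadNodes} = rpc:multicall(Nodes, mnesia, Function, []),
    Failed = [R || R <- Replies, R =/= Expected],
    case {Failed, BadNodes} of
        {[], []} -> ok;
        _ -> {error, {Failed, BadNodes}}
    end.
```

For the example system, cluster_ctl:start_all([a@gin, b@skeppet]) would attempt to start Mnesia on both nodes and report any node that could not be reached.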
The Start-Up Procedure
Mnesia is started by calling the following function:
mnesia:start().
This function initiates the DBMS locally.
The choice of configuration will alter the location and load order of the tables. The alternatives are listed below:
- Tables that are stored locally only, are initialized from the local Mnesia directory.
- Replicated tables that reside locally as well as somewhere else are either initiated from disc or by copying the entire table from the other node depending on which of the different replicas is the most recent. Mnesia determines which of the tables is the most recent.
- Tables that reside on remote nodes are available to other nodes as soon as they are loaded.
Table initialization is asynchronous. The function call mnesia:start() returns the atom ok and then starts to initialize the different tables. Depending on the size of the database, this may take some time, and the application programmer must wait for the tables that the application needs before they can be used. This is achieved by using the function:
- mnesia:wait_for_tables(TabList, Timeout)
This function suspends the caller until all tables specified in TabList are properly initiated.
A problem can arise if a replicated table on one node is initiated but Mnesia deduces that another (remote) replica is more recent than the replica on the local node. In that case the initialization procedure does not proceed, and a call to mnesia:wait_for_tables/2 suspends the caller until the remote node has initiated the table from its local disc and the table has been copied over the network to the local node.
This procedure can be time consuming; however, the shortcut function shown below loads the tables from disc at a faster rate:
- mnesia:force_load_table(Tab). This function forces tables to be loaded from disc regardless of the network situation.
Thus, we can assume that if an application wishes to use tables a and b, then the application must perform some action similar to the code below before it can utilize the tables.
case mnesia:wait_for_tables([a, b], 20000) of
    {timeout, RemainingTabs} ->
        panic(RemainingTabs);
    ok ->
        synced
end.
When tables are forcefully loaded from the local disc, all operations that were performed on the replicated table while the local node was down, and the remote replica was alive, are lost. This can cause the database to become inconsistent.
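An application that prefers availability over consistency could combine the two functions: wait for the tables first, and force-load only those that are still missing after the timeout. The following is a sketch of that idea, not a recommended default, since force loading carries the risk of lost updates described above. The module name table_wait and the function ensure_loaded/2 are hypothetical.

```erlang
-module(table_wait).
-export([ensure_loaded/2]).

%% Waits up to Timeout milliseconds for Tabs. Any table still not loaded
%% after the timeout is force-loaded from the local disc.
%% Returns ok, or {error, Reason} if waiting or force loading fails.
ensure_loaded(Tabs, Timeout) ->
    case mnesia:wait_for_tables(Tabs, Timeout) of
        ok ->
            ok;
        {timeout, RemainingTabs} ->
            force_load(RemainingTabs);
        {error, Reason} ->
            {error, Reason}
    end.

%% mnesia:force_load_table/1 returns the atom yes on success.
force_load([]) ->
    ok;
force_load([Tab | Rest]) ->
    case mnesia:force_load_table(Tab) of
        yes -> force_load(Rest);
        Error -> {error, {Tab, Error}}
    end.
```

A caller would use it as table_wait:ensure_loaded([a, b], 20000) in place of the plain wait shown earlier.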
If the start-up procedure fails, the mnesia:start() function returns the cryptic tuple {error,{shutdown, {mnesia_sup,start,[normal,[]]}}}. To get more information about the start failure, pass the command line arguments -boot start_sasl to the erl script.
3.4 Creating New Tables
Mnesia provides one function to create new tables. This function is: mnesia:create_table(Name, ArgList).
When executing this function, it returns one of the following responses:
- {atomic, ok} if the function executes successfully
- {aborted, Reason} if the function fails.
The function arguments are:
- Name is an atom giving the name of the table. It is usually the same name as the name of the records that constitute the table. (See record_name for more details.)
-
ArgList is a list of {Key,Value} tuples. The following arguments are valid:
-
{type, Type}, where Type must be one of the atoms set, ordered_set or bag. The default value is set. Note: currently ordered_set is not supported for disc_only_copies tables. A table of type set or ordered_set has either zero or one record per key, whereas a table of type bag can have an arbitrary number of records per key. The key for each record is always the first attribute of the record.
The following example illustrates the difference between type set and bag:
f() ->
    F = fun() ->
            mnesia:write({foo, 1, 2}),
            mnesia:write({foo, 1, 3}),
            mnesia:read({foo, 1})
        end,
    mnesia:transaction(F).
This transaction returns the list [{foo,1,3}], wrapped in an {atomic, List} tuple, if the foo table is of type set. However, the list [{foo,1,2}, {foo,1,3}] is returned if the table is of type bag. Note the use of bag and set table types.
Mnesia tables can never contain duplicates of the same record in the same table. Duplicate records have attributes with the same contents and key.
-
{disc_copies, NodeList}, where NodeList is a list of the nodes where this table will reside on disc.
Write operations to a table replica of type disc_copies will write data to the disc copy as well as to the RAM copy of the table.
It is possible to have a replicated table of type disc_copies on one node, and the same table stored as a different type on another node. The default value is []. This arrangement is desirable if the following operational characteristics are required:
- read operations must be very fast and performed in RAM
- all write operations must be written to persistent storage.
A write operation on a disc_copies table replica will be performed in two steps. First the write operation is appended to a log file, then the actual operation is performed in RAM.
-
{ram_copies, NodeList}, where NodeList is a list of the nodes where this table is stored in RAM. The default value for NodeList is [node()]. If the default value is used to create a new table, it will be located on the local node only.
Table replicas of type ram_copies can be dumped to disc with the function mnesia:dump_tables(TabList).
- {disc_only_copies, NodeList}. These table replicas are stored on disc only and are therefore slower to access. However, a disc only replica consumes less memory than a table replica of the other two storage types.
-
{index, AttributeNameList}, where AttributeNameList is a list of atoms specifying the names of the attributes for which Mnesia shall build and maintain an index. An index table will exist for every element in the list. The first field of a Mnesia record is the key and thus needs no extra index.
The first field of a record is the second element of the tuple that represents the record.
-
{snmp, SnmpStruct}. SnmpStruct is described in the SNMP User Guide. Basically, if this attribute is present in ArgList of mnesia:create_table/2, the table is immediately accessible by means of the Simple Network Management Protocol (SNMP).
It is easy to design applications which use SNMP to manipulate and control the system. Mnesia provides a direct mapping between the logical tables that make up an SNMP control application and the physical data which make up a Mnesia table. The default value is [].
- {local_content, true}. When an application needs a table whose contents should be locally unique on each node, local_content tables may be used. The name of the table is known to all Mnesia nodes, but its contents are unique for each node. Access to this type of table must be done locally.
-
{attributes, AtomList} is a list of the attribute names for the records that are supposed to populate the table. The default value is the list [key, val]. The table must at least have one extra attribute besides the key. When accessing single attributes in a record, it is not recommended to hard code the attribute names as atoms. Use the construct record_info(fields,record_name) instead. The expression record_info(fields,record_name) is processed by the Erlang macro pre-processor and returns a list of the record's field names. With the record definition -record(foo, {x,y,z}). the expression record_info(fields,foo) is expanded to the list [x,y,z]. Accordingly, it is possible to provide the attribute names yourself, or to use the record_info/2 notation.
It is recommended that the record_info/2 notation be used as it is easier to maintain the program and it will be more robust with regards to future record changes.
-
{record_name, Atom} specifies the common name of all records stored in the table. All records stored in the table must have this name as their first element. The record_name defaults to the name of the table. For more information see Chapter 4:Record Names Versus Table Names.
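The difference between the set and bag table types described above can be checked on a single node with a sketch like the following. The module name table_type_demo and the table names foo_set and foo_bag are hypothetical; the tables rely on the default attributes [key, val] and the default RAM copy on the local node, and the sketch assumes Mnesia can be started with a RAM-only schema.

```erlang
-module(table_type_demo).
-export([run/0]).

%% Creates one set table and one bag table, writes the same two records
%% to each, and returns what a later read of key 1 finds in each table.
run() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(foo_set, [{type, set}]),
    {atomic, ok} = mnesia:create_table(foo_bag, [{type, bag}]),
    {write_then_read(foo_set), write_then_read(foo_bag)}.

write_then_read(Tab) ->
    Write = fun() ->
                ok = mnesia:write({Tab, 1, 2}),
                ok = mnesia:write({Tab, 1, 3})
            end,
    {atomic, ok} = mnesia:transaction(Write),
    %% Read in a separate transaction, after the writes are committed.
    {atomic, Records} =
        mnesia:transaction(fun() -> mnesia:read({Tab, 1}) end),
    lists:sort(Records).
```

The call table_type_demo:run() returns {[{foo_set,1,3}], [{foo_bag,1,2},{foo_bag,1,3}]}: the set table keeps only the last record written for key 1, while the bag table keeps both.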
-
As an example, assume we have the record definition:
-record(funky, {x, y}).
The call below creates a table which is replicated on two nodes, has an additional index on the y attribute, and is of type bag.
mnesia:create_table(funky, [{disc_copies, [N1, N2]},
                            {index, [y]},
                            {type, bag},
                            {attributes, record_info(fields, funky)}]).
By contrast, a call that relies entirely on the default values:
mnesia:create_table(stuff, [])
creates a table with a RAM copy on the local node, no additional indexes, and the attributes defaulted to the list [key,val].
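To show what the additional index is for, the sketch below creates a single-node variant of the funky table and looks records up by their y attribute with mnesia:index_read/3. The module name funky_demo is hypothetical, and ram_copies on the local node is used instead of disc_copies on two nodes so that the example does not depend on a disc schema.

```erlang
-module(funky_demo).
-export([run/0]).

-record(funky, {x, y}).

%% Creates an indexed bag table, stores three records and returns those
%% whose y attribute equals a, found through the index on y.
run() ->
    ok = mnesia:start(),
    {atomic, ok} =
        mnesia:create_table(funky,
                            [{ram_copies, [node()]},
                             {index, [y]},
                             {type, bag},
                             {attributes, record_info(fields, funky)}]),
    Write = fun() ->
                ok = mnesia:write(#funky{x = 1, y = a}),
                ok = mnesia:write(#funky{x = 2, y = b}),
                ok = mnesia:write(#funky{x = 3, y = a})
            end,
    {atomic, ok} = mnesia:transaction(Write),
    %% #funky.y is the tuple position of the indexed attribute.
    Lookup = fun() -> mnesia:index_read(funky, a, #funky.y) end,
    {atomic, Found} = mnesia:transaction(Lookup),
    lists:sort(Found).
```

The call funky_demo:run() returns [{funky,1,a},{funky,3,a}], the two records whose y attribute is a.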