3 Building A Mnesia Database
This chapter details the basic steps involved when designing a Mnesia database and the programming constructs which make different solutions available to the programmer. The chapter includes the following sections:
- defining a schema
- the datamodel
- starting Mnesia
- creating new tables.
3.1 Defining a Schema
The configuration of a Mnesia system is described in the schema. The schema is a special table which contains information such as the table names and each table's storage type (that is, whether a table is stored in RAM, on disc, or on both) as well as its location.
Unlike data tables, the information contained in the schema table can only be accessed and modified by using the schema-related functions described in this section.
Mnesia has various functions for defining the database schema. It is possible to move tables, delete tables, or reconfigure the layout of tables.
An important aspect of these functions is that the system can access a table while it is being reconfigured. For example, it is possible to move a table and simultaneously perform write operations to the same table. This feature is essential for applications that require continuous service.
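The following section describes the functions available for schema management. Most of them return the tuple {atomic, ok} if successful, or {aborted, Reason} if unsuccessful. The exceptions are mnesia:create_schema/1 and mnesia:delete_schema/1, which return ok or {error, Reason}.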
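Callers often want to turn this convention into the more common ok | {error, Reason} shape. The following is a minimal sketch of one way to do that. The module name schema_result and the function check/1 are hypothetical helpers, not part of Mnesia.

```erlang
-module(schema_result).
-export([check/1]).

%% Map the {atomic, ok} | {aborted, Reason} convention used by the
%% schema functions (delete_table/1, add_table_copy/3, ...) onto the
%% ok | {error, Reason} shape used by most OTP APIs.
check({atomic, ok}) ->
    ok;
check({aborted, Reason}) ->
    {error, Reason}.
```

For example, ok = schema_result:check(mnesia:clear_table(employee)) crashes with a badmatch that includes the abort reason if the table cannot be cleared.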
3.1.1 Schema Functions
mnesia:create_schema(NodeList). This function is used to initialize a new, empty schema. It is a mandatory step before Mnesia can be started with a disc-resident schema. Mnesia is a truly distributed DBMS, and the schema is a system table that is replicated on all nodes in a Mnesia system. The function fails if a schema is already present on any of the nodes in NodeList. It requires Mnesia to be stopped on all db_nodes contained in NodeList. Applications call this function only once, since initializing a new database is usually a one-time activity.
mnesia:delete_schema(DiscNodeList). This function erases any old schemas on the nodes in DiscNodeList. It also removes all old tables together with all data. This function requires Mnesia to be stopped on all db_nodes.
mnesia:delete_table(Tab). This function permanently deletes all replicas of table Tab.
mnesia:clear_table(Tab). This function permanently deletes all entries in table Tab.
mnesia:move_table_copy(Tab, From, To). This function moves the copy of table Tab from node From to node To. The storage type of the copy is preserved, so if a RAM table is moved from one node to another node, it remains a RAM table on the new node. Other transactions can still perform read and write operations on the table while it is being moved.
mnesia:add_table_copy(Tab, Node, Type). This function creates a replica of the table Tab at node Node. The Type argument must be one of the atoms ram_copies, disc_copies, or disc_only_copies. If we add a copy of the system table schema to a node, this means that we want the Mnesia schema to reside there as well. This action then extends the set of nodes that comprise this particular Mnesia system.
mnesia:del_table_copy(Tab, Node). This function deletes the replica of table Tab at node Node. When the last replica of a table is removed, the table is deleted.
mnesia:transform_table(Tab, Fun, NewAttributeList, NewRecordName). This function changes the format of all records in table Tab. It applies the argument Fun to all records in the table. Fun must be a function which takes a record of the old type and returns the record of the new type. The table key may not be changed.
    -record(old, {key, val}).
    -record(new, {key, val, extra}).

    Transformer =
        fun(X) when is_record(X, old) ->
                #new{key = X#old.key, val = X#old.val, extra = 42}
        end,
    {atomic, ok} = mnesia:transform_table(foo, Transformer,
                                          record_info(fields, new), new).
The Fun argument can also be the atom ignore, which indicates that only the metadata about the table is updated. Using ignore is not recommended, since it creates inconsistencies between the metadata and the actual data, but it is included as a possibility for users who want to perform their own (off-line) transformation.
mnesia:change_table_copy_type(Tab, Node, ToType). This function changes the storage type of the replica of table Tab at node Node. For example, a ram_copies replica can be changed into a disc_copies replica.
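The difference between mnesia:clear_table/1 and mnesia:delete_table/1 is easy to confuse, so here is a small sketch of it. It assumes a single local node started without a disc schema, so the table defaults to ram_copies on node(). The table name scratch and the module clear_vs_delete are hypothetical.

```erlang
-module(clear_vs_delete).
-export([demo/0]).

demo() ->
    ok = mnesia:start(),
    _ = mnesia:delete_table(scratch),   % ignore {aborted, {no_exists, _}}
    {atomic, ok} = mnesia:create_table(scratch, [{attributes, [key, val]}]),
    ok = mnesia:dirty_write({scratch, 1, a}),
    ok = mnesia:dirty_write({scratch, 2, b}),
    SizeBefore = mnesia:table_info(scratch, size),
    %% clear_table/1 removes every record but keeps the table definition.
    {atomic, ok} = mnesia:clear_table(scratch),
    SizeAfterClear = mnesia:table_info(scratch, size),
    %% delete_table/1 removes all replicas and the table from the schema.
    {atomic, ok} = mnesia:delete_table(scratch),
    StillDefined = lists:member(scratch, mnesia:system_info(tables)),
    {SizeBefore, SizeAfterClear, StillDefined}.
```

After clear_table/1 the table still exists and is empty. After delete_table/1 it no longer appears in mnesia:system_info(tables).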
3.2 The Data Model
The data model employed by Mnesia is an extended relational data model. Data is organized as a set of tables and relations between different data records can be modeled as additional tables describing the actual relationships. Each table contains instances of Erlang records and records are represented as Erlang tuples.
Object identifiers, also known as oids, are made up of a table name and a key. For example, the employee record represented by the tuple
{employee, 104732, klacke, 7, male, 98108, {221, 015}}
has the object identifier (oid) {employee, 104732}.
Thus, each table is made up of records, where the first element is a record name and the second element of the tuple is a key which identifies the particular record in that table. The combination of the table name and a key is an arity-two tuple {Tab, Key} called the oid. See Chapter 4: Record Names Versus Table Names, for more information regarding the relationship between the record name and the table name.
What makes the Mnesia data model an extended relational model is the ability to store arbitrary Erlang terms in the attribute fields. One attribute value could, for example, be a whole tree of oids leading to other terms in other tables. This type of record is hard to model in traditional relational DBMSs.
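The oid of a stored record can be computed directly from the tuple. This is a sketch that assumes the record name equals the table name, which is the default. The two-argument form covers tables created with a different record_name. The module oid_demo is a hypothetical helper, not a Mnesia API.

```erlang
-module(oid_demo).
-export([oid/1, oid/2]).

%% Assumes the table name equals the record name (the default).
oid(Record) when is_tuple(Record), tuple_size(Record) >= 2 ->
    oid(element(1, Record), Record).

%% The key is always the first attribute, i.e. the second tuple element.
oid(Tab, Record) when is_atom(Tab), is_tuple(Record), tuple_size(Record) >= 2 ->
    {Tab, element(2, Record)}.
```

For the employee tuple above, oid_demo:oid/1 yields {employee, 104732}.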
3.3 Starting Mnesia
Before we can start Mnesia, we must initialize an empty schema on all the participating nodes.
- The Erlang system must be started.
- Nodes with a disc database schema must be defined and initialized with the function mnesia:create_schema(NodeList).
When running a distributed system with two or more participating nodes, the function mnesia:start() must be executed on each participating node. Typically this is part of the boot script in an embedded environment. In a test or interactive environment, mnesia:start() can also be called from the Erlang shell or from another program.
3.3.1 Initializing a Schema and Starting Mnesia
To use a known example, we illustrate how to run the Company database described in Chapter 2 on two separate nodes, which we call a@gin and b@skeppet. Each of these nodes must have a Mnesia directory as well as an initialized schema before Mnesia can be started. There are two ways to specify the Mnesia directory to be used:
- Specify the Mnesia directory by providing an application parameter, either when starting the Erlang shell or in the application script. Chapter 2 used the following command to specify the directory for the Company database:
% erl -mnesia dir '"/ldisc/scratch/Mnesia.Company"'
- If no command line flag is given, the Mnesia directory is a directory named Mnesia.Node (where Node is the node name) in the current working directory of the node where the Erlang shell is started.
To start our Company database and get it running on the two specified nodes, we enter the following commands:
- On the node called gin:
gin % erl -sname a -mnesia dir '"/ldisc/scratch/Mnesia.company"'
- On the node called skeppet:
skeppet % erl -sname b -mnesia dir '"/ldisc/scratch/Mnesia.company"'
- On one of the two nodes:
(a@gin)1> mnesia:create_schema([a@gin, b@skeppet]).
- The function mnesia:start() is called on both nodes.
- To initialize the database, execute the following code on one of the two nodes:
    dist_init() ->
        mnesia:create_table(employee,
                            [{ram_copies, [a@gin, b@skeppet]},
                             {attributes, record_info(fields, employee)}]),
        mnesia:create_table(dept,
                            [{ram_copies, [a@gin, b@skeppet]},
                             {attributes, record_info(fields, dept)}]),
        mnesia:create_table(project,
                            [{ram_copies, [a@gin, b@skeppet]},
                             {attributes, record_info(fields, project)}]),
        mnesia:create_table(manager,
                            [{type, bag},
                             {ram_copies, [a@gin, b@skeppet]},
                             {attributes, record_info(fields, manager)}]),
        mnesia:create_table(at_dep,
                            [{ram_copies, [a@gin, b@skeppet]},
                             {attributes, record_info(fields, at_dep)}]),
        mnesia:create_table(in_proj,
                            [{type, bag},
                             {ram_copies, [a@gin, b@skeppet]},
                             {attributes, record_info(fields, in_proj)}]).
As illustrated above, the two directories reside on different nodes, because /ldisc/scratch (the "local" disc) exists separately on each of the two nodes.
By executing these commands, we have configured two Erlang nodes to run the Company database and initialized the database. This is required only once, when setting up the system. The next time the system is started, mnesia:start() is called on both nodes to initialize the system from disc.
In a system of Mnesia nodes, every node is aware of the current location of all tables. In this example, data is replicated on both nodes, and functions which manipulate the data in our tables can be executed on either of the two nodes. Code which manipulates Mnesia data behaves identically regardless of where the data resides.
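The one-time setup steps can also be scripted from a single node. The following is a sketch, not a prescribed procedure. It assumes that every node in Nodes is already running and connected, has its Mnesia directory configured as shown above, and has Mnesia stopped. The module company_boot and its functions are hypothetical. rpc:multicall/4 is the standard OTP call that returns {Replies, BadNodes}.

```erlang
-module(company_boot).
-export([init_cluster/1, start_all/1, summarise/2]).

%% One-time setup: create the schema on all nodes, then start Mnesia.
init_cluster(Nodes) ->
    case mnesia:create_schema(Nodes) of
        ok -> start_all(Nodes);
        {error, Reason} -> {error, {create_schema, Reason}}
    end.

%% Call mnesia:start() on every node. Each call still starts only the
%% "local" Mnesia system of the node it runs on.
start_all(Nodes) ->
    {Replies, BadNodes} = rpc:multicall(Nodes, mnesia, start, []),
    summarise(Replies, BadNodes).

%% ok only if every node answered ok and no node was unreachable.
summarise(Replies, BadNodes) ->
    case {[R || R <- Replies, R =/= ok], BadNodes} of
        {[], []} -> ok;
        {Errors, Bad} -> {error, {Errors, Bad}}
    end.
```

On a@gin this could be used as ok = company_boot:init_cluster([a@gin, b@skeppet]), followed by dist_init(). On later restarts only start_all/1 (or a boot script) is needed.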
The function mnesia:stop() stops Mnesia on the node where the function is executed. Both the start/0 and the stop/0 functions work on the "local" Mnesia system, and there are no functions which start or stop a set of nodes.
3.3.2 The Start-Up Procedure
Mnesia is started by calling the following function:
mnesia:start().
This function initiates the DBMS locally.
The choice of configuration will alter the location and load order of the tables. The alternatives are listed below:
- Tables that are stored locally only are initialized from the local Mnesia directory.
- Replicated tables that reside locally as well as somewhere else are initialized either from disc or by copying the entire table from another node, depending on which replica is the most recent. Mnesia determines which replica is the most recent.
- Tables that reside on remote nodes are available to other nodes as soon as they are loaded.
Table initialization is asynchronous. The function call mnesia:start() returns the atom ok and then starts to initialize the different tables. Depending on the size of the database, this may take some time, and the application programmer must wait for the tables that the application needs before they can be used. This is achieved by using the function:
mnesia:wait_for_tables(TabList, Timeout)
This function suspends the caller until all tables specified in TabList are properly initialized, or until Timeout (in milliseconds, or the atom infinity) expires.
A problem can arise if a replicated table on one node is being initialized but Mnesia deduces that another (remote) replica is more recent than the replica on the local node. In that case the local initialization procedure does not proceed, and a call to mnesia:wait_for_tables/2 suspends the caller until the remote node has initialized the table from its local disc and the table has been copied over the network to the local node.
This procedure can be time consuming. The following shortcut function loads a table from the local disc without waiting:
mnesia:force_load_table(Tab). This function forces table Tab to be loaded from disc regardless of the network situation.
Thus, if an application wishes to use tables a and b, it must perform some action similar to the following code before it can use the tables:
    case mnesia:wait_for_tables([a, b], 20000) of
        {timeout, RemainingTabs} ->
            panic(RemainingTabs);
        {error, Reason} ->
            panic(Reason);
        ok ->
            synced
    end.
When tables are forcefully loaded from the local disc, all operations that were performed on the replicated table while the local node was down, and the remote replica was alive, are lost. This can cause the database to become inconsistent.
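The waiting pattern and the forced-load shortcut can be combined in one helper. The following is a sketch under the assumption that the caller explicitly opts in to forcing, given the data-loss risk described above. The module table_wait and its functions are hypothetical. mnesia:force_load_table/1 returns yes on success.

```erlang
-module(table_wait).
-export([await/2, await/3]).

%% Wait for Tabs; returns ok | {timeout, Remaining} | {error, Reason}.
await(Tabs, Timeout) ->
    await(Tabs, Timeout, false).

%% With Force =:= true, tables still not loaded after Timeout are loaded
%% from the local disc. Updates made on a remote replica while this node
%% was down may then be lost.
await(Tabs, Timeout, Force) ->
    case mnesia:wait_for_tables(Tabs, Timeout) of
        ok ->
            ok;
        {timeout, Remaining} when Force =:= true ->
            Results = [{T, mnesia:force_load_table(T)} || T <- Remaining],
            case [{T, Res} || {T, Res} <- Results, Res =/= yes] of
                [] -> ok;
                Failed -> {error, {force_load_failed, Failed}}
            end;
        {timeout, Remaining} ->
            {timeout, Remaining};
        {error, Reason} ->
            {error, Reason}
    end.
```

Keeping Force off by default means the caller has to decide explicitly that availability matters more than consistency.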
If the start-up procedure fails, the mnesia:start() function returns the cryptic tuple {error,{shutdown, {mnesia_sup,start,[normal,[]]}}}. Pass -boot start_sasl as a command-line argument to the erl script to get more information about the start failure.
3.4 Creating New Tables
Mnesia provides one function to create new tables:
mnesia:create_table(Name, ArgList).
This function returns one of the following responses:
- {atomic, ok} if the function executes successfully
- {aborted, Reason} if the function fails.
The function arguments are:
Name is the atomic name of the table. It is usually the same name as the name of the records that constitute the table. (See record_name for more details.)
ArgList is a list of {Key,Value} tuples. The following arguments are valid:
{type, Type}, where Type must be one of the atoms set, ordered_set or bag. The default value is set. Note: currently ordered_set is not supported for disc_only_copies tables. A table of type set or ordered_set has either zero or one record per key, whereas a table of type bag can have an arbitrary number of records per key. The key for each record is always the first attribute of the record.
The following example illustrates the difference between type set and bag:
    f() ->
        F = fun() ->
                mnesia:write({foo, 1, 2}),
                mnesia:write({foo, 1, 3}),
                mnesia:read({foo, 1})
            end,
        mnesia:transaction(F).
This transaction returns {atomic, [{foo,1,3}]} if the foo table is of type set, but {atomic, [{foo,1,2}, {foo,1,3}]} if the table is of type bag.
Mnesia tables can never contain duplicates of the same record. Duplicate records have the same key and identical contents in all other attributes.
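A self-contained sketch makes the set/bag difference and the no-duplicates rule concrete. It assumes a single local node with RAM tables. The two hypothetical tables foo_set and foo_bag both store records named foo, so mnesia:write/3 is used with an explicit table name. The record {foo,1,3} is written twice on purpose.

```erlang
-module(set_vs_bag).
-export([demo/0]).

demo() ->
    ok = mnesia:start(),
    {make(foo_set, set), make(foo_bag, bag)}.

make(Tab, Type) ->
    _ = mnesia:delete_table(Tab),
    {atomic, ok} = mnesia:create_table(Tab, [{type, Type},
                                             {record_name, foo},
                                             {attributes, [key, val]}]),
    F = fun() ->
            ok = mnesia:write(Tab, {foo, 1, 2}, write),
            ok = mnesia:write(Tab, {foo, 1, 3}, write),
            %% An identical record written again is not stored twice.
            ok = mnesia:write(Tab, {foo, 1, 3}, write)
        end,
    {atomic, ok} = mnesia:transaction(F),
    Result = lists:sort(mnesia:dirty_read(Tab, 1)),
    {atomic, ok} = mnesia:delete_table(Tab),
    Result.
```

The set table keeps only the last record for key 1. The bag table keeps both distinct records, but only one copy of {foo,1,3}.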
{disc_copies, NodeList}, where NodeList is a list of the nodes where this table is stored on disc. Write operations to a table replica of type disc_copies write data to the disc copy as well as to the RAM copy of the table.
It is possible to have a replicated table of type disc_copies on one node and the same table stored as a different type on another node. The default value is []. This arrangement is desirable if the following operational characteristics are required:
- read operations must be very fast and performed in RAM
- all write operations must be written to persistent storage.
A write operation on a disc_copies table replica is performed in two steps. First the write operation is appended to a log file, then the actual operation is performed in RAM.
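A mixed replica layout is just a matter of listing different nodes under different storage types. The following sketch builds such an option list. The module mixed_copies and table_opts/3 are hypothetical. The overlap check reflects the rule that a node can hold only one kind of copy of a given table.

```erlang
-module(mixed_copies).
-export([table_opts/3]).

%% Options for create_table/2: disc_copies on DiscNodes and RAM-only
%% replicas on RamNodes. Overlapping node lists are rejected, since a
%% node may hold only one copy type of a table.
table_opts(DiscNodes, RamNodes, Fields) ->
    case [N || N <- DiscNodes, lists:member(N, RamNodes)] of
        [] ->
            {ok, [{disc_copies, DiscNodes},
                  {ram_copies, RamNodes},
                  {attributes, Fields}]};
        Overlap ->
            {error, {overlapping_nodes, Overlap}}
    end.
```

For example, {ok, Opts} = mixed_copies:table_opts([a@gin], [b@skeppet], record_info(fields, employee)) followed by mnesia:create_table(employee, Opts) would keep a persistent replica on a@gin and a RAM-only replica on b@skeppet.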
{ram_copies, NodeList}, where NodeList is a list of the nodes where this table is stored in RAM. The default value for NodeList is [node()]. If the default value is used to create a new table, it is located on the local node only. Table replicas of type ram_copies can be dumped to disc with the function mnesia:dump_tables(TabList).
{disc_only_copies, NodeList}, where NodeList is a list of the nodes where this table is stored on disc only. These table replicas are therefore slower to access. However, a disc only replica consumes less memory than a table replica of the other two storage types.
{index, AttributeNameList}, where AttributeNameList is a list of atoms specifying the names of the attributes for which Mnesia shall build and maintain an index. An index table exists for every element in the list. The first field of a Mnesia record is the key and thus needs no extra index.
The first field of a record is the second element of the tuple that represents the record.
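An index lets a table be searched on a non-key attribute. The following sketch assumes a single local node and a hypothetical record item with an index on colour. #item.colour evaluates to the tuple position of that attribute, which mnesia:dirty_index_read/3 accepts.

```erlang
-module(index_demo).
-export([demo/0]).

-record(item, {id, colour}).

demo() ->
    ok = mnesia:start(),
    _ = mnesia:delete_table(item),
    {atomic, ok} = mnesia:create_table(item,
                                       [{index, [colour]},
                                        {attributes, record_info(fields, item)}]),
    ok = mnesia:dirty_write(#item{id = 1, colour = red}),
    ok = mnesia:dirty_write(#item{id = 2, colour = blue}),
    ok = mnesia:dirty_write(#item{id = 3, colour = red}),
    %% Look up all records whose colour is red via the index
    %% (#item.colour is position 3 of the tuple).
    Reds = lists:sort(mnesia:dirty_index_read(item, red, #item.colour)),
    {atomic, ok} = mnesia:delete_table(item),
    Reds.
```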
{snmp, SnmpStruct}. SnmpStruct is described in the SNMP User's Guide. Basically, if this attribute is present in the ArgList of mnesia:create_table/2, the table is immediately accessible by means of the Simple Network Management Protocol (SNMP). The default is [].
It is easy to design applications which use SNMP to manipulate and control the system. Mnesia provides a direct mapping between the logical tables that make up an SNMP control application and the physical data which make up a Mnesia table.
{local_content, true}. When an application needs a table whose contents should be locally unique on each node, local_content tables may be used. The name of the table is known to all Mnesia nodes, but its contents are unique for each node. Access to this type of table must be done locally.
{attributes, AtomList} is a list of the attribute names for the records that are supposed to populate the table. The default value is the list [key, val]. The table must have at least one extra attribute besides the key. When accessing single attributes in a record, it is not recommended to hard-code the attribute names as atoms. Use the construct record_info(fields, record_name) instead. The expression record_info(fields, record_name) is expanded at compile time and returns a list of the record's field names. With the record definition
-record(foo, {x,y,z}).
the expression record_info(fields,foo) is expanded to the list [x,y,z]. Accordingly, it is possible to provide the attribute names yourself, or to use the record_info/2 notation.
It is recommended that the record_info/2 notation be used, as it makes the program easier to maintain and more robust with regard to future record changes.
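Because record_info/2 is resolved at compile time, the results are constants that can be checked directly. The following small sketch uses the foo record from the text. The module record_info_demo is hypothetical. record_info(size, foo) counts the record name as well, so it is one more than the number of fields.

```erlang
-module(record_info_demo).
-export([fields/0, tuple_arity/0]).

-record(foo, {x, y, z}).

%% Field names, suitable for {attributes, ...} in create_table/2.
fields() -> record_info(fields, foo).

%% Size of the tuple representing a foo record (name + 3 fields).
tuple_arity() -> record_info(size, foo).
```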
{record_name, Atom} specifies the common name of all records stored in the table. All records stored in the table must have this name as their first element. The record_name defaults to the name of the table. For more information see Chapter 4: Record Names Versus Table Names.
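When the record name differs from the table name, writes must name the table explicitly. The following sketch assumes a single local node and uses a hypothetical table staff that stores person records. Because the table name cannot be derived from the record, the code uses mnesia:write/3 rather than mnesia:write/1.

```erlang
-module(record_name_demo).
-export([demo/0]).

-record(person, {id, name}).

demo() ->
    ok = mnesia:start(),
    _ = mnesia:delete_table(staff),
    {atomic, ok} = mnesia:create_table(staff,
                                       [{record_name, person},
                                        {attributes, record_info(fields, person)}]),
    {atomic, ok} =
        mnesia:transaction(fun() ->
                                   mnesia:write(staff, #person{id = 7, name = "Ann"}, write)
                           end),
    %% Reads are by table name; the stored tuple keeps the record name.
    Result = mnesia:dirty_read(staff, 7),
    {atomic, ok} = mnesia:delete_table(staff),
    Result.
```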
As an example, assume we have the record definition:
-record(funky, {x, y}).
The following call creates a table which is replicated on two nodes, N1 and N2, has an additional index on the y attribute, and is of type bag:
mnesia:create_table(funky, [{disc_copies, [N1, N2]}, {index, [y]}, {type, bag}, {attributes, record_info(fields, funky)}]).
By contrast, a call that relies on the default values:
mnesia:create_table(stuff, [])
returns a table with a RAM copy on the local node, no additional indexes, and the attributes defaulted to the list [key,val].
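Both calls can be tried on a single node and inspected with mnesia:table_info/2. This sketch assumes a local node without a disc schema, so funky uses ram_copies on node() instead of disc_copies on N1 and N2. The module create_table_demo is hypothetical.

```erlang
-module(create_table_demo).
-export([demo/0]).

-record(funky, {x, y}).

demo() ->
    ok = mnesia:start(),
    lists:foreach(fun(T) -> _ = mnesia:delete_table(T) end, [funky, stuff]),
    {atomic, ok} = mnesia:create_table(funky,
                                       [{ram_copies, [node()]},
                                        {index, [y]},
                                        {type, bag},
                                        {attributes, record_info(fields, funky)}]),
    %% All defaults: type set, ram_copies on node(), attributes [key, val].
    {atomic, ok} = mnesia:create_table(stuff, []),
    Info = {mnesia:table_info(funky, type),
            mnesia:table_info(stuff, type),
            mnesia:table_info(stuff, attributes),
            mnesia:table_info(stuff, ram_copies) =:= [node()]},
    {atomic, ok} = mnesia:delete_table(funky),
    {atomic, ok} = mnesia:delete_table(stuff),
    Info.
```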