[erlang-questions] Mnesia create tables best practices

Wed Feb 11 15:01:45 CET 2015

On Wed, Feb 11, 2015 at 9:07 AM, Roberto Ostinelli <
roberto.ostinelli@REDACTED> wrote:

> Adding to this: doesn't it mean that the name and full host of the node
> need to be known before you can initialize your db, hence a release can
> only be specific for a specific full node name?

Indeed, distributed mnesia requires a node list when creating the schema,
which makes the database specific to the installation. This is not unique
to mnesia: RDBMS systems in general have the same requirement of "pool
initialization", where initial schemas, clustering and replication
configuration have to be set up. In this case, schema creation needs a
separate target, and it is often beneficial to store the mnesia dir outside
the release for easier upgrades.
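
As a rough sketch of such a separate target (the module name `db_install`
and the directory in the comment are made up for illustration), schema
creation could live in a function that is run once per installation rather
than on every boot:

```erlang
-module(db_install).
-export([install/1]).

%% Hypothetical one-off install step, run from a separate target.
%% Assumes the mnesia dir is configured outside the release, e.g. in
%% sys.config:  [{mnesia, [{dir, "/var/lib/myapp/mnesia"}]}].
%% All nodes in Nodes must be up, and mnesia must be stopped on them.
install(Nodes) ->
    case mnesia:create_schema(Nodes) of
        ok ->
            ok;
        {error, {_Node, {already_exists, _}}} ->
            %% Not idempotent: a second run is reported, not ignored.
            {error, schema_already_exists};
        {error, Reason} ->
            {error, Reason}
    end.
```

Something like `db_install:install([node()])` would then be called before
mnesia is started for the first time.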

What I'm getting at is that mnesia was not created for idempotent database
initialization. Other systems may be far better at this, where one can join
new nodes to an existing system dynamically. The task is even automatic in
some systems. But for mnesia, you need to write code. Code gives control
and flexibility at the expense of simple deployment. On the other hand, you
won't be surprised by system behaviour since you wrote the code yourself.
Here is one way to go about it.

Initialize a ``root'' node which stores the schema and uses a predefined
schema upon boot. Every other node now joins the cluster, with an empty
schema. You start mnesia on the new node. You call
`mnesia:change_config(extra_db_nodes, ['joiner@REDACTED'])` on root followed
by `mnesia:change_table_copy_type(schema, 'joiner@REDACTED', disc_copies)`
so the schema is on the disc of the newly joined node.

From here, you can run `mnesia:add_table_copy(Table, 'joiner@REDACTED',
Type)` for each table that needs to be copied to the newly joined node.
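
Put together, the join could be sketched roughly as below. This is only an
outline: the module name `cluster_join`, the `{Table, Type}` list and the
error tuples are my own choices for illustration. It uses
`mnesia:change_table_copy_type/3` for the schema step, and assumes it runs
on root while the joiner is reachable over distributed Erlang.

```erlang
-module(cluster_join).
-export([add_node/2]).

%% Run on the root node. Joiner must be up and have an empty schema.
%% Tables is a list of {Table, Type}, where Type is ram_copies,
%% disc_copies or disc_only_copies.
add_node(Joiner, Tables) ->
    case rpc:call(Joiner, mnesia, start, []) of
        ok    -> connect(Joiner, Tables);
        Other -> {error, {start_failed, Other}}
    end.

%% Make root's mnesia connect to the joiner's mnesia.
connect(Joiner, Tables) ->
    case mnesia:change_config(extra_db_nodes, [Joiner]) of
        {ok, [Joiner]}  -> persist_schema(Joiner, Tables);
        {ok, []}        -> {error, {not_connected, Joiner}};
        {error, Reason} -> {error, Reason}
    end.

%% Put the schema on the joiner's disc.
persist_schema(Joiner, Tables) ->
    case mnesia:change_table_copy_type(schema, Joiner, disc_copies) of
        {atomic, ok}      -> copy_tables(Joiner, Tables);
        {aborted, Reason} -> {error, Reason}
    end.

%% Add a replica of each requested table; stop at the first failure.
copy_tables(_Joiner, []) ->
    ok;
copy_tables(Joiner, [{Table, Type} | Rest]) ->
    case mnesia:add_table_copy(Table, Joiner, Type) of
        {atomic, ok}      -> copy_tables(Joiner, Rest);
        {aborted, Reason} -> {error, {Table, Reason}}
    end.
```

A call would look like `cluster_join:add_node('joiner@REDACTED', [{user,
disc_copies}])`, where `user` stands in for whatever tables you have.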

In turn, you have two release configurations. One in which the node boots
with a predefined schema, and one where it boots in a ram_copies schema
mode and doesn't start operating before it has been joined to a cluster.
Once it has been joined, it has a valid mnesia database and is ready for
doing work in the cluster.
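
How the joining release decides that it ``has been joined'' is up to you.
One possible sketch, assuming the node simply polls until mnesia reports at
least one other running db node (the module name `join_wait` and the
retry/interval parameters are invented for this example):

```erlang
-module(join_wait).
-export([await_cluster/2]).

%% Poll until some other node has pulled this node into the cluster,
%% or give up after Retries attempts spaced IntervalMs apart.
await_cluster(0, _IntervalMs) ->
    {error, timeout};
await_cluster(Retries, IntervalMs) when Retries > 0 ->
    case mnesia:system_info(running_db_nodes) -- [node()] of
        [] ->
            timer:sleep(IntervalMs),
            await_cluster(Retries - 1, IntervalMs);
        Peers ->
            {ok, Peers}
    end.
```

The application's supervisor tree would then start its workers only after
`join_wait:await_cluster/2` returns `{ok, Peers}`.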

If you want the ability to truly elastically add and remove nodes from a
cluster, you probably shouldn't use mnesia. The reason has to do with
mnesia's inability to automatically handle network splits and split
brain. You can add nodes to mnesia ``up to a point'', after which you will
have to move to another database system. As I've argued before[0], in the
sense of the CAP theorem, mnesia is neither CP, nor AP and ``CA'' doesn't
exist as a mode. Remember that mnesia predates the CAP theorem by a couple
of years, and partitioning is highly unlikely in a system where
communication is on the same cross-backplane as your telephony switching
calls. For AP systems, Riak is perhaps the weapon of choice for Erlang
programmers. For a CP system, I don't really have a good elastic candidate,
but others may.

Mnesia shines when:

* The cluster is of moderate size, perhaps fewer than 16 nodes.
* The FULL database can be in memory, or if using fragmented tables, loss
of a fragment is not equal to total system catastrophe.
* Interaction between the Erlang processes and the data is tight, so having
access to QLC is important.
* Most queries are K/V lookups requiring microsecond latency, with few
complicated queries joining many tables. In the latter case, you care
about correctness, not latency.
* The most likely failure is loss of a node. It happens rarely and it is
possible to solve through manual intervention in a window of a couple of
days/weeks.
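
To illustrate the two kinds of access mentioned in the list, here is a
small sketch. The `user` table, its fields and the module name `kv_sketch`
are hypothetical; the table is assumed to exist with attributes `id`,
`name` and `age`.

```erlang
-module(kv_sketch).
-include_lib("stdlib/include/qlc.hrl").
-export([lookup/1, names_older_than/1]).

-record(user, {id, name, age}).

%% Fast path: a plain K/V lookup by primary key, outside any transaction.
lookup(Id) ->
    mnesia:dirty_read(user, Id).

%% Slow path: a QLC query run inside a transaction, where correctness
%% matters more than latency. Returns {atomic, Names} on success.
names_older_than(Age) ->
    F = fun() ->
            Q = qlc:q([U#user.name || U <- mnesia:table(user),
                                      U#user.age > Age]),
            qlc:e(Q)
        end,
    mnesia:transaction(F).
```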

[0] https://medium.com/@jlouis666/mnesia-and-cap-d2673a92850

-- 
J.