Functions in data structures

Wed Jun 18 11:35:47 CEST 2003

On Wed, 18 Jun 2003, Wiger Ulf wrote:

Uffe> The main reason for this rambling is that you need to spend some 
Uffe> time rather early thinking about what your upgrade requirements are,
Uffe> and how they affect your design choices.

I wrote some internal notes about things to think about when
making changes in the Mnesia application. The notes are
from 1998 and a bit outdated, but I think that they can still
serve as a source of inspiration when you think about the
different types of future changes that you need to prepare
your own application for.

/Håkan

---
Håkan Mattsson
Ericsson
High Availability Software, DBMS Internals
http://www.ericsson.com/cslab/~hakan/
-------------- next part --------------

Mnesia upgrade policy (PA3)
===========================

This paper describes the upgrade policy of the Mnesia application.
It is divided into the following chapters:

	o Architecture overview
	o Compatibility
	o Configuration management
	o Compatibility requirements on other applications
	o Upgrade scenarios
	o Remaining issues

Architecture overview
---------------------

Mnesia is a distributed DataBase Management System (DBMS), appropriate
for telecommunications applications and other Erlang applications
which require continuous operation and exhibit soft real-time
properties. Mnesia is entirely implemented in Erlang.

Meta data about the persistent tables is stored in a public ets table,
called mnesia_gvar. This ets table is accessed concurrently by several
kinds of processes:

* Static - on each node Mnesia has about 10 static
  processes. E.g. monitor, controller, tm, locker, recover, dumper...

* Dynamic - several kinds of dynamic processes are created
  for various purposes. E.g. tab_copier, perform_dump, backup_master...

* Client processes created by Mnesia users. These invoke the Mnesia
  API to perform operations such as table lookups, table updates,
  table reconfigurations...

All these kinds of processes communicate with each other both locally
(on the same node) and remotely.

Mnesia may either be configured to use local disc or to be totally
discless. All disc resident data is located under one directory and
is hosted by disk_log and dets.

The persistent tables may be backed up to external media. By default
backups are hosted by disk_log. 

Compatibility
-------------

Each release of Mnesia has a unique version identifier. If Mnesia is
delivered to somebody outside the development team, the version
identifier is always changed, even if only one file differs from the
previous release of Mnesia. This means that the exact version of each
file in Mnesia can be determined from the version identifier of the
Mnesia application.

Mnesia does NOT utilize the concept of OTP "patches". The smallest
deliverable is currently the entire application. In future releases we
will probably deliver binaries, source code and different kinds of
documentation as separate packages.

Changes of the version identifier follow a certain pattern, depending
on how compatible the new release is with the previous one. The
version identifier consists of 3 integer parts: Major.Minor.Harmless
(or just Major.Minor when Harmless is 0).

If a harmless detail has been changed, the "Harmless" part is
incremented. The change must not imply any incompatibility problems
for the user. An application upgrade script (.appup) could be
delivered in order to utilize code change in a running system, if
required.

If the change is more than a harmless detail (or if it is a harmless
change that required a substantial rewrite of the code), the "Minor"
part is incremented. The change must not imply any incompatibility
problems for the user. If the change is complex, it is not always
possible to deliver an application upgrade script (.appup) for code
change in a running system. A full restart of Mnesia (and all
applications using Mnesia) may be required, potentially on all nodes
simultaneously.

All other kinds of changes require the "Major" part to be incremented.
The change may imply that users of Mnesia need to rewrite their code,
restart from backup or suffer other inconveniences. An application
upgrade script (.appup) for code change in a running system is
probably too complex to write.
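
The numbering scheme above can be modelled in a few lines. The following
is only an illustrative sketch in Python, not code from Mnesia; the
function names and the wording of the EXPECTATIONS table are assumptions
made for this example.

```python
def parse_version(text):
    """Parse "Major.Minor" or "Major.Minor.Harmless" into a 3-tuple of ints."""
    parts = [int(p) for p in text.split(".")]
    if len(parts) == 2:
        parts.append(0)  # Harmless is 0 when it is left out
    if len(parts) != 3:
        raise ValueError("expected Major.Minor[.Harmless]: %r" % text)
    return tuple(parts)


def classify_upgrade(old, new):
    """Name the most significant version part that differs."""
    old_v, new_v = parse_version(old), parse_version(new)
    for name, a, b in zip(("major", "minor", "harmless"), old_v, new_v):
        if a != b:
            return name
    return "none"


# Rough summary of what each kind of change may mean for the user
# (paraphrased from the policy text; the wording is an assumption).
EXPECTATIONS = {
    "harmless": "no incompatibility; .appup can be delivered if required",
    "minor": "no incompatibility; a full restart may be required",
    "major": "code rewrite or restart from backup may be required",
    "none": "same release",
}

if __name__ == "__main__":
    kind = classify_upgrade("3.4", "3.4.1")
    print(kind, "->", EXPECTATIONS[kind])
```

Comparing the parts from left to right means that the most significant
difference decides the classification, whatever the lower parts are.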

In any case it is a good idea to always read the release notes.

Configuration management
------------------------

Each major release constitutes its own development branch. Bugfixes,
new functionality etc. are normally performed in the latest major
development branch only. The latest release of Mnesia is always the
best and customers are encouraged to upgrade to it.

Application upgrade scripts (.appup) are only written if they are
explicitly requested by a customer. Preparing the code for a
smooth code change in a running system is very demanding and
restrains productive development. The code that handles the upgrade is
extremely hard to test, but it must be 100% correct. If it is not 100%
correct it is totally useless since the error may easily escalate to a
node restart or an inconsistent database.

It may however not always be possible to make a customer accept
the latest release, for incompatibility reasons. If a customer already
has a running system and encounters a serious bug in Mnesia, we may be
forced to fix the bug in an old release. Such a bugfix is performed
in a separate bugfix branch (dedicated to this particular bugfix).
All source files in the release are labeled with the version identifier
(e.g. "mnesia_3.4.1"). An application upgrade script (.appup) is
probably needed.

Compatibility requirements on other applications
------------------------------------------------

Mnesia is entirely implemented in Erlang and depends heavily on the
Erts, Kernel and StdLib applications. 

Changes of storage format in dets and disk_log may cause these files
to be automatically upgraded to the new format, but only if Mnesia
allows them to perform the "repair". As an effect of such a format
upgrade it is likely that the Mnesia files cannot be read by older
releases, but it may be acceptable that old releases are not forward
compatible.

Mnesia stores its data as binaries. Changes of the external binary
format must also be backward compatible. Automatic conversion to
the new format is required.

Note that Mnesia backups may be archived for years and that this
requires Erts and Kernel to be backward compatible with very old
binary formats and disk_log formats.

Upgrade scenarios
-----------------

There is a wide range of upgrade scenarios that need to be analyzed
before they occur in reality. Here follow a few of them:

Backup format change.

  In the abstract header of the backup there is a version tag that
  identifies the format of the rest of the backup. Backups may be
  archived for a long period of time and it is important that
  old backups can be loaded into newer versions of the system.
  The backup format is Mnesia's survival format.

  Changes in the abstract backup format require the backup version
  tag to be incremented and code to be written to convert the old
  format at backup load time. Minor change.

  The concrete format is an open format handled by a callback module
  and it is up to the callback module implementors to handle future
  changes. The default callback module uses disk_log and Mnesia relies
  heavily on disk_log's ability to automatically convert old disk_log
  files into new ones if the concrete format has been changed.
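
  The idea of a version tag in the abstract header, combined with
  conversion code run at load time, can be sketched as follows. This
  is an illustrative Python model only: the version numbers, the
  record layouts and all names are hypothetical and do not describe
  the real Mnesia backup format.

```python
CURRENT_VERSION = 3  # hypothetical current abstract format version


def _v1_to_v2(records):
    # Hypothetical change: version 2 added an explicit operation field.
    return [(table, key, value, "write") for (table, key, value) in records]


def _v2_to_v3(records):
    # Hypothetical change: version 3 stores each record as a dict.
    return [{"table": t, "key": k, "value": v, "op": op}
            for (t, k, v, op) in records]


# One converter per old version; each one lifts the data a single step.
CONVERTERS = {1: _v1_to_v2, 2: _v2_to_v3}


def load_backup(backup):
    """Return the records in the current format, converting step by step."""
    version, records = backup["version"], backup["records"]
    if version > CURRENT_VERSION:
        raise ValueError("backup written by a newer release: %d" % version)
    while version < CURRENT_VERSION:
        records = CONVERTERS[version](records)
        version += 1
    return records


if __name__ == "__main__":
    old_backup = {"version": 1, "records": [("person", 1, "ulf")]}
    print(load_backup(old_backup))
```

  Chaining one small converter per version step keeps very old
  backups loadable without having to write a direct conversion from
  every old format to the newest one.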

Transaction log file format change.

  In the abstract header of the transaction log there is a version tag
  that identifies the format of the rest of the log. Changes in the
  abstract transaction log format require the transaction log version
  tag to be incremented and code to be written to convert the old
  format when the log is dumped at startup. Minor change.

  The concrete format is hidden by disk_log and Mnesia relies on
  disk_log's ability to automatically convert old disk_log files into
  new ones if the concrete format has been changed.

  If the abstract format change is severe or if disk_log cannot
  handle old disk_log formats, the entire Mnesia database has to
  be backed up to external media and then installed as fallback.
  It may in the worst case be necessary to implement a new backup
  callback module that does not make use of disk_log. Major change.

Decision table log file format change.

  In the abstract header of the decision table log file there is a
  version tag that identifies the format of the rest of the
  log. The concrete format is hidden by disk_log and severe changes
  in the abstract or concrete formats are handled in the same way
  as complex changes of the transaction log file format: back the
  database up and re-install it as a fallback. Major change.

  Minor changes in the abstract format can be handled by conversion
  code run at startup. Minor change.

.DAT file format change.

  .DAT files have no abstract version format identifier.

  The concrete format is hidden by dets and Mnesia relies on
  dets' ability to automatically convert old dets files into
  new ones if the concrete format has been changed.

  Changes in the abstract format or incompatible changes in the
  concrete format are handled in the same way as complex changes in
  the transaction log file format: back the database up and re-install
  it as a fallback.

Core file format change.

  The core file is a debugging aid and contains essential information
  about the database. The core is produced automatically when Mnesia
  encounters a fatal error, or manually by the user for the purpose
  of enclosing it in a trouble report. The core file consists of a
  list of tagged tuples stored as a large binary. Future changes of
  the core format can easily be handled by adding a version tag
  first in the list if necessary.

Persistent schema format change.

  The schema is implemented as an ordinary table with table names
  as the key and a list of table properties as the single attribute.
  Each table property is represented as a tagged tuple and adding
  new properties will not break any code. Minor change.

  Incompatible changes of the schema representation can be handled
  in the same way as complex changes of the transaction log file
  format: back the database up and re-install it as a fallback.
  Major change.

  If the change is severe and impacts the backup format then it should
  not be performed at all.
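
  Why a list of tagged tuples tolerates new properties can be shown
  with a small Python model. It is a sketch only; get_property and
  the property named hypothetical_new_prop are invented for this
  example and the entries are not real schema records.

```python
def get_property(props, tag, default=None):
    """Look up a property by tag, skipping tags this code does not know."""
    for name, value in props:
        if name == tag:
            return value
    return default


# An entry as older code would have written it.
old_entry = [("type", "set"), ("ram_copies", ["a@host"])]

# The same entry after a newer release added one more tagged tuple.
new_entry = old_entry + [("hypothetical_new_prop", 42)]

if __name__ == "__main__":
    # Old code reading a new entry still finds what it looks for.
    print(get_property(new_entry, "type"))
    # New code reading an old entry falls back to a default value.
    print(get_property(old_entry, "hypothetical_new_prop", 0))
```

  Readers select properties by tag instead of by position, so an
  extra tuple is invisible to old code, and new code only needs a
  sensible default for entries written before the property existed.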

Renaming schema table to mnesia_schema.

  All internal (d)ets tables are prefixed with 'mnesia_' in order
  to avoid name conflicts with other applications. The only
  exception is the table named 'schema'.

  Renaming 'schema' to 'mnesia_schema' is a major change that
  may break much customer code if it is not done very carefully,
  since the name of the table is a part of the Mnesia API.
  The least intrusive change would be to leave the name 'schema'
  as a part of the API, but internally use the name 'mnesia_schema'
  and map all usages of 'schema' in all internal modules that access
  the schema table. It is however questionable whether the change is
  worth the effort.

Transient schema format change.

  The transient schema information is stored in a public ets table
  named mnesia_gvar. Changes in the transient schema format affect
  a lot of processes and it is not feasible to change it dynamically.
  A restart on all db_nodes is required. Minor change.

Configuration parameter change. *** partly not supported ***

  Many of the configuration parameters ought to be changeable
  dynamically in a running system:

     access_module
     backup_module
     debug
     dump_log_load_regulation
     dump_log_time_threshold
     dump_log_update_in_place
     dump_log_write_threshold
     event_module
     extra_db_nodes

  The remaining configuration parameters are only interesting at
  startup and any dynamic change of these should silently be ignored:

     auto_repair
     dir
     embedded_mnemosyne
     ignore_fallback_at_startup
     max_wait_for_decision
     schema_location
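
  The intended behaviour for the two groups of parameters could look
  roughly like the Python sketch below. It only models the policy;
  the function change_parameter is hypothetical, and raising an error
  for unknown parameter names is an assumption of this example, not
  something stated above.

```python
# Parameters that ought to be changeable in a running system.
DYNAMIC = {
    "access_module", "backup_module", "debug",
    "dump_log_load_regulation", "dump_log_time_threshold",
    "dump_log_update_in_place", "dump_log_write_threshold",
    "event_module", "extra_db_nodes",
}

# Parameters that are only interesting at startup.
STARTUP_ONLY = {
    "auto_repair", "dir", "embedded_mnemosyne",
    "ignore_fallback_at_startup", "max_wait_for_decision",
    "schema_location",
}


def change_parameter(config, name, value):
    """Apply a dynamic change; silently ignore startup-only parameters.

    Returns True if the running configuration was updated.
    """
    if name in DYNAMIC:
        config[name] = value
        return True
    if name in STARTUP_ONLY:
        return False  # ignored without an error
    # Assumption of this sketch: unknown names are reported to the caller.
    raise KeyError("unknown configuration parameter: %s" % name)


if __name__ == "__main__":
    config = {"debug": "none", "dir": "/var/db"}
    print(change_parameter(config, "debug", "verbose"), config)
    print(change_parameter(config, "dir", "/tmp"), config)
```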

Inter node protocol change. *** partly not supported ***

  When Mnesia on one node tries to connect to Mnesia on another
  node it negotiates with the other node about which inter node
  communication protocol to use. A list of all known protocol
  identifiers is sent to the other node, which replies with the
  protocol that it prefers. The other node may also reject the
  connection. Whenever the inter node protocol needs to be changed,
  the protocol identifier is changed.
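
  The negotiation step can be modelled as below. This is a Python
  sketch under stated assumptions: the function choose_protocol, the
  identifiers "p1", "p2", "p3" and the rule that the replying node
  keeps its own protocols ordered by preference are all invented
  here to illustrate the exchange.

```python
def choose_protocol(offered, supported):
    """Pick the protocol the replying node prefers among those offered.

    `offered` is the list of identifiers sent by the connecting node.
    `supported` is the replying node's own list, most preferred first.
    Returns None to signal that the connection is rejected.
    """
    for proto in supported:
        if proto in offered:
            return proto
    return None


if __name__ == "__main__":
    # The connecting node knows p1 and p2; the other node prefers p3,
    # then p2, then p1, so p2 is the best mutually known protocol.
    print(choose_protocol(["p1", "p2"], ["p3", "p2", "p1"]))
    # No protocol in common: the connection is rejected.
    print(choose_protocol(["p1"], ["p9"]))
```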

  If the change is a compatible change and we need to upgrade the
  protocol in a running system, things get a little bit complicated.
  A new version of the software which understands both the old and new
  protocols must be loaded on all nodes as a first step. All processes
  that use old code must somehow switch to the new code. One
  severe problem here is to locate all processes that need a code
  switch. Server processes are fairly straightforward to manage.
  Client processes that are waiting in a receive statement are far
  more difficult to first locate and then force into a code switch.
  We may prepare for the code switch by letting the client be
  able to receive a special code switch message and then continue
  to wait for interesting messages. (gen_server does not handle this,
  but since Mnesia does not use gen_servers for performance critical
  processes or crash sensitive processes, Mnesia can handle this in
  many cases.)
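
  A waiting client that accepts a special code switch message and
  then goes on waiting can be modelled roughly as follows. This is a
  Python sketch, not Erlang: a queue stands in for the process
  mailbox, a table of handlers keyed by version stands in for old
  and new code, and the message tags are hypothetical.

```python
import queue


def wait_for_reply(mailbox, handlers, version):
    """Wait for a reply, switching handler version when told to.

    `mailbox` yields (tag, payload) pairs. A ("code_switch", v) message
    makes the loop use handlers[v] from then on; a ("reply", x) message
    ends the wait. Other messages are dropped in this sketch.
    """
    while True:
        tag, payload = mailbox.get()
        if tag == "code_switch":
            version = payload  # adopt the new code, keep waiting
            continue
        if tag == "reply":
            return handlers[version](payload)


if __name__ == "__main__":
    handlers = {"old": lambda x: x, "new": lambda x: x * 2}
    mailbox = queue.Queue()
    mailbox.put(("code_switch", "new"))
    mailbox.put(("reply", 21))
    # The switch arrives while the client is waiting, so the reply is
    # processed by the new handler.
    print(wait_for_reply(mailbox, handlers, "old"))
```

  The point of the model is that the switch is handled inside the
  wait loop itself, so the client neither misses its reply nor has
  to be restarted in order to pick up the new code.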

  If the new protocol is a pure extension of the old one we may
  go ahead and use the new protocol and bump up the protocol
  version identifier.

  More complex protocol changes are sometimes possible to handle,
  but since they are extremely hard to test it is probably better
  to restart the system.

  If the change is an incompatible change, Mnesia must be restarted on
  all nodes.

Intra node protocol change.

  Changing the protocol between processes on the same node is slightly
  easier than changing inter node protocols. The difference is that
  we may iterate over all nodes and restart them one by one until
  the software has started to use the new protocols on all nodes.

Adding a new static process.

  Adding a new static process in Mnesia is a minor change,
  but cannot be handled by the supervisor "application"
  if the shutdown order of the static processes is important.
  The most reliable approach here is to restart Mnesia.

Adapting to new functionality in Erts, Kernel or StdLib.

  When was the new functionality introduced?
  Can Mnesia use the new functionality and still
  run on older Erlang/OTP releases?

Changing a function in the Mnesia API. 

  Changes of module name, function name, arguments or
  semantics of a function are likely to be a major change.

  Adding a brand new function is a minor change.

Changing a Mnesia internal function.

  If it is possible to locate all processes that may
  invoke the function it may be worth handling the
  code change in a running system. The safe approach
  is to restart the node. The processes must be able
  to handle 'sys' messages and export 'sys' callback
  functions. Minor change.

Process state format change.

  If it is possible to locate all processes that hold
  state in the old format it may be worth handling the
  code change in a running system. The safe approach
  is to restart the node. The processes must be able
  to handle 'sys' messages and export 'sys' callback
  functions. Minor change.

Code change to fix a harmless bug.

  Do these exist in reality or only in the Erlang book?

Code change to fix an inconsistency bug.

  A bug like all others but the effect of it is an
  inconsistent database. Mnesia cannot repair the database
  and it is up to the Mnesia users to make it consistent
  again. There are three options here:

  - ignore the fact that the database is inconsistent
  - back up the database and make the backup consistent
    before installing it as a fallback.
  - fix the inconsistency on-line while all applications
    are accessing the database

Code change to prevent an indefinite wait (deadlock, protocol mismatch).

  A bug like all others but the effect of it is hanging
  processes. Restart Mnesia on one or more nodes.

Remaining issues
----------------

The following issues remain to be implemented:

	- Dynamic configuration parameter change
	- Prepare code switch of client code