requirements of health care apps

Fri Apr 28 12:29:00 CEST 2000

On Thu, 27 Apr 2000, Bud P. Bruegger wrote:

bud>1. Limit of Mnesia databases to a size of 4GB
bud>(http://www.erlang.org/faq/x1084.html#AEN1103).  Competing dbms usually
bud>handle terabytes of data.  Is there an easy fix?  Does this limitation
bud>apply to a single node in a distributed setting or to the whole database?

The FAQ is partly wrong.

There are no HARD limitations on the maximum database size. 4GB is
the upper limit of one single dets file.

A Mnesia database may consist of lots of tables where each one may use
a dets table as backing storage. In fact you may split your table into
lots of fragments and use dets as backing storage for each fragment.
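
As a rough illustration of the fragment arithmetic (a sketch in Python
rather than Erlang, with made-up names): it assumes the 4GB per-file
figure above and a hypothetical fill ratio to leave headroom. The
crc32-modulo routing is only a stand-in and is not how Mnesia itself
distributes records over fragments.

```python
import zlib

# Upper limit of one single dets file, as stated above.
DETS_FILE_LIMIT = 4 * 1024**3


def fragments_needed(total_bytes, fill_ratio=0.5):
    """Smallest fragment count keeping each fragment under the limit.

    fill_ratio is a hypothetical safety margin: 0.5 means we plan to
    fill each dets file to at most half of its maximum size.
    """
    usable = int(DETS_FILE_LIMIT * fill_ratio)
    # Integer ceiling division, avoids floating point rounding.
    return -(-total_bytes // usable)


def fragment_for(key, n_fragments):
    """Route a key to a fragment number (illustrative hash only)."""
    return zlib.crc32(repr(key).encode("utf-8")) % n_fragments


one_terabyte = 1024**4
n = fragments_needed(one_terabyte)
print(n, fragment_for(("patient", 42), n))
```

With files filled completely (fill_ratio=1.0) one terabyte needs 256
fragments; at half fill it needs 512.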

I would really like to encourage you to perform benchmarks on your own,
using hardware, data volumes and access patterns similar to your
target application. 

There are however two mechanisms in Mnesia that will probably turn out
to be showstoppers for your gigantic database:

- repair of dets files. If your system happens to crash and leaves
  a dets file in an opened state, the file is repaired automatically
  at next startup. The repair takes ages for a large file and is slow
  even for rather small dets files. Klacke has suggested a clever
  solution for this (safe dets), but it has not been incorporated
  in Erlang/OTP.

- remote table load. When Mnesia recovers after a node crash, it will
  copy tables from other nodes hosting a more up-to-date replica of
  the table. If the table is large it may take quite a while to transfer
  it between the Erlang nodes. This issue is tricky to solve without
  major architectural changes to Mnesia.

Storing terabytes of data in the current Mnesia is not feasible.

But a budget approach that might make it possible to use Mnesia for
terabyte databases is to extend Mnesia to store blobs as separate
files and only use dets to keep track of the files. This would
dramatically reduce the dets file size and also make it possible to
use regular tools, such as glimpse, to perform the massive text
search. A new algorithm for replication of blob files could be kept
separate from the current table load algorithm. It would in fact be
possible to build such a blob replication mechanism as an application
on top of Mnesia, and just use Mnesia to keep track of the metadata,
such as the current replication state.
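
To make the idea concrete, here is a minimal sketch (Python, all names
hypothetical): blobs go into ordinary files, and a small index keeps
only the metadata. The in-memory dict stands in for the dets/Mnesia
table; naming files by SHA-256 digest and the "replicated_to" field are
my assumptions, not an existing Mnesia feature.

```python
import hashlib
import os


class BlobStore:
    """Blobs live as plain files; the index holds only small metadata."""

    def __init__(self, root):
        self.root = root
        self.index = {}  # stand-in for the small dets/Mnesia table
        os.makedirs(root, exist_ok=True)

    def put(self, key, data):
        # Name the file after the content digest (an assumption of
        # this sketch), so identical blobs share one file.
        name = hashlib.sha256(data).hexdigest()
        with open(os.path.join(self.root, name), "wb") as f:
            f.write(data)
        self.index[key] = {
            "file": name,
            "size": len(data),
            "replicated_to": [],  # current replication state
        }
        return name

    def get(self, key):
        meta = self.index[key]
        with open(os.path.join(self.root, meta["file"]), "rb") as f:
            return f.read()

    def mark_replicated(self, key, node):
        # A separate blob replication mechanism would call this once
        # the file has been copied to another node.
        self.index[key]["replicated_to"].append(node)
```

Since the blobs are ordinary files under one directory, external
indexing and search tools can be pointed at that directory directly,
and the index stays small no matter how large the blobs are.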

bud>2. inefficient handling of text 
bud>(http://www.erlang.org/faq/x299.html#AEN309).  Many health care
bud>applications make heavy use of free-text style data (such as medical
bud>records) that are if possible loosely structured (with XML).  How difficult
bud>would it be to extend Mnesia/Erlang to efficiently store and manage
bud>(regular expressions, free text search etc.) textual data?  The FAQ
bud>mentions work underway.  Will this only apply to Erlang or also Mnesia?
bud>Does anyone know when to expect the results to be available?

Not this year.

bud>I imagine that Mnesia, being object-relational and very fast, would be a
bud>great backend DBMS for a persistent object store.  The overhead introduced
bud>by a Java persistence layer on top of Mnesia intuitively seems to be less
bud>than that of the "normal" object-relational mapping.  And the distribution
bud>and fault-tolerance features of Mnesia are unique and highly attractive.  
bud>
bud>I would be very interested in your opinion on feasibility, performance,
bud>degree of paradigm mismatch, etc.  A practical project could for example
bud>add a Mnesia backend to Castor (http://castor.exolab.org/).  

Mnesia is a DBMS for Erlang applications. Using Mnesia from other
languages implies that you introduce considerable extra overhead for
process communication (as normal Erlang applications execute in
separate lightweight threads in the same address space as Mnesia) and
for data conversion between Java's data structures and Erlang terms.

There are some showstoppers for a Castor backend project, but if you
could live without SQL, XML and text searching, this could be an
interesting approach to explore:

- store the Java data structures in Mnesia as blobs, without converting
  them to Erlang's data structures.

- open up Mnesia's inter-node interface and let the Java side act as
  a limited Erlang node running a minimal Mnesia-compliant server
  without table replicas of its own.

The performance would of course not be as good as for Erlang applications,
but it could still be acceptable depending on the application requirements.

bud>2. Distributed, fault-tolerant XML Repository
bud>----------------------------------------------
bud>
bud>Another killer app (that is admittedly related to the above) would be a
bud>distributed, fault-tolerant xml repository based on Mnesia.  XML is
bud>difficult to map to relational tables (see
bud>http://www.poet.com/products/cms/white_papers/xml/repository.html) and some
bud>approaches use a (IMHO slow) object-relational mapping
bud>(http://www.informatik.tu-darmstadt.de/DVS1/staff/bourret/xmldbms/xmldbms.htm)
bud>to overcome this problem.  
bud>I would be very interested in your opinion on how well Mnesia would be
bud>suited as a backend for an XML repository.  For example, how easily the
bud>structure of XML would map to Mnesia.  

In order to achieve acceptable performance, new features first need to
be added to the Erlang run-time system, then used by Mnesia and
eventually by Mnemosyne:

- searching in lists in ets+dets tables
- regexp searching in binaries in ets+dets tables
- allowing operators (such as >, /= etc.) in dets patterns
- asynchronous or threaded I/O
- ...

When this support is included in Erlang/OTP, I believe that it would be 
quite easy to build an XML repository on top of Mnesia. It would really
be a killer app.

/Håkan