dets improvements?

Sat Jun 10 20:58:35 CEST 2006

On 2006-06-10 17:17:07, Yariv Sadan <yarivvv@REDACTED> wrote:

> Although the need for fragmentation alone isn't a deal breaker for me
> (it's more the long repair time, due to which I don't want to risk
> taking my app offline for hours), it does give the impression that
> Mnesia is behind the curve because no other database I know of puts
> such demands on the user.

To be fair, mnesia doesn't target the same applications as e.g. PostgreSQL  
and MySQL. It's difficult to come up with database systems that achieve  
such tight integration with the applications. There is no semantic gap and  
overhead is very low.

> Even if fragmentation happened behind the scenes, it
> would be a different story. (That's what Postgres does,
> actually, in 1 GB segments --
> http://www.postgresql.org/docs/8.1/interactive/storage.html.)

Recall that mnesia is primarily a DBMS for embedded real-time systems.
Normally, one wants to have more control over what goes on, even if it  
means more work up front.

> I think Mnesia should be at least as good as MySQL and
> Postgres at disc storage if not other features as well
> because those are the most popular open source databases
> and hence Mnesia will always be compared to them.

It's certainly been discussed for a long time, but I see two main reasons  
for the current situation:

- The most influential application projects for Erlang's
   development so far simply haven't seen this as a hard
   requirement. And in fairness, only MySQL Cluster (and
   perhaps a few others, like TimesTen) could
   reasonably be a contender to mnesia in those
   applications.

- It *is* more difficult to make efficient disk storage
   for dynamically typed data. Especially ordered_set
   disk storage is extremely difficult to implement
   efficiently without knowing the type or size of the
   keys. IMO, in order to compete squarely with
   conventional DBMSes on large data volumes, mnesia
   will have to allow type definitions of data.
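
To illustrate the second point, here is a minimal sketch in Python (not
Erlang, and not how dets or mnesia store anything). It assumes keys that
are declared to be unsigned 64-bit integers; the helper names are
hypothetical. With a known type and size, a key can be packed into
fixed-width bytes whose bytewise order matches the key order, which is what
an on-disk ordered index wants. Without a declared type, the storage layer
first has to define and evaluate an ordering across arbitrary terms.

```python
import struct


def encode_fixed_int_key(key: int) -> bytes:
    """Pack an unsigned 64-bit integer as 8 big-endian bytes.

    For this one declared type, comparing the encoded bytes gives the
    same result as comparing the integers themselves.
    """
    return struct.pack(">Q", key)


def decode_fixed_int_key(raw: bytes) -> int:
    """Inverse of encode_fixed_int_key."""
    return struct.unpack(">Q", raw)[0]


def try_sort_untyped(values):
    """Try to sort keys of mixed types; return None if no order is defined."""
    try:
        return sorted(values)
    except TypeError:
        return None


# Typed keys: sort the raw bytes, never looking at the decoded values.
keys = [300, 2, 70000, 41]
encoded_sorted = sorted(encode_fixed_int_key(k) for k in keys)
decoded = [decode_fixed_int_key(raw) for raw in encoded_sorted]

# Untyped keys: Python has no built-in order between these types.
untyped_result = try_sort_untyped([3, "abc", (1, 2), 2.5])
```

Erlang does define a total order over all terms, so the untyped case is
not impossible there. The sketch only shows why a fixed, known key layout
makes ordered disk storage much simpler.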

> I'm waiting for the day Mnesia will have extensible indexing, for
> instance (http://www.postgresql.org/docs/8.1/interactive/gist.html),

The rdbms contrib has this, even though it's in early beta.

> allowing support for full-text search
> (http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/),

Again, rdbms has the embryo of this. It's not fully functional yet. I would
certainly welcome some help, but it's probably best to verify the
indexing functionality first.

> and a high-end query optimizer
> (http://www.postgresql.org/docs/8.1/interactive/geqo.html).

Ironically, there was one from the start - mnemosyne. It received a bad  
reputation for a couple of reasons:

(1) The main applications at the time were real-time applications with
little or no need for an optimizing query engine. Some people
misunderstood this to mean that mnemosyne was no good, when in fact it was
misused, or simply vast overkill for the given applications; and

(2) it was really a research project at the time when it was included in  
OTP, and not enough work was put in to make it product quality.

The aim of mnemosyne was, if I've understood things correctly, partly to  
explore the idea of set comprehensions for database queries, and partly to  
make a very advanced query optimizer, which was especially good at really,  
really hairy queries. There are queries that can bring most query engines  
to their knees, and optimization techniques that can bring down query time  
from hours to minutes in certain situations. Mnemosyne was especially good
at resolving recursive dependencies. But all such processing comes at a  
cost, and in practice, mnemosyne was mostly used for very simple queries,  
where optimization made little or no difference.
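
As a rough illustration of what "set comprehensions for database queries"
means, here is a small Python sketch. It is not mnemosyne syntax, and the
tables and function names are made up for the example. The same query is
written twice: once as a plain nested comprehension, and once in the form
an optimizer might rewrite it to, by filtering the smaller relation first
and probing it by membership.

```python
# Hypothetical relations, stored as sets of tuples.
employees = {("alice", "eng"), ("bob", "ops"), ("carol", "eng")}
departments = {("eng", "Stockholm"), ("ops", "Gothenburg")}


def naive_join(emps, depts, city):
    """Names of employees whose department is in `city`.

    Written directly as a set comprehension; every employee row is
    paired with every department row.
    """
    return {
        name
        for (name, dept) in emps
        for (dept_name, dept_city) in depts
        if dept == dept_name and dept_city == city
    }


def planned_join(emps, depts, city):
    """Same query, restructured the way a query planner might do it.

    The department relation is filtered once, then each employee row
    needs only a set membership test.
    """
    wanted = {dept_name for (dept_name, dept_city) in depts if dept_city == city}
    return {name for (name, dept) in emps if dept in wanted}


naive_result = naive_join(employees, departments, "Stockholm")
planned_result = planned_join(employees, departments, "Stockholm")
```

On tables this small the two forms cost about the same, which matches the
point above: for very simple queries the optimization makes little or no
difference.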

Basically, most things are possible, but there has to be sufficient demand  
for them, and people willing to put in the effort.

Regards,
Ulf Wiger