mnesia, large datasets and key range lookups

Jouni Rynö Jouni.Ryno@REDACTED
Tue Jul 13 07:53:56 CEST 2004

Dear all

Just wondering what one should do:

My data contains irregularly (in time) taken measurements. At any one
time there can be anything from 1 to 30-odd measurements.

So the first logical start is to define
-record(time_parameter, {time, parameter, value}).

then to define the mnesia table like this:

mnesia:create_table(time_parameter, [{type, bag},
                    {disc_only_copies, [node()]},
                    {attributes, record_info(fields, time_parameter)}]).

Now each time key can contain several parameters. But as there can be
thousands of measurements per day, the DB files become really
large. A week's data took about 800 MBytes ...

Disc space is no problem, but searching the data is (for correlating
the data with the housekeeping parameters). Searching for a single
parameter over a certain time interval

dirty_para_request(Parameter, StartTime, EndTime) ->
    Sel = [{#time_parameter{parameter = Parameter,
                            time = '$1', value = '_'},
            [{'>=', '$1', {const, StartTime}},
             {'=<', '$1', {const, EndTime}}],
            ['$_']}],
    mnesia:dirty_select(time_parameter, Sel).

will now take about 4 minutes, as Mnesia has to scan through the full
DB. Extrapolating this to even a year's operation means that one has
to do something else.

So far the only thing I can think of is to split the DB into daily
tables, then first select the relevant tables by name (based on the
time) and search only those tables.
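The splitting idea could be sketched like this (the module and helper
names, and table names like time_parameter_2004_07_13, are just a
convention I am considering, nothing Mnesia prescribes):

```erlang
%% Sketch: one Mnesia table per day, named after the date, so a query
%% only touches the tables covering the requested interval.
-module(daily_tables).
-export([table_name/1, tables_for_range/2, dirty_para_request/3]).

-record(time_parameter, {time, parameter, value}).

%% Map a {{Y,M,D},{H,Min,S}} timestamp to its daily table name,
%% e.g. time_parameter_2004_07_13.
table_name({{Y, M, D}, _Time}) ->
    list_to_atom(lists:flatten(
        io_lib:format("time_parameter_~4..0w_~2..0w_~2..0w", [Y, M, D]))).

%% All daily table names covering [StartTime, EndTime].
tables_for_range({StartDate, _}, {EndDate, _}) ->
    First = calendar:date_to_gregorian_days(StartDate),
    Last  = calendar:date_to_gregorian_days(EndDate),
    [table_name({calendar:gregorian_days_to_date(N), {0, 0, 0}})
     || N <- lists:seq(First, Last)].

%% Run the interval select against the relevant daily tables only and
%% concatenate the results.
dirty_para_request(Parameter, StartTime, EndTime) ->
    Sel = [{#time_parameter{parameter = Parameter,
                            time = '$1', value = '_'},
            [{'>=', '$1', {const, StartTime}},
             {'=<', '$1', {const, EndTime}}],
            ['$_']}],
    lists:append([mnesia:dirty_select(T, Sel)
                  || T <- tables_for_range(StartTime, EndTime)]).
```

That way a one-day query scans one table instead of the whole year.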

But are there any other solutions?  


  Jouni Rynö                            mailto://
  Finnish Meteorological Institute
  Space Research              
  P.O.BOX 503                           Tel      (+358)-9-19294656
  FIN-00101 Helsinki                    FAX      (+358)-9-19294603
  Finland                               priv-GSM (+358)-50-5302903
  "It's just zeros and ones, it cannot be hard"
