mnesia, large datasets and key range lookups
Jouni Rynö
Jouni.Ryno@REDACTED
Tue Jul 13 07:53:56 CEST 2004
Dear all
Just wondering what one should do:
My data consists of measurements taken at irregular times. At any one
time there can be anything from 1 to 30-something measurements.
So the logical starting point is to define
-record(time_parameter, {time, parameter, value}).
and then to define the Mnesia table like this:
mnesia:create_table(time_parameter,
                    [{type, bag},
                     {disc_only_copies, [node()]},
                     {attributes, record_info(fields, time_parameter)}]).
Now each time key can contain several parameters. But as there can be
thousands of measurements per day, the DB files become really large: a
week's data took about 800 MBytes ...
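For reference, each sample set gets written roughly along these lines
(store_samples/2 is only an illustrative helper, not the actual code):

%% Illustrative sketch: several parameters end up under the same time
%% key in the bag table.
store_samples(Time, ParamValuePairs) ->
    F = fun() ->
                [mnesia:write(#time_parameter{time = Time,
                                              parameter = P,
                                              value = V})
                 || {P, V} <- ParamValuePairs]
        end,
    mnesia:transaction(F).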
Disc space is not a problem, but searching the data (for correlating it
with the housekeeping parameters) is. Searching for an arbitrary
parameter over a certain time interval
dirty_para_request(Parameter, StartTime, EndTime) ->
    Sel = [{#time_parameter{parameter = Parameter,
                            time = '$1', value = '_'},
            [{'<', {const, StartTime}, '$1'}, {'=<', '$1', {const, EndTime}}],
            ['$_']}],
    mnesia:dirty_select(time_parameter, Sel).
will now take about 4 minutes, as Mnesia has to scan through the full DB.
Extrapolating this to even a year's operation (800 MBytes a week is
roughly 40 GBytes a year) means that one has to do something else.
So far the only thing I can think of is to split the DB tables into
daily tables, then first select the relevant tables by name (based on
the time) and search only those tables.
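A rough sketch of what I mean (assuming the time key is a
calendar:datetime() and a purely illustrative time_parameter_YYYYMMDD
naming scheme):

%% Uses the #time_parameter record defined above.
dirty_para_request_by_day(Parameter, StartTime, EndTime) ->
    Sel = [{#time_parameter{parameter = Parameter, time = '$1', value = '_'},
            [{'<', {const, StartTime}, '$1'}, {'=<', '$1', {const, EndTime}}],
            ['$_']}],
    %% Run the same match spec over each daily table in the interval
    %% and concatenate the results.
    lists:append([mnesia:dirty_select(Tab, Sel)
                  || Tab <- tables_in_interval(StartTime, EndTime)]).

%% One table name per calendar day covering [StartTime, EndTime],
%% e.g. time_parameter_20040713.
tables_in_interval({StartDate, _}, {EndDate, _}) ->
    First = calendar:date_to_gregorian_days(StartDate),
    Last  = calendar:date_to_gregorian_days(EndDate),
    [table_name(calendar:gregorian_days_to_date(D))
     || D <- lists:seq(First, Last)].

table_name({Y, M, D}) ->
    list_to_atom(lists:flatten(
        io_lib:format("time_parameter_~4..0w~2..0w~2..0w", [Y, M, D]))).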
But are there any other solutions?
regards
Jouni
--
Jouni Rynö mailto://Jouni.Ryno@fmi.fi/
http://www.geo.fmi.fi/~ryno/
Finnish Meteorological Institute http://www.fmi.fi/
Space Research http://www.geo.fmi.fi/
P.O.BOX 503 Tel (+358)-9-19294656
FIN-00101 Helsinki FAX (+358)-9-19294603
Finland priv-GSM (+358)-50-5302903
"It's just zeros and ones, it cannot be hard"