mnesia, large datasets and key range lookups
Jouni Rynö
Jouni.Ryno@REDACTED
Tue Jul 13 11:40:49 CEST 2004
On Tue, 2004-07-13 at 11:13 +0200, Nigel.Head@REDACTED wrote:
> > make_key(#time_parameter{time=T, parameter=P}) ->
> >     {T, P}.
>
> While I'm not exactly sure what the exact application is,
Telemetry data from one of the Rosetta instruments :)
> the original post said
> there might be up to about 30 parameters at any given time. Why is it strictly
> necessary to have the parameter itself as part of the key? I would go for making
> the record contain the time (as a key) and a list of parameter values for that
> time. The list can be variable length, of course.
>
Different telemetry packets can contain the same parameters in a different
order. Actually, the order is usually the same, but each packet carries only
a subset of the full parameter set.
> This would reduce the number of records by some factor of 10 or so; locating the
> specific parameter you're after would then be some sort of application level
> list search -- didn't ought to be too expensive for a list of max 30 long.
> Chances are you'll be needing other parameters from the same time real soon in
> your processing anyway.
>
For the list searching I would have to identify the order of the
parameters at each time point, which is much like having the raw telemetry
data itself. That is exactly what I am trying to get rid of by using mnesia :)
Hmm, I could use tuples in the list, like this: [{"NCSA0005", Value},
{"NCSA0010", Value2}, ...]. That could be scanned pretty easily with
lists:keysearch. On the other hand, mnesia does that for me, see below.
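
A minimal sketch of that tuple-list idea, assuming one list of
{Name, Value} tuples per time stamp. The module name param_list and the
function lookup/2 are made up for illustration; only lists:keysearch/3
is the real library call.

```erlang
-module(param_list).
-export([lookup/2]).

%% Samples is the (hypothetical) per-time-stamp list of {Name, Value}
%% tuples, e.g. [{"NCSA0005", 1.5}, {"NCSA0010", 2.5}].
%% Returns {ok, Value} when Name is present, not_found otherwise.
lookup(Name, Samples) ->
    %% lists:keysearch/3 compares Name against element 1 of each tuple,
    %% so the order of the parameters in the list does not matter.
    case lists:keysearch(Name, 1, Samples) of
        {value, {Name, Value}} -> {ok, Value};
        false -> not_found
    end.
```

The scan is linear in the list length, which should be harmless for
lists of about 30 entries.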
The interesting part of this email :) is that the time taken does not
depend much on the number of parameters in each time stamp. It depends
(only) on the size of the table itself, as mnesia has to (?, I think)
scan through the whole table to find the keys. The table is not an
ordered set, but a bag.
So this search:
Selection is [{{time_parameter,'$1','$2','_'},
               [{'<',{const,1607.54},'$1'},
                {'=<','$1',{const,1607.58}},
                {'orelse',{'=:=',{const,"NCSA0005"},'$2'},
                          {'=:=',{const,"NCSA0010"},'$2'},
                          {'=:=',{const,"NCSA0014"},'$2'},
                          {'=:=',{const,"NCSA0016"},'$2'},
                          {'=:=',{const,"NCSA0018"},'$2'}}],
               ['$_']}]
takes a time that hardly depends on the number of those parameters ($2)
(compared to the total time of the search).
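
For anyone who wants to try a selection of this shape without setting up
a mnesia schema, here is a small sketch that runs a shortened version of
it against a plain ets bag table. Using ets as a stand-in is an
assumption made for the example, as are the module name tp_select, the
third record field name value, and the sample rows. With mnesia the same
specification would normally be passed to mnesia:select/2 inside a
transaction.

```erlang
-module(tp_select).
-export([demo/0]).

%% time and parameter come from the make_key/1 quote; the name of the
%% third field (value) is an assumption.
-record(time_parameter, {time, parameter, value}).

demo() ->
    %% A bag keyed on the time field, standing in for the mnesia table.
    Tab = ets:new(tp, [bag, {keypos, #time_parameter.time}]),
    ets:insert(Tab,
               [#time_parameter{time = 1607.50, parameter = "NCSA0005", value = 1},
                #time_parameter{time = 1607.56, parameter = "NCSA0005", value = 2},
                #time_parameter{time = 1607.56, parameter = "NCSA0099", value = 3},
                #time_parameter{time = 1607.58, parameter = "NCSA0010", value = 4}]),
    %% Same structure as the selection above, with two names only.
    Spec = [{#time_parameter{time = '$1', parameter = '$2', _ = '_'},
             [{'<', {const, 1607.54}, '$1'},
              {'=<', '$1', {const, 1607.58}},
              {'orelse', {'=:=', {const, "NCSA0005"}, '$2'},
                         {'=:=', {const, "NCSA0010"}, '$2'}}],
             ['$_']}],
    Rows = ets:select(Tab, Spec),
    ets:delete(Tab),
    %% Return only the values, sorted, since a bag has no defined order.
    lists:sort([V || #time_parameter{value = V} <- Rows]).
```

Because the time field is only constrained in the guards and not bound
in the match head, every object in the table has to be tested, whatever
the number of alternatives in the 'orelse'.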
And yes, like you thought, I do need the other parameters real soon :)
So I think (at the moment), that the best route is to first narrow the
search range by some kind of fragmented tables. I'll let you know ...
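
One way the "narrow the range first" idea could look, as a rough sketch
only: this is not mnesia's own table fragmentation, just a hand-rolled
bucket key on a plain ets bag. The module name tp_bucket, the bucket
width of 0.1 time units, and the sample rows are all assumptions for the
example.

```erlang
-module(tp_bucket).
-export([demo/0]).

%% Hypothetical coarse key: one bucket per 0.1 time units.
bucket(Time) -> trunc(Time * 10).

%% Store each sample under its bucket, keeping the exact time in the row.
insert(Tab, Time, Param, Value) ->
    ets:insert(Tab, {bucket(Time), Time, Param, Value}).

%% Rows with From < Time =< To whose parameter is one of Params.
%% Only the buckets overlapping the range are read (by key lookup),
%% then the exact bounds and the parameter names are checked.
range(Tab, From, To, Params) ->
    [{T, P, V} || B <- lists:seq(bucket(From), bucket(To)),
                  {_, T, P, V} <- ets:lookup(Tab, B),
                  T > From, T =< To,
                  lists:member(P, Params)].

demo() ->
    Tab = ets:new(tp_buckets, [bag]),
    insert(Tab, 1607.50, "NCSA0005", 1),
    insert(Tab, 1607.56, "NCSA0005", 2),
    insert(Tab, 1607.56, "NCSA0099", 3),
    insert(Tab, 1612.00, "NCSA0010", 4),
    Rows = range(Tab, 1607.54, 1607.58, ["NCSA0005", "NCSA0010"]),
    ets:delete(Tab),
    Rows.
```

The cost then follows the number of rows in the touched buckets rather
than the size of the whole table, at the price of choosing a bucket
width that suits the typical query range.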
regards
Jouni
--
Jouni Rynö mailto://Jouni.Ryno@fmi.fi/
http://www.geo.fmi.fi/~ryno/
Finnish Meteorological Institute http://www.fmi.fi/
Space Research http://www.geo.fmi.fi/
P.O.BOX 503 Tel (+358)-9-19294656
FIN-00101 Helsinki FAX (+358)-9-19294603
Finland priv-GSM (+358)-50-5302903
"It's just zeros and ones, it cannot be hard"