Mnesia not suitable for time series storage?

Jouni Rynö Jouni.Ryno@REDACTED
Wed Jul 13 14:52:37 CEST 2005


Dear Joel

I had an similar issue in my housekeeping database, also a bag,
disc_only db. The details are now escaping my mind, but the increased
time comes, when the mnesia has to search the whole DB before insertion.
Don't know, if mnesia can optimise the insert, when the DB is in memory
(and is set), but in disc only case it cannot, as items are appended to
the file as they come, not necessarily in sorted order.

The trick was to use fragmented tables, with a special key calculation.
In my case, the fragment was simply the day.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% the key is in 1/65536 second units.
%

key_to_frag_number(State, Key) when record(State, hash_state) ->
    %% io:fwrite("From ~p to ~w~n", [Key, trunc(Key)]),
    IntKey = (Key - ?UNIX_20000101) div 5662310400. % 24*60*60*65536

This made insertion and searches to work in reasonable time scales
(20-60 seconds), instead of minutes even on small data set.

regards
	Jouni

On Wed, 2005-07-13 at 13:10 +0200, Joel Reymont wrote:

> An issue that presents a problem is that exchanges might send many  
> quotes per second but they do not give time in milliseconds, so I  
> need to keep track of the order in which the quotes came in.
> 
> My unoptimized tick record looks like this:
> 
> -record(tick, {
>        symbol,
>        datetime,
>        timestamp,
>        price,
>        size
>       }).
> 
> Symbol is a string (list) of 4 characters, datetime is a bignum  
> formed from a date and time 20040901,09:39:38 by stripping the comma  
> and the colons. Timestamp is now(), price and size are float and  
> integer respectively.
> 
> The timestamp field is what keeps track of the order in which price  
> quotes come in. I do not know yet how to tell Mnesia to return the  
> records sorted by timestamp, at least not how to do it efficiently.
> 
> Table is created like this:
> 
> mnesia:create_table(tick,
>                   [
>                    {disc_copies, Nodes},
>                    {index, [datetime]},
>                    {type, bag},
>                    {attributes, record_info(fields, tick)}])
> 
> The table is a bag because ticks are unique only on the combination  
> of symbol, datetime AND timestamp. I tried making the table a set but  
> only got one record in it.
> 
> Now, here's the kicker... I parse through a file of historical quotes  
> and insert ticks into Mnesia one by one and time the insert. Inserts  
> start at around 1ms and quickly grow to 5, 7, 8, etc. I'm at 24ms per  
> insert right now and I killed my previous attempts to load about  
> 120,000 after an hour or two.
> 
> I would like to be able to store at least 500 ticks per second in  
> Mnesia and have good lookup speed as well. Is this possible?
> 
>      Thanks, Joel
> 
> --
> http://wagerlabs.com/uptick
> 
> 
> 
-- 

  Jouni Rynö                            mailto://Jouni.Ryno@fmi.fi/
                                        http://www.geo.fmi.fi/~ryno/
  Finnish Meteorological Institute      http://www.fmi.fi/
  Space Research                        http://www.geo.fmi.fi/
  P.O.BOX 503                           Tel      (+358)-9-19294656
  FIN-00101 Helsinki                    FAX      (+358)-9-19294603
  Finland                               priv-GSM (+358)-50-5302903
  
  "It's just zeros and ones, it cannot be hard"





More information about the erlang-questions mailing list