Mnesia not suitable for time series storage?

Serge Aleynikov serge@REDACTED
Wed Jul 13 15:10:39 CEST 2005


I believe the main problem with the speed is that you organized the tick 
table to be a bag and also have another BTree index on the date.  The 
search time related to inserting into this table will grow logarithmically.

Here is an idea to try.  Alter table layout in order to improve speed by 
the following:

1. Use a separate set table for each symbol (this will partition data 
better).

2. Use a surrogate key, like date, or {date, hour}, and store all ticks 
related to that key in a form of a list:
-record(tick_data, {
       key  = {date, hour},
       data = [ {minute,second,timestamp,price,size} ]
}).

This is not as convenient as your original implementation, as it will 
require some additional logic in retrieving the data, but I believe 
it'll improve the speed of inserts.  Since for each key data is inserted 
in a sorted order, you can use a binary search on the data list for a 
given key.

Serge

Joel Reymont wrote:
> Folks,
> 
> I'm trying to create a tick database on top of Mnesia. Ticks are  price 
> quotes (trades in a stock) sent by the exchange a few times a  second 
> for active securities.
> 
> My two requirements for tick storage are very high-speed writes and  
> high-speed sequential retrieval.
> 
> I basically issue one type of query and that is to give me all the  
> ticks in a particular security that happened between this and that time.
> 
> An issue that presents a problem is that exchanges might send many  
> quotes per second but they do not give time in milliseconds, so I  need 
> to keep track of the order in which the quotes came in.
> 
> My unoptimized tick record looks like this:
> 
> -record(tick, {
>       symbol,
>       datetime,
>       timestamp,
>       price,
>       size
>      }).
> 
> Symbol is a string (list) of 4 characters, datetime is a bignum  formed 
> from a date and time 20040901,09:39:38 by stripping the comma  and the 
> colons. Timestamp is now(), price and size are float and  integer 
> respectively.
> 
> The timestamp field is what keeps track of the order in which price  
> quotes come in. I do not know yet how to tell Mnesia to return the  
> records sorted by timestamp, at least not how to do it efficiently.
> 
> Table is created like this:
> 
> mnesia:create_table(tick,
>                  [
>                   {disc_copies, Nodes},
>                   {index, [datetime]},
>                   {type, bag},
>                   {attributes, record_info(fields, tick)}])
> 
> The table is a bag because ticks are unique only on the combination  of 
> symbol, datetime AND timestamp. I tried making the table a set but  only 
> got one record in it.
> 
> Now, here's the kicker... I parse through a file of historical quotes  
> and insert ticks into Mnesia one by one and time the insert. Inserts  
> start at around 1ms and quickly grow to 5, 7, 8, etc. I'm at 24ms per  
> insert right now and I killed my previous attempts to load about  
> 120,000 after an hour or two.
> 
> I would like to be able to store at least 500 ticks per second in  
> Mnesia and have good lookup speed as well. Is this possible?
> 
>     Thanks, Joel
> 
> -- 
> http://wagerlabs.com/uptick




More information about the erlang-questions mailing list