Mnesia not suitable for time series storage?
Serge Aleynikov
serge@REDACTED
Wed Jul 13 15:10:39 CEST 2005
I believe the main problem with the speed is that you organized the tick
table to be a bag and also have another BTree index on the date. The
search time related to inserting into this table will grow logarithmically.
Here is an idea to try. Alter table layout in order to improve speed by
the following:
1. Use a separate set table for each symbol (this will partition data
better).
2. Use a surrogate key, like date, or {date, hour}, and store all ticks
related to that key in a form of a list:
-record(tick_data, {
key = {date, hour},
data = [ {minute,second,timestamp,price,size} ]
}).
This is not as convenient as your original implementation, as it will
require some additional logic in retrieving the data, but I believe
it'll improve the speed of inserts. Since for each key data is inserted
in a sorted order, you can use a binary search on the data list for a
given key.
Serge
Joel Reymont wrote:
> Folks,
>
> I'm trying to create a tick database on top of Mnesia. Ticks are price
> quotes (trades in a stock) sent by the exchange a few times a second
> for active securities.
>
> My two requirements for tick storage are very high-speed writes and
> high-speed sequential retrieval.
>
> I basically issue one type of query and that is to give me all the
> ticks in a particular security that happened between this and that time.
>
> An issue that presents a problem is that exchanges might send many
> quotes per second but they do not give time in milliseconds, so I need
> to keep track of the order in which the quotes came in.
>
> My unoptimized tick record looks like this:
>
> -record(tick, {
> symbol,
> datetime,
> timestamp,
> price,
> size
> }).
>
> Symbol is a string (list) of 4 characters, datetime is a bignum formed
> from a date and time 20040901,09:39:38 by stripping the comma and the
> colons. Timestamp is now(), price and size are float and integer
> respectively.
>
> The timestamp field is what keeps track of the order in which price
> quotes come in. I do not know yet how to tell Mnesia to return the
> records sorted by timestamp, at least not how to do it efficiently.
>
> Table is created like this:
>
> mnesia:create_table(tick,
> [
> {disc_copies, Nodes},
> {index, [datetime]},
> {type, bag},
> {attributes, record_info(fields, tick)}])
>
> The table is a bag because ticks are unique only on the combination of
> symbol, datetime AND timestamp. I tried making the table a set but only
> got one record in it.
>
> Now, here's the kicker... I parse through a file of historical quotes
> and insert ticks into Mnesia one by one and time the insert. Inserts
> start at around 1ms and quickly grow to 5, 7, 8, etc. I'm at 24ms per
> insert right now and I killed my previous attempts to load about
> 120,000 after an hour or two.
>
> I would like to be able to store at least 500 ticks per second in
> Mnesia and have good lookup speed as well. Is this possible?
>
> Thanks, Joel
>
> --
> http://wagerlabs.com/uptick
More information about the erlang-questions
mailing list