Joe Armstrong (AL/EAB) <>
Mon Jul 11 09:22:03 CEST 2005

Can I add a few comments?

Two problems have been discussed:

1) Storing *lots* of data
2) Processing *lots* of data

1) The implicit assumption seems to be that the data set is small enough
that, with some kind of trickery, it will fit onto a small number of machines.

Why not ask the question "How can we store all tick data from all markets forever?"

Answering this question (or even making progress) seems much more interesting
than fitting a "small" data set onto a system.

However you look at it, compression will in practice give you only a constant-factor
gain in storage. Suppose compression lets you store 10 times as much data as you
could without it, and doing so allows you to store one year of data from one market.

How do you store data from 500 markets for 1000 years?
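To make the constant-factor point concrete, here is a back-of-the-envelope sketch. The unit "one machine's disk per raw market-year" is a made-up assumption, purely for illustration:

```erlang
%% back_of_envelope.erl - illustrative only; the raw size per
%% market-year (one machine's disk) is an invented assumption.
-module(back_of_envelope).
-export([required_machines/0]).

required_machines() ->
    RawPerMarketYear = 1,    %% units of "one machine's disk" (assumed)
    Compression = 10,        %% 10x compression: a constant factor
    Markets = 500,
    Years = 1000,
    Total = Markets * Years * RawPerMarketYear,
    %% even after compression you still need Total/Compression machines
    Total div Compression.
```

So a 10x compression scheme turns a 500,000-machine problem into a 50,000-machine problem - it changes the constant, not the shape of the problem.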

IMHO, if you make a platform that "stores all data forever" then you have something worth
selling :-) - even partial success is interesting.

Now, how do you store all data forever? I don't know - but we can guess that
distributed hash tables (Chord, Pastry, DKS, ...) will be involved.

Now, essentially all of these (Chord, Pastry, ...) are the same - i.e. all involve
a "distance" metric, and to find a key you try to move "closer" to the
key at each step. All of them give log N lookup times, and log N will eventually
beat any compression scheme :-)
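As a sketch of the "distance metric" idea - a toy, not real Chord (no finger tables, so this naive version does a linear scan rather than O(log N) hops):

```erlang
%% ring.erl - toy illustration of Chord-style key placement: node ids
%% and keys live on a ring 0..2^M-1, and a key belongs to the node at
%% minimal clockwise distance from it (its successor on the ring).
-module(ring).
-export([owner/2]).

-define(M, 8).    %% identifier space 0..2^M-1 (tiny, for illustration)

%% clockwise distance from A to B on the ring
dist(A, B) -> (B - A) band ((1 bsl ?M) - 1).

%% the node owning Key is the live node closest to it clockwise
owner(Key, NodeIds) ->
    {_, Node} = lists:min([{dist(Key, N), N} || N <- NodeIds]),
    Node.
```

Real Chord keeps a "finger table" of shortcuts at distances 1, 2, 4, 8, ... so that each hop at least halves the remaining distance - which is where the log N comes from.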

So, as regards storage: make a storage scheme that can store "everything" (or at least
a lot of data).

2) Speed: "go parallel, young man"

Yes, of course C beats Erlang - but in Erlang you can *easily* write parallel programs,
which you can't easily do in C.

If C is 10 times faster than Erlang then use 11 machines - or a hundred - or a thousand.

Making distributed parallel algorithms is how you get speed - not chasing speed on
one machine.
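For example, a minimal parallel map - the sort of thing that takes a few lines in Erlang. This is a sketch: on a real cluster you would spawn on remote nodes and handle failures:

```erlang
%% pmap.erl - spawn one process per list element, apply F in parallel,
%% then gather the results back in the original order.
-module(pmap).
-export([pmap/2]).

pmap(F, L) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, F(X)} end),
                Ref
            end || X <- L],
    %% receiving on each Ref in turn preserves the input order,
    %% whatever order the workers finish in
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

Replacing `spawn/1` with `spawn(Node, Fun)` is all it takes to run the workers on other machines - which is the point: the parallel structure costs almost nothing to express.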

----

So what is the platform/architecture for 1 + 2?

My best guess is:

A peer-to-peer system - running on a cluster (why not an open system? Because
of "insoluble" security problems) - with:

1) Redundant distributed hash tables
2) Parallelised computations

This is "slightly beyond the state of the art" - various simulations of (say) Chord
rings show that they can break after a very long time (parameters like the rate
at which a broken data set can be repaired, and the rates of machine arrivals and
failures, are important).

/Joe

> -----Original Message-----
> From:
> [mailto:]On Behalf Of HP Wei
> Sent: den 7 juli 2005 23:18
> To: Joel Reymont
> Cc: Erlang Users' List
> Subject: Re: New trading systems platform
>
>
>
> hi Joel,
>
> > >   I just want to point out that to store tick data
> > >   of many years needs a LOT of disk space plus
> > >   the capability of compression (when writing)
> >
> > Do you have a model for tick data storage? Fields, etc.
>
>    All I am allowed to say is that we have a proprietary model.
>
> > Why would I bother to compress/decompress? Wouldn't keeping 1-2-3Gb
> > in-memory be enough? Should I really care about disk storage?
>
>    hmmm,  I think this depends on the operation
>
>    Usually, for real-time trading, you only need data going back
>    a few months to get your initial parameters. If you don't
>    access them very often during the trading, then IO is not
>    an issue.
>    And putting all today's data in memory is certainly ok
>
>    However, when you do backtesting (or model studying), I believe
>    you will need several years of
>    tick data; then you are likely going to hit the issues of
>    disk space and the efficiency of database IO.
>
> --HP
>
>
