[erlang-questions] MySQL Cluster

Jay Nelson jay@REDACTED
Sat Oct 21 19:21:40 CEST 2006


Pat e wrote:

 > We are preparing a rather big web project that
 > will end up with tens of TB of data.

Scott wrote:

 > These sizing questions are popping up on a regular basis now, getting
 > to be FAQ frequency.

I'll second that.

 > No I haven't tried MySQL Cluster or TimesTen, but I'm hoping for
 > some free time to try it out.

I work for a Wall Street firm.  We used to have our trading client 
running on Solaris, C++ and X-Windows using Sybase.  Competitors started 
rolling out PC-based solutions and the traders said they didn't want 2 
boxes on their desk.  The whole industry has shifted to PC-based trading 
applications.

We went with the whole Visual Studio C++, TimesTen database, Tibco 
messaging, blah, blah.

When the individual trader databases started approaching 1GB we could no 
longer use TimesTen because they map to a virtual file in RAM.  So we 
started looking at compressing the data.  That bought a few months.  
Then we started looking at Sybase RAM versions which turned out to be 
much faster.

Meanwhile the backend backoffice stuff ran on Solaris with Sybase.  The 
log files started getting too big to deal with.  Then the databases 
started getting too big to deal with.  Now the pressure is on to send an 
order to the floor in 5 ms, which leaves not enough time to touch a DB, 
or even to touch the disk.

We are in the process of eliminating databases.  Why did we use them in 
the first place?   You just do.  If you are an engineer and I say I need 
to store a lot of data and your answer is not "A database!", you fail 
CS101.  No one ever asks, "How do you want to access it?  Why do you 
think you need a database?"

===============================

"My website needs 20TB of data in the database..."

Why?  Are they pictures or movies?  => Do not store them in a database.
Do you have 20 billion users, each with 1000 bytes of data?  =>  
Double-check your numbers.

Do you have 1M users with backup data of 20MB each?  => Store the 1M 
users in a database, store the rest some other way.
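
As a rough sketch of the arithmetic behind these two cases (decimal units 
assumed; the function names are only illustrative and not from any library):

```python
# Back-of-envelope sizing for the two 20TB scenarios above.
# Decimal units are an assumption: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
TB = 10**12
MB = 10**6


def total_bytes(users, bytes_per_user):
    """Total storage needed for a given user count and per-user size."""
    return users * bytes_per_user


def users_needed(total, bytes_per_user):
    """How many users it takes to reach a total at a given per-user size."""
    return total // bytes_per_user


target = 20 * TB

# Case 1: 1000 bytes per user. How many users does 20TB imply?
print(users_needed(target, 1000))

# Case 2: 1M users with 20MB of backup data each, expressed in TB.
print(total_bytes(1_000_000, 20 * MB) / TB)
```

The first figure is several times the world's population, which is why the 
numbers deserve a second look.  The second case does reach 20TB, but nearly 
all of it is bulk backup data rather than user records.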

===============================

Pat e writes:

 > Will it be necessary to completely rewrite mnesia for multi-terabyte
 > disk-only tables?

That depends.  Is it possible for you to look at your problem 
differently so that it doesn't require multi-terabyte disk-only tables?

===============================

What is the real problem you want to solve?
Why do you think you need a database?
Why is it so big?
How often will it be queried?
How often will it be modified?
Is it really an archive or a transaction log?

Now, if you are monitoring the sales of canned beans across all shopper 
club members in real time to tune your factory production line -- then 
you've got a real problem.

Generally, when you push the limits of technology you look at tweaking 
it.  When you move several orders of magnitude beyond the current state 
of the art, you need to look at the basics of the problem, the 
underpinnings of the technology and think about coming up with a new 
alternative designed for the new problem space.

jay




