[erlang-questions] link: Michael Stonebraker, "The End of an Architectural Era (It’s Time for a Complete Rewrite)"

Thu Nov 1 20:05:02 CET 2007

Partly because of the Berkeley DB + Erlang thread, I've read this paper:
"The End of an Architectural Era (It’s Time for a Complete Rewrite)"  
by Michael Stonebraker et al.
Stonebraker was the architect for Ingres, PostgreSQL and (I believe)  
Vertica, so he has credibility.
http://web.mit.edu/dna/www/vldb07hstore.pdf

Here's my overview. The quotes are impressive (82x faster), but I've  
written these comments to give a summary picture.

Summary:
- There are several problem areas where RDBMS's are used, but are not  
a good solution, but everyone assumed OLTP is their sweet spot.
   This paper sets out to compare an experimental database (H-Store)  
which indicates OLTP is not the RDBMS sweet spot either.
   Their premise is: RDBMS are based on 25+ old architectures whose  
requirements and assumptions no longer apply.
- They take a very different approach, exploiting modern hardware and  
software, and willingly examining non-traditional techniques
    (it reads a bit like a 'Hennesy & Patterson' approach - i.e.  
smart solution using proper analysis of key factors, unconstrained by  
'convention')
- The experiment is based on the TPC-C (order processing) OLTP benchmark

Key results:
  - H-Store is 82x faster than a commercial DB at TPC-C on a dual  
core desktop
  - Hence H-Store is within 50% of the world-record (set by a 128  
core SMP server) using only this 2-core desktop

- H-Store is their experimental database, intended to prove the disk- 
based RDBMS's can be improved on by 1-2 orders of magnitude.
   The H-Store database system has significant pieces missing or  
undemonstrated, but the paper gives views on the research directions.

- An H-Store is partitioned to be a 'share-nothing' system, with  
memory partitioned on a per-core basis on each machine, each  
partition of core+memory is a 'site'. A system is intended to be a  
grid of 'sites' across many machines. Data is distributed for  
performance, HA & scalability

- H-Stores major themes:
     - horizontal scalability through share-nothing application of  
cores, memory and machines,
     - significant software simplification: avoiding concurrency  
control, multi-threading, locking, and undo logs,
       and replaces with in-memory, *single threaded*, lock-free  
transactions which can be efficiently executed in parallel across  
many 'sites',
     - huge performance improvements by fully using memory and  
minimising disk use.
     - High Availability by replication built in from the start, and  
not 'bolted on' later
     - no ad-hoc queries

- There are no ad-hoc query transactions, all queries must be 'pre- 
submitted' and compiled.
    The paper claims that most commercial OLTP uses of RDBMS avoid ad  
hoc queries anyway, so this isn't a serious problem. (I largely agree).
- Properties of queries are identified in 'compilation' which allow  
the system to execute the transaction most efficiently.
   These properties also help identify how the database may be  
partitioned across the share-nothing sites.
- They don't see most database need to be fully relational.

- They implement a major subset of the TPC-C benchmark on H-Store,  
but I couldn't work out how much difference the omissions would have.
- The experiment uses a single dual core machine, and compares with a  
commercial database implementation using stored procedures. (There is  
no multi-machine experiment, and this may be because they haven't got  
the multi-machine stuff working - I don't know.)

--------------
QUOTES:
"Both DBMSs were run on a dual-core 2.8GHz CPU computer system, with  
4 Gbytes of main memory and four 250 GB SATA disk drives. ... "

"On this configuration, H-Store ran 70,416 TPC-C transactions per  
second.
  In contrast, we could only coax 850 transactions per second from  
the commercial system, in spite of several days of tuning by a  
professional DBA, who specializes in this vendor’s product.  Hence, H- 
Store ran a factor of 82 faster (almost two orders of magnitude). "

"Finally, though we did not implement all of the TPC-C specification  
(we did not, for example, model wait times), it is also instructive  
to compare our partial TPC-C implementation with TPC-C performance  
records on the TPC website2.
  The highest performing TPC-C implementation executes about 4  
million new-order transactions per minute, or a total of about  
133,000 total transactions per second.  This is on a 128 core shared  
memory machine, so this implementation is getting about 1000  
transactions per core.  Contrast this with 425 transactions per core  
in our benchmark on a commercial system on a (rather pokey) desktop  
machine, or 35,000 transactions per core in H- Store! Also, note that  
H-Store is within a factor of two of the best TPC-C results on a low- 
end machine. "

"In summary, the conclusion to be reached is that nearly two orders  
of magnitude in performance improvement are available to a system  
designed along the lines of H-Store."

--------------

There are several weaknesses of the experiment, but H-Store *is* very  
quick so I wont whinge too much.
I think the TPC-C "new-order" transaction *should* work very well on  
their architecture (and I'd hope it's the major transaction in the  
mix :-), but I'd like to understand what variation (i.e. query) might  
push H-Store out of its sweet spot.

One of their ideas is to use Ruby on Rails as the programming  
language embedded in it!
I wondered if anyone at Ericsson might like to encourage the folks at  
MIT to combine H-Store (which is C++) with Erlang, then .... ;-)

Anyway, it is a fun paper, and an interesting follow up to their end  
of "One Size Fits All" (data warehouse stuff):
http://www.databasecolumn.com/2007/09/one-size-fits-all.html
http://www.cs.brown.edu/~ugur/fits_all.pdf - “One Size Fits All”: An  
Idea Whose Time Has Come and Gone
http://nms.csail.mit.edu/~stavros/pubs/osfa.pdf - One Size Fits All?  
– Part 2: Benchmarking Results

I hope this is interesting/useful.
GB

PS - See y'all next week at EUP