[erlang-questions] Amazon S3: Now with locking, transactions and caching

Joel Reymont
Fri Jul 20 02:42:47 CEST 2007


Folks,

This is a status update on what I've been working on for the past few
weeks.

You are probably well aware that Amazon S3 provides unlimited  
scalability but does not have locking and transactions. There's also  
a delay between the time when data is written to S3 and when it  
becomes available for reading.

I tried several approaches, but the best one turned out to be hacking
Mnesia internals to add s3_copies as a table type. I started from
scratch, as opposed to building on Ulf Wiger's RDBMS, but I doubt I
could have done it without reading the RDBMS code and asking lots of
questions, all of which Uffe was kind enough to answer.

Hacking Mnesia turned out to be a veritable pain in the rear as I had  
to touch most of the modules, including today's extensive modding  
session with mnesia_loader.erl. I will also need to apply the changes  
to any upcoming releases of Mnesia.

I would say it was worth it, though, as I can now

- lock S3 buckets or "records" using {Bucket, Key}

- update several S3 records in a single transaction

- set up additional s3_copies replicas using mnesia:add_table_copy/3

- ensure that data is only written to S3 once

- have a large cluster of Yaws nodes use a small cluster of "master"  
Mnesia nodes with s3_copies replicas, thus keeping replication and  
transaction costs down.
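
As a rough illustration of the list above (not the patched Mnesia code
itself), the sketch below locks one {Bucket, Key} record and updates
two records in a single transaction using only stock Mnesia calls.
Everything named here is an assumption: the module name, the s3_object
record layout, and the bucket and key values are hypothetical, and
ram_copies stands in for s3_copies because that table type does not
exist in an unpatched Mnesia.

```erlang
-module(s3_tx_sketch).
-export([demo/0]).

%% Hypothetical record layout: the first field is the Mnesia key,
%% here a {Bucket, Key} tuple as described in the post.
-record(s3_object, {id, data}).

demo() ->
    ok = mnesia:start(),
    %% ram_copies is a stand-in; s3_copies needs the patched Mnesia.
    {atomic, ok} =
        mnesia:create_table(s3_object,
                            [{ram_copies, [node()]},
                             {attributes, record_info(fields, s3_object)}]),
    A = {<<"photos">>, <<"a.jpg">>},
    B = {<<"photos">>, <<"b.jpg">>},
    {atomic, ok} =
        mnesia:transaction(
          fun() ->
                  %% Reading with a write lock locks the {Bucket, Key} record.
                  _ = mnesia:read(s3_object, A, write),
                  %% Both writes commit together or not at all.
                  ok = mnesia:write(#s3_object{id = A, data = <<"v1">>}),
                  ok = mnesia:write(#s3_object{id = B, data = <<"v2">>})
          end),
    %% With the patched Mnesia, a further replica would presumably be
    %% added along these lines (hypothetical node name):
    %%   mnesia:add_table_copy(s3_object, 'master2@host', s3_copies)
    mnesia:dirty_read(s3_object, B).
```

Calling s3_tx_sketch:demo() on a fresh node returns the record stored
under B. A second call on the same node fails at create_table because
the table already exists.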

I also coupled the virtual S3 table with a fixed-size cache that is  
built on top of a regular Mnesia table. All writes go through the
cache, which ensures that hot data is available immediately. So long  
as the cache API is used, any cache misses are automatically  
redirected to S3.
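
The cache behaviour described above can be sketched roughly as follows.
This is not the actual cache API: the module and function names are
hypothetical, a plain map with a FIFO queue stands in for the Mnesia
table (the post does not say which eviction policy is used), and Fetch
is an assumed fun(Key) -> {ok, Value} | not_found that stands in for an
S3 GET. The sketch needs an OTP release with maps.

```erlang
-module(s3_cache_sketch).
-export([new/2, write/3, read/2]).

%% Hypothetical cache state: a bounded map plus a FIFO queue of keys.
-record(cache, {max, items, order, fetch}).

new(Max, Fetch) when is_integer(Max), Max > 0, is_function(Fetch, 1) ->
    #cache{max = Max, items = #{}, order = queue:new(), fetch = Fetch}.

%% Every write lands in the cache first, so a later read sees it at once.
%% Pushing the value on to S3 is left out of this sketch.
write(Key, Value, Cache) ->
    put_item(Key, Value, Cache).

%% A hit is served from the cache; a miss falls through to Fetch and
%% the fetched value is cached for next time.
read(Key, #cache{items = Items, fetch = Fetch} = Cache) ->
    case maps:find(Key, Items) of
        {ok, Cached} ->
            {{ok, Cached}, Cache};
        error ->
            case Fetch(Key) of
                {ok, Fetched} ->
                    {{ok, Fetched}, put_item(Key, Fetched, Cache)};
                not_found ->
                    {not_found, Cache}
            end
    end.

%% Overwrite an existing key in place, or add a new key and evict.
put_item(Key, Value, #cache{items = Items, order = Order} = Cache) ->
    case maps:is_key(Key, Items) of
        true ->
            Cache#cache{items = Items#{Key := Value}};
        false ->
            evict(Cache#cache{items = Items#{Key => Value},
                              order = queue:in(Key, Order)})
    end.

%% Drop the oldest key once the cache grows past its fixed size.
evict(#cache{max = Max, items = Items, order = Order} = Cache)
  when map_size(Items) > Max ->
    {{value, Oldest}, Rest} = queue:out(Order),
    Cache#cache{items = maps:remove(Oldest, Items), order = Rest};
evict(Cache) ->
    Cache.
```

Because the state is passed in and returned explicitly, a caller keeps
the updated cache from each call, for example
{Result, C2} = s3_cache_sketch:read(Key, C1).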

	Thanks, Joel

--
http://topdog.cc      - EasyLanguage to C# compiler
http://wagerlabs.com  - Blog