[erlang-questions] kv store as a service

Sat Apr 5 19:02:17 CEST 2014

Gokhan: Thanks for pointing out what I need to clarify.

What kind of data?

I'm basically building a wiki. Each "value" is an svg-document + the
history of the svg document.

Why do you need cache?

S3 bandwidth pricing comes out to $120 / TB.
DO bandwidth pricing is $5 / TB.
Thus, I'd prefer to have DO read from S3 once, cache the result on DO,
and then serve from DO. This cuts bandwidth cost by a factor of 24
(120 / 5) for every request served from the cache. (Sorry for not
explaining this earlier.)
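
A rough sketch of that read-through arrangement, in Python for brevity.
The class name, the callback standing in for an S3 GET, and the in-memory
dict standing in for the DO disk are all hypothetical; only the two
per-TB rates come from the figures above.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Serve values locally, fetching from a pricier origin only on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # hypothetical stand-in for an S3 GET
        self._local: Dict[str, bytes] = {}   # hypothetical stand-in for the DO disk
        self.origin_bytes = 0                # bytes pulled from the origin
        self.served_bytes = 0                # bytes sent to clients from the cache host

    def get(self, key: str) -> bytes:
        if key not in self._local:
            # Miss: pay the origin's bandwidth rate once, then keep a local copy.
            value = self._fetch(key)
            self.origin_bytes += len(value)
            self._local[key] = value
        value = self._local[key]
        self.served_bytes += len(value)
        return value


def bandwidth_cost(origin_tb: float, served_tb: float,
                   origin_rate: float = 120.0, cache_rate: float = 5.0) -> float:
    """Dollar cost, using the per-TB rates quoted in this post as defaults."""
    return origin_tb * origin_rate + served_tb * cache_rate
```

With this shape, a key requested many times costs the origin rate once
and the cheaper rate on every request after that.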

Why do you believe SSD will suffice for cache?

The site is for a class -- so the notes are "weekly" -- thus, it's
highly likely that the most accessed entries are the most recent
entries (i.e. user requests are not uniformly random over the
keys, but rather heavily weighted in favor of recent keys).
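
One way to exploit that skew is a least-recently-used eviction policy.
LRU is my assumption here, not something I've settled on, and the
capacity below is counted in entries rather than bytes to keep the
sketch short (Python again; all names are hypothetical).

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        # A hit marks the key as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        # Drop entries from the least recently used end until we fit.
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)
```

If requests really do concentrate on the latest week's notes, older
entries fall off the cold end and can still be re-read from S3 on demand.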

What is your retrieval pattern?

I don't have hard data yet -- I'm still building this.

(preempting a possible future question): Why do you group S3 and Riak
into the same thought?

Eventual consistency doesn't really matter to me here. Each key is the
SHA hash of its content, so an "update" is simply a new entry under a
new key. An existing key never changes its value, so I don't worry much
about invalidating the cache.
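
A minimal sketch of that content-addressed scheme (Python; SHA-256 is an
assumption, since I only said "sha hash" above, and the store class is a
hypothetical in-memory stand-in for S3 or Riak):

```python
import hashlib
from typing import Dict


def content_key(content: bytes) -> str:
    """Derive the key from the content itself (SHA-256 assumed)."""
    return hashlib.sha256(content).hexdigest()


class ContentStore:
    """Write-once store: a key always maps to the same bytes."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        key = content_key(content)
        # Storing identical content again is a no-op; nothing is overwritten.
        self._blobs.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]
```

Editing a document produces different bytes and therefore a different
key, so a cached copy of the old key stays valid indefinitely.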

I was thinking purely in terms of k/v store -- and the two most
scalable stores I know of are S3 and Riak.

Please let me know if my thinking appears sloppy anywhere else. (I
have a decent theoretical CS background so I can do the logic/proof --
but this is my first time building a distributed system -- so I may be
asking the wrong questions / not aware of what I'm jumping into).


On Sat, Apr 5, 2014 at 9:02 AM, Gokhan Boranalp wrote:
