[erlang-questions] kv store as a service

Sat Apr 5 19:02:17 CEST 2014

Gokhan: Thanks for pointing out what I need to clarify.

What kind of data?

I'm basically building a wiki. Each "value" is an SVG document plus the
history of that document.

Why do you need a cache?

S3 bandwidth pricing comes out to $120 / TB.
DO (DigitalOcean) bandwidth pricing is $5 / TB.
Thus, I'd prefer to have the DO droplets read from S3, cache the data on
DO, and then serve it from DO. This cuts bandwidth cost by a factor of
24 ($120 / $5). (Sorry for not explaining this earlier.)
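
To make the arithmetic concrete, here's a rough Python sketch (Python only
because it's quick to run). The flat per-TB rates are the two numbers above;
real pricing is tiered, and the cache hit rate is a hypothetical parameter.

```python
S3_USD_PER_TB = 120.0  # S3 bandwidth price quoted above (assumed flat rate)
DO_USD_PER_TB = 5.0    # DO bandwidth price quoted above (assumed flat rate)


def cost_s3_only(tb_served: float) -> float:
    """Every byte is served straight from S3."""
    return tb_served * S3_USD_PER_TB


def cost_cached_on_do(tb_served: float, hit_rate: float) -> float:
    """Every byte leaves via DO; cache misses are also pulled out of S3."""
    misses_tb = tb_served * (1.0 - hit_rate)
    return tb_served * DO_USD_PER_TB + misses_tb * S3_USD_PER_TB


if __name__ == "__main__":
    tb = 2.0  # hypothetical monthly traffic in TB
    for hit_rate in (1.0, 0.95, 0.8):
        ratio = cost_s3_only(tb) / cost_cached_on_do(tb, hit_rate)
        print(f"hit rate {hit_rate:.0%}: savings factor {ratio:.1f}x")
```

At a 100% hit rate the ratio is exactly 120 / 5 = 24. Every miss still pays
the S3 rate on top of the DO rate, so the real saving is lower and depends on
how well the cache works.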

Why do you believe SSD will suffice for cache?

The site is for a class, so the notes are "weekly". Thus, it's highly
likely that the most accessed entries are the most recent ones (i.e.
user requests are not uniformly random over the keys, but rather
heavily weighted in favor of recent keys).
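
Since requests skew toward recent keys, a least-recently-used (LRU) policy
seems like a natural fit for the 20GB SSD. A minimal sketch of byte-bounded
LRU eviction, assuming an in-memory dict as a stand-in; the class name and
the tiny byte budget are hypothetical, and a real cache would keep the
values as files on the droplet's SSD.

```python
from collections import OrderedDict
from typing import Optional


class ByteBoundedLRU:
    """LRU cache that evicts the least-recently-used entries once the
    total size of the cached values exceeds a byte budget.

    Hypothetical illustration: values live in memory here, whereas the
    real cache would keep them on the droplet's SSD.
    """

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        """Return the cached value and mark it most recently used, or None."""
        value = self._entries.get(key)
        if value is not None:
            self._entries.move_to_end(key)
        return value

    def put(self, key: str, value: bytes) -> None:
        """Insert a value, then evict LRU entries until the budget is met."""
        if len(value) > self.max_bytes:
            return  # larger than the whole cache: serve it, don't cache it
        if key in self._entries:
            self.used_bytes -= len(self._entries.pop(key))
        self._entries[key] = value
        self.used_bytes += len(value)
        while self.used_bytes > self.max_bytes:
            _, evicted = self._entries.popitem(last=False)  # oldest first
            self.used_bytes -= len(evicted)


if __name__ == "__main__":
    cache = ByteBoundedLRU(max_bytes=10)  # stand-in for the 20GB budget
    cache.put("week1", b"aaaa")
    cache.put("week2", b"bbbb")
    cache.get("week1")           # touch week1, so week2 becomes the LRU entry
    cache.put("week3", b"cccc")  # 12 bytes > 10, so week2 is evicted
    print(cache.get("week2"))    # None
```

With "weekly" notes, old weeks naturally drift to the LRU end and get
evicted first, while the current week stays hot.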

What is your retrieval pattern?

I don't have hard data yet -- I'm still building this.

(Preempting a possible future question.) Why do you lump S3 and Riak
together?

Eventual consistency doesn't really matter to me here. The key is the
SHA hash of the content, so an update is simply a new entry, and I
don't need to worry much about invalidating the cache.
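
A minimal read-through sketch of the content-addressed idea, assuming
SHA-256 (the message only says "sha") and using plain dicts as stand-ins
for S3 and for the SSD cache. `ContentAddressedStore` and `content_key` are
hypothetical names, not part of any S3 client API.

```python
import hashlib


def content_key(document: bytes) -> str:
    """Key = hex SHA-256 digest of the content (SHA-256 is an assumption)."""
    return hashlib.sha256(document).hexdigest()


class ContentAddressedStore:
    """Read-through cache over an append-only, content-addressed store.

    `backing` is a plain dict standing in for S3 and `cache` a dict standing
    in for the droplet's SSD; both are hypothetical placeholders.
    """

    def __init__(self, backing: dict) -> None:
        self.backing = backing
        self.cache: dict = {}

    def put(self, document: bytes) -> str:
        """Store a version; editing a document yields a brand-new key."""
        key = content_key(document)
        self.backing[key] = document  # idempotent: same content, same key
        return key

    def get(self, key: str) -> bytes:
        """Serve from the cache; on a miss, fetch once from the backing store.

        No invalidation is needed: the bytes behind a key can never change.
        """
        if key not in self.cache:
            value = self.backing[key]
            if content_key(value) != key:  # cheap integrity check on fetch
                raise ValueError(f"corrupt object for key {key}")
            self.cache[key] = value
        return self.cache[key]


if __name__ == "__main__":
    store = ContentAddressedStore(backing={})
    v1 = store.put(b"<svg>week 1</svg>")
    v2 = store.put(b"<svg>week 1, edited</svg>")
    print(v1 != v2)  # True: the edit became a new entry
    print(store.get(v1) == b"<svg>week 1</svg>")  # True
```

A side benefit of hashing the content is that any cached copy can be
verified against its own key, so a corrupted cache entry is detectable.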

I was thinking purely in terms of a k/v store, and the two most
scalable stores I know of are S3 and Riak.

Please let me know if my thinking appears sloppy anywhere else. (I
have a decent theoretical CS background, so I can do the logic/proofs,
but this is my first time building a distributed system, so I may be
asking the wrong questions or not be aware of what I'm jumping into.)

Thanks!

On Sat, Apr 5, 2014 at 9:02 AM, Gokhan Boranalp <kunthar@REDACTED> wrote:
> Amazon S3 and Riak are different species and are not directly
> comparable types in the K/V world.
> Your question shows that you are not aware of the usage types of these
> two, and that you have not examined your problem domain efficiently by
> looking closely at your data.
>
> Please let us know more about the data types you will be using.
> What kind of data would you really like to store?
> Why do you need a cache?
> Why do you believe SSD disks could be sufficient for cache operations?
> What is your data access pattern when retrieving data?
>
>
> BR
>
> On Fri, Apr 4, 2014 at 3:10 PM, t x <txrev319@REDACTED> wrote:
>> Hi,
>>
>>
>>   This is my current setup:
>>
>>   * a bunch of $5/month DigitalOcean droplets [1]
>>
>>   * each droplet has a 20GB SSD hard drive
>>
>>   Now, I need a gigantic key-value store. I don't want to deal
>> with the error condition "error: you ran out of disk space".
>>
>>   In my particular design, I only have "create new value". I don't
>> have "update value". Thus, I don't have to worry about invalidating
>> caches, and I intend to use the 20GB SSD drives as a "cache" for the
>> real key-value store.
>>
>>   Now, my question is: what should I use for my key-value store?
>>
>>   I want to optimize for:
>>
>>   * minimum cost
>>   * minimum administration
>>
>>   Currently, the best I have is Amazon S3. (I'd prefer not to set up my
>> own Riak cluster and deal with replication, how many servers to run,
>> ...). I'm okay with 99.99% (or whatever SLA Amazon S3 provides).
>>
>>   Question: Is S3 the right approach as a giant K-V store for my
>> Erlang nodes to hit, or should I be using something else?
>>
>> Thanks!
>>
>>
>> [1] This is not an advertisement for DO. I do not have any DO equity.
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>
>
>
> --
> BR,
> \|/ Kunthar