[erlang-questions] kv store as a service

Sat Apr 5 19:02:45 CEST 2014

I should also add: for this particular case, HDFS as a service would
work for me too -- all I really need is just a k/v store.
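
For reference, the design described in the quoted message below (key =
SHA hash of content, create-only values, local SSD as a cache in front
of the remote store) could look roughly like the following. This is
only a sketch, in Python rather than Erlang. SHA-256 as the hash, the
in-memory DictStore stand-in for S3/Riak, and the names CachedStore and
capacity_bytes are all my own assumptions, not any real library's API.

```python
import hashlib
from collections import OrderedDict


class DictStore:
    """Hypothetical stand-in for the remote store (S3, Riak, ...)."""

    def __init__(self):
        self.blobs = {}
        self.reads = 0  # counts remote reads, i.e. paid bandwidth

    def put(self, key, value):
        self.blobs[key] = value

    def get(self, key):
        self.reads += 1
        return self.blobs[key]


class CachedStore:
    """Content-addressed store with a size-bounded LRU cache in front."""

    def __init__(self, remote, capacity_bytes):
        self.remote = remote
        self.capacity = capacity_bytes
        self.used = 0
        self.cache = OrderedDict()  # key -> bytes, least recent first

    def put(self, content):
        # The key is derived from the content, so a key never changes
        # meaning and cached entries never need invalidating.
        key = hashlib.sha256(content).hexdigest()
        self.remote.put(key, content)
        self._remember(key, content)
        return key

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as recently used
            return self.cache[key]
        content = self.remote.get(key)   # cache miss: go to the remote
        self._remember(key, content)
        return content

    def _remember(self, key, content):
        if key in self.cache:
            self.cache.move_to_end(key)
            return
        if len(content) > self.capacity:
            return  # too large to cache at all
        self.cache[key] = content
        self.used += len(content)
        # Evict least recently used entries until we fit again.
        while self.used > self.capacity:
            _, evicted = self.cache.popitem(last=False)
            self.used -= len(evicted)
```

With recent-heavy traffic, most reads should be served from the cache,
and running out of local space just means evicting old entries rather
than hitting an error.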

On Sat, Apr 5, 2014 at 10:02 AM, t x <txrev319@REDACTED> wrote:
> Gokhan: Thanks for pointing out what I need to clarify.
>
>
> What kind of data?
>
> I'm basically building a wiki. Each "value" is an svg-document + the
> history of the svg document.
>
>
> Why do you need cache?
>
> S3 bandwidth pricing comes out to $120 / TB.
> DO bandwidth pricing is $5 / TB.
> Thus, I'd prefer to have DO read from S3, cache it on DO, then serve
> from DO. This saves bandwidth cost by a factor of 24 (120 / 5) on
> every cache hit. (Sorry for not explaining this earlier.)
>
>
> Why do you believe SSD will suffice for cache?
>
> The site is for a class -- so the notes are "weekly" -- thus, it's
> highly likely that the most accessed entries are the most recent
> entries (i.e. user requests are not uniformly random over the keys,
> but rather heavily weighted in favor of recent keys).
>
>
> What is your retrieval pattern?
>
> I don't have hard data yet -- I'm still building this.
>
>
>
> (preempting a possible future question): Why do you group S3 and Riak
> into the same thought?
>
> Eventual consistency doesn't really matter to me here. Each key is the
> SHA hash of its content, so an "update" is just a new entry, and I
> don't have to worry about invalidating the cache.
>
> I was thinking purely in terms of k/v store -- and the two most
> scalable stores I know of are S3 and Riak.
>
>
> Please let me know if my thinking appears sloppy anywhere else. (I
> have a decent theoretical CS background so I can do the logic/proof --
> but this is my first time building a distributed system -- so I may be
> asking the wrong questions / not be aware of what I'm jumping into.)
>
>
> Thanks!
>
>
>
> On Sat, Apr 5, 2014 at 9:02 AM, Gokhan Boranalp <kunthar@REDACTED> wrote:
>> Amazon S3 and Riak are different species and are not directly
>> comparable within the K/V world.
>> The question suggests that you are not aware of how these two are
>> typically used, and that you have not thoroughly examined your
>> problem domain by looking closely at your data.
>>
>> Please let us know more about the types of data you will be using.
>> What kind of data would you really like to store?
>> Why do you need a cache?
>> Why do you believe SSDs will be sufficient for cache operations?
>> What is your data access pattern when retrieving data?
>>
>>
>> BR
>>
>> On Fri, Apr 4, 2014 at 3:10 PM, t x <txrev319@REDACTED> wrote:
>>> Hi,
>>>
>>>
>>>   This is my current setup:
>>>
>>>   * a bunch of $5/month DigitalOcean droplets [1]
>>>
>>>   * each of these droplets has a 20GB SSD
>>>
>>>   Now, I need to have a gigantic key-value store. I don't want to deal
>>> with the error condition of "error: you ran out of disk space"
>>>
>>>   In my particular design, I only have "create new value". I don't
>>> have "update value". Thus, I don't have to worry about invalidating
>>> caches, and intend to use the 20GB SSDs as a "cache" for the
>>> real key-value store.
>>>
>>>
>>>
>>>   Now, my question is: what should I use for my key-value store?
>>>
>>>   I want to optimize for:
>>>
>>>   * minimum cost
>>>   * minimum administration
>>>
>>>   Currently, the best I have is Amazon S3. (I'd prefer not to set up
>>> my own Riak cluster + deal with replication + how many servers to
>>> run + ... ). I'm okay with the 99.99% (or whatever SLA Amazon S3
>>> provides).
>>>
>>>
>>>
>>>   Question: Is S3 the right approach as a giant K-V store for my
>>> Erlang nodes to hit, or should I be using something else?
>>>
>>> Thanks!
>>>
>>>
>>> [1] This is not an advertisement for DO. I do not have any DO equity.
>>> _______________________________________________
>>> erlang-questions mailing list
>>> erlang-questions@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-questions
>>
>>
>>
>> --
>> BR,
>> \|/ Kunthar