[erlang-questions] kv store as a service

Chris Molozian <>
Mon Apr 7 01:14:09 CEST 2014


Hi,

Based on your requirements:
- unlimited disk space
- each value is an SVG document (and its history)
- minimum cost
- minimum administration

I wonder if you might be interested in a service like Orchestrate (http://orchestrate.io/docs/). I’m an engineer at Orchestrate; we’re big fans of Erlang, as most of us worked on Riak before building this service.

I think Orchestrate could be a good fit because:
- the pricing model is based on MOps (millions of API operations), with a free tier of 1 MOp/month.
- we do not charge for disk usage.
- all data in the service is immutable, with “ref” history (you can retrieve previous versions of a KV object)
- there is no administration overhead, we’re the infrastructure team for you

Normally I'm wary of promoting a company (when working for them) on a community mailing list, but in this case I really do think we could be a good fit.

Hope this helps.

Kind Regards,

Chris

-- 
Chris Molozian
Software Engineer

Sent with Airmail

On 5 April 2014 at 18:02:55, t x () wrote:

I should also add, for this particular case, HDFS as a service would
work for me too -- all I really need is a k/v store.

On Sat, Apr 5, 2014 at 10:02 AM, t x <> wrote:  
> Gokhan: Thanks for pointing out what I need to clarify.  
>  
>  
> What kind of data?  
>  
> I'm basically building a wiki. Each "value" is an svg-document + the  
> history of the svg document.  
>  
>  
> Why do you need cache?  
>  
> S3 pricing comes out to $120 / TB.  
> DO pricing is $5 / TB.  
> Thus, I'd prefer to read from S3 once, cache the value on DO, then serve
> from DO. This cuts bandwidth cost by a factor of 24. (Sorry for not
> explaining this earlier.)
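The factor of 24 is just the ratio of the two per-TB egress prices. A one-line sketch (the prices are the figures quoted in this thread, not current quotes):

```python
# Per-TB bandwidth prices as quoted in this thread (illustrative, not current).
S3_COST_PER_TB = 120.0
DO_COST_PER_TB = 5.0

def savings_factor(expensive: float, cheap: float) -> float:
    """Factor by which serving from the cheap tier reduces bandwidth cost."""
    return expensive / cheap

print(savings_factor(S3_COST_PER_TB, DO_COST_PER_TB))  # 24.0
```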
>  
>  
> Why do you believe SSD will suffice for cache?  
>  
> The site is for a class -- the notes are posted weekly -- so it's
> highly likely that the most accessed entries are the most recent
> ones (i.e. user requests are not uniformly random over the keys,
> but heavily weighted in favor of recent keys).
>  
>  
> What is your retrieval pattern?  
>  
> I don't have hard data yet -- I'm still building this.  
>  
>  
>  
> (preempting a possible future question): Why do you group S3 and Riak  
> into the same thought?  
>  
> Eventual consistency doesn't really matter to me here. key = sha hash  
> of content, thus, "updates = new entry", and I don't worry much about  
> invalidating cache.  
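A minimal sketch of that content-addressed scheme, in Python for brevity (the dict stands in for the real k/v store; names are illustrative):

```python
import hashlib

def put(store: dict, content: bytes) -> str:
    """Content-addressed write: the key is the SHA-256 of the value,
    so an "update" is simply a new entry and old keys stay valid --
    cached copies never need invalidation."""
    key = hashlib.sha256(content).hexdigest()
    store[key] = content  # idempotent: identical content maps to the same key
    return key

store = {}
k1 = put(store, b"<svg>v1</svg>")
k2 = put(store, b"<svg>v2</svg>")
assert k1 != k2                      # a new version is a new entry
assert store[k1] == b"<svg>v1</svg>" # the old version is still retrievable
```

History is then just the ordered list of keys for a document, kept alongside the store.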
>  
> I was thinking purely in terms of k/v store -- and the two most  
> scalable stores I know of are S3 and Riak.  
>  
>  
> Please let me know if my thinking appears sloppy anywhere else. (I  
> have a decent theoretical CS background so I can do the logic/proof --  
> but this is my first time building a distributed system -- so I may be  
> asking the wrong questions / not aware of what I'm jumping into).  
>  
>  
> Thanks!  
>  
>  
>  
> On Sat, Apr 5, 2014 at 9:02 AM, Gokhan Boranalp <> wrote:  
>> Amazon S3 and Riak are different species, and not directly comparable
>> within the K/V world.
>> Your question suggests you are not aware of how these two are typically
>> used, and that you have not yet examined your problem domain closely by
>> looking at your data.
>>  
>> Please let us know more about the data types you will be using.
>> What kind of data do you really want to store?
>> Why do you need a cache?
>> Why do you believe SSDs will be sufficient for caching?
>> What is your access pattern when retrieving data?
>>  
>>  
>> BR  
>>  
>> On Fri, Apr 4, 2014 at 3:10 PM, t x <> wrote:  
>>> Hi,  
>>>  
>>>  
>>> This is my current setup:  
>>>  
>>> * a bunch of $5/month digital ocean droplets [1]  
>>>  
>>> * these droplets have a 20GB SSD drive
>>>  
>>> Now, I need to have a gigantic key-value store. I don't want to deal  
>>> with the error condition of "error: you ran out of disk space"  
>>>  
>>> In my particular design, I only have "create new value". I don't
>>> have "update value". Thus, I don't have to worry about invalidating
>>> caches, and intend to use the 20GB SSD drives as a "cache" for the
>>> real key-value store.
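Because values are immutable, that SSD can act as a plain read-through cache with no invalidation logic. A sketch in Python (`fetch_from_backing_store` is a placeholder for an S3 GET, and `CACHE_DIR` stands in for the 20GB SSD; both names are illustrative):

```python
import os

CACHE_DIR = "/tmp/kv-cache"  # stands in for the droplet's 20GB SSD

def fetch_from_backing_store(key: str) -> bytes:
    """Placeholder for a GET against the real store (e.g. an S3 request)."""
    raise NotImplementedError

def get(key: str, backing=fetch_from_backing_store) -> bytes:
    """Read-through cache: serve from local disk if present, otherwise
    fetch from the backing store and persist the copy. Since values are
    immutable, a cached entry is valid forever."""
    path = os.path.join(CACHE_DIR, key)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    value = backing(key)
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(path, "wb") as f:
        f.write(value)
    return value
```

Eviction (e.g. LRU by file mtime when the 20GB fills) can be layered on separately; correctness never depends on it.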
>>>  
>>>  
>>>  
>>> Now, my question is: what should I use for my key-value store?  
>>>  
>>> I want to optimize for:  
>>>  
>>> * minimum cost  
>>> * minimum administration  
>>>  
>>> Currently, the best option I have is Amazon S3. (I'd prefer not to set
>>> up my own Riak cluster + deal with replication + decide how many servers
>>> to run + ... ). I'm okay with 99.99% (or whatever SLA Amazon S3 provides).
>>>  
>>>  
>>>  
>>> Question: Is S3 the right approach as a giant K/V store for my
>>> Erlang nodes to hit, or should I be using something else?
>>>  
>>> Thanks!  
>>>  
>>>  
>>> [1] This is not an advertisement for DO. I do not have any DO equity.  
>>> _______________________________________________  
>>> erlang-questions mailing list  
>>>   
>>> http://erlang.org/mailman/listinfo/erlang-questions  
>>  
>>  
>>  
>> --  
>> BR,  
>> \|/ Kunthar  