[erlang-questions] Choice in Distributed Databases for a Key/Value Store

Mon Sep 18 18:54:30 CEST 2017

I would not put much weight on those ‘benchmarks’; they are highly bogus, and that is putting it kindly.

For starters, it uses default settings, and those are not even provided. Redis is an in-memory store by default; is it even saving the data? How are Riak and Cassandra set up? Unlike MongoDB or Redis, those are built to be clustered; do the default configs used for them disable the clustering overhead? Does it mean that Riak, which stores every write on disk, perhaps three times, is only 10x slower than a database that never writes to disk and keeps only one copy?

For your own sanity, print that benchmark, find a burn-proof area (safety matters!), set it on fire, then move on and benchmark for yourself with a real use case and sensible data.
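
A minimal sketch of what "benchmark for yourself" could look like, written in Python for brevity. It replays the roughly 50 reads per write mentioned in the original question and reports read and write latency percentiles. Everything named here is an assumption for illustration: DictStore is a hypothetical in-process stand-in, not a real database client, and the key layout and operation counts are made up. To get meaningful numbers, replace its put/get bodies with calls to the store under test, configured the way you would run it in production.

```python
import random
import time


class DictStore:
    """Hypothetical stand-in for a database client (assumption, not a real driver).

    Swap the bodies of put/get for calls to the store you want to measure.
    """

    def __init__(self):
        self._data = {}

    def put(self, key, field, value):
        # key -> field -> value, matching the key/key/value layout in the question
        self._data.setdefault(key, {})[field] = value

    def get(self, key, field):
        return self._data.get(key, {}).get(field)


def percentile(samples, pct):
    """Return the sample at the given percentile (nearest-rank style)."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[index]


def run_benchmark(store, operations=10_000, reads_per_write=50, key_count=1_000, seed=1):
    """Run a mixed workload: one write followed by `reads_per_write` reads."""
    rng = random.Random(seed)

    # Preload so that every read hits an existing key.
    for i in range(key_count):
        store.put(f"user:{i}", "name", f"name-{i}")

    read_latencies = []
    write_latencies = []
    for n in range(operations):
        key = f"user:{rng.randrange(key_count)}"
        if n % (reads_per_write + 1) == 0:
            start = time.perf_counter()
            store.put(key, "name", f"name-{n}")
            write_latencies.append(time.perf_counter() - start)
        else:
            start = time.perf_counter()
            store.get(key, "name")
            read_latencies.append(time.perf_counter() - start)

    return {
        "reads": len(read_latencies),
        "writes": len(write_latencies),
        "read_p50": percentile(read_latencies, 50),
        "read_p99": percentile(read_latencies, 99),
        "write_p50": percentile(write_latencies, 50),
        "write_p99": percentile(write_latencies, 99),
    }


if __name__ == "__main__":
    for name, value in run_benchmark(DictStore()).items():
        print(name, value)
```

Looking at p99 rather than an average matters here, because the stated worry is connections failing or stalling, and that shows up in the tail. Run the same harness against each candidate with the replication and durability settings you actually intend to use.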

> On 18. Sep 2017, at 17:34, code wiget <codewiget95@REDACTED> wrote:
> 
> Hi,
> 
> Thank you all for your replies.
> 
> Nathaniel: The reads must be 'eventually' consistent, at least within a second. The problem is that it updates user connection information, and they will be unable to connect if our read does not get information from the write. So if we update, the connection before the write is fully committed will fail. I suppose it is ok if they cannot connect and just have to reconnect, but ideally they should be able to connect every time.
> 
> So Riak seems like a great solution, but its speed really worries me. We are trying to connect as many clients as possible per server; this is very important because it saves us money. If the reads take 2-3x as long, this could be very slow and bad. According to this article: https://github.com/citrusbyte/redis-comparison, Riak is up to 10x slower than Redis. This would really hurt our operations.
> 
> To those who suggested redis-cluster: my problem with a cluster solution is that redis-cluster seemed to be in an experimental stage. It also has the problem that if all copies of a node die, the cluster loses all that data, and it is up to the user to prevent that. All of this has to be handled by the user, which seems like it will get tedious when there are multiple nodes, and all it would take is for one admin to mess it up.
> 
> So this is where Aerospike comes in. Reading about them on the web, they come off as the perfect tool if you want a distributed version of Redis: https://stackoverflow.com/questions/24482337/how-is-aerospike-different-from-other-key-value-nosql-databases. But for some reason, they don’t get as much attention as Redis.
> 
> Does anyone have experience with Aerospike? For my application, it seems like a no-brainer.
> 
> Thank you all again,
>> On Sep 15, 2017, at 2:02 PM, Nathaniel Waisbrot <nathaniel@REDACTED> wrote:
>> 
>> Scatter-shot reply:
>> 
>> Since you're using Redis right now, have you considered Redis Cluster (https://redis.io/topics/cluster-tutorial)?
>> 
>> I'm using Cassandra and don't feel that it's got a small community or slow pace of updates. There are a lot of NoSQL databases and they all have quite different tradeoffs which tends to fragment the community, so your expectations may be too high.
>> 
>> Riak, ElasticSearch, EtcD, MongoDB, etc. You have many (too many!) options. When you say "read speed and consistency" what sort of consistency are you looking for? Is eventual consistency good, or do you require that every read that takes place after a write gets the new data?
>> 
>> 
>> 
>> 
>>> On Sep 15, 2017, at 12:43 PM, code wiget <codewiget95@REDACTED> wrote:
>>> 
>>> Hello everyone,
>>> 
>>> I am at the point where I have many Erlang nodes, and I am going to have to move to a distributed database. Right now, I am using a basic setup: each Erlang node has a copy of the same Redis DB, and all of those DBs are slaves (non-writable copies) of a master. A big problem with this is obvious: if the DB goes down, the node goes down. If the master goes down, the slaves won’t get updated, so I would like to move to a distributed DB that all of my nodes can read from and write to and that does not go down.
>>> 
>>> The nodes do ~50 reads per write, and are constantly reading, so read speed and consistency are my real concerns. I believe this will be the node’s main speed factor.
>>> 
>>> Another thing is that all of my data is key/key/value, so it would mimic the structure ID -> name -> “Fred”, ID -> age -> 20, so I don’t need a SQL DB.
>>> 
>>> Another big thing is that I don’t need disc copies, as I have a large backup store from which the values are generated.
>>> 
>>> I have looked at as many options as I can ->
>>> 
>>> Voldemort: http://project-voldemort.com/
>>> - Looks perfect, but there are 0 resources on learning how to use it outside of their docs and no Erlang driver, which is huge because I would have to learn both how to write a C driver and everything about Voldemort just to get it to work.
>>> 
>>> Cassandra: http://cassandra.apache.org/
>>> - Looks good too, but apparently there is a small community and it apparently isn’t updated often.
>>> 
>>> Scalaris: https://github.com/scalaris-team/scalaris/blob/master/user-dev-guide/main.pdf
>>> - Looks very, very cool and seems great, but there is 0 active community and their GitHub isn’t updated often. This is a distributed, all in-memory database, written in Erlang.
>>> 
>>> 
>>> So from my research, which consisted heavily of this blog: https://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores, I have narrowed it down to these three.
>>> 
>>> BUT you are all the real experts and have built huge applications in Erlang, so what do you use? What do you have experience with that performs well with Erlang nodes spread across multiple machines and possibly multiple data centers?
>>> 
>>> Thanks for your time.
>>> 
>>> _______________________________________________
>>> erlang-questions mailing list
>>> erlang-questions@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-questions
>> 
> 
