[erlang-questions] Choice in Distributed Databases for a Key/Value Store

Fri Sep 15 19:37:36 CEST 2017

I have some experience with Cassandra, and based on your description of
your needs it sounds like Cassandra would be overkill. I think you could
make it work, but it may not be worth the effort.

Cassandra uses a ring architecture similar to Riak. Reads and writes happen
on multiple nodes and the entire transaction doesn't succeed until it has
succeeded on a certain number of nodes. For your read-heavy workload you
would probably want to set that value higher for writes and lower for
reads. For example, you might say that a write happens on 3 nodes and
succeeds when it is successful on all of those nodes. You might feel safe
then saying that a read only needs to succeed on one node to be successful.
This is something you would definitely want to test thoroughly so that you
understand the performance tradeoffs of changing those values.
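
To make the arithmetic behind those numbers concrete, here is a minimal Python sketch of the usual rule of thumb for ring-style stores: with N replicas, a read is guaranteed to touch at least one replica that saw the latest acknowledged write when R + W > N. This is not Cassandra's or Riak's API; the function name and parameters are made up for illustration, and it ignores node failures and repair mechanisms.

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Return True when every read set shares a replica with every write set.

    n: number of replicas holding each key (hypothetical parameter name)
    w: replicas that must acknowledge a write before it succeeds
    r: replicas that must answer a read before it succeeds
    """
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must each be between 1 and n")
    # If r + w > n, the read set and the write set cannot be disjoint.
    return r + w > n


# The example from the text: 3 replicas, write to all 3, read from 1.
write_all_read_one = quorums_overlap(n=3, w=3, r=1)

# Cheaper writes with the same single-node read lose the guarantee.
write_one_read_one = quorums_overlap(n=3, w=1, r=1)
```

With the settings described above (write to all 3, read from 1) the check passes, at the cost that a write fails whenever any one of the three replicas is unavailable. That is the tradeoff worth measuring.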

Also, recent versions of Cassandra have adopted a table model and query
language (CQL) that are superficially similar to RDBMS tables and SQL, but
are actually completely different. This led to a lot of cognitive
dissonance for my team as we would do things that made sense for an RDBMS,
and that we could express in CQL, but were totally the wrong thing to do
for Cassandra's architecture.

Personally, I would look at Riak before Cassandra if you think that the
ring architecture makes sense for you. Because it doesn't have the
trappings of tables and a SQL-like language, we found it much more
straightforward to reason about the strengths and limitations of the
system. It is very much a straightforward key/value data store.

However, you mentioned read consistency being important, and Cassandra and
Riak both trade off read consistency for availability. They are "eventually
consistent" systems (https://en.wikipedia.org/wiki/Eventual_consistency). I
suggest you read up on the CAP theorem
(https://en.wikipedia.org/wiki/CAP_theorem) and decide what tradeoffs you
are willing to make before choosing a database.

Good luck!

~phil

On September 15, 2017 at 11:43:52 AM, code wiget (codewiget95@REDACTED)
wrote:

Hello everyone,

I am at the point where I have many Erlang nodes, and I am going to have to
move to a distributed database. Right now, I am using a basic setup: each
Erlang node has a copy of the same Redis DB, and all of those DBs are
slaves (non-writable copies) of a master. A big problem with this is obvious:
if the DB goes down, the node goes down. If the master goes down, the
slaves won’t get updated, so I would like to move to a distributed DB that
all of my nodes can read from and write to, and that does not go down.

The nodes do ~50 reads per write, and are constantly reading, so read speed
and consistency are my real concerns. I believe this will be the node’s main
speed factor.

Another thing is that all of my data is key/key/value, so it would mimic
the structure of ID -> name -> “Fred”, ID -> age -> 20, so I don’t need a SQL
DB.

A big thing also is that I don’t need disc copies, as I have a large
backup store where the values are generated from.

I have looked at as many options as I can ->

Voldemort : http://project-voldemort.com/
- looks perfect, but there are 0 resources on learning how to use it
outside of their docs and no Erlang driver, which is huge because I would
both have to learn how to write a C driver and everything about this just
to get it to work.

Cassandra: http://cassandra.apache.org/
- looks good too, but apparently there is a small community and it
apparently isn’t updated often

Scalaris:
https://github.com/scalaris-team/scalaris/blob/master/user-dev-guide/main.pdf
- Looks very very cool, seems great, but there is 0 active community and
their GitHub isn’t updated often. This is a distributed all in-memory
database, written in Erlang.

So from my research, which consisted heavily of this blog
(https://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores),
I have narrowed it down to these three.

BUT you are all the real experts and have built huge applications in
Erlang, what do you use? What do you have experience in that performs well
with Erlang nodes spread across multiple machines and possibly multiple
data centers?

Thanks for your time.
