[erlang-questions] Choice in Distributed Databases for a Key/Value Store

Tue Sep 19 10:13:39 CEST 2017

O.T.

Hehe, that Aerospike review mentions that they claim 100% uptime (with
seemingly undisclosed precision).

«This makes Aerospike’s uptime infinitely better than the Ericsson AXD301
switch, which delivered somewhere between five and nine nines of
availability in a system comprised of 1.1 million lines of Erlang.»

(Never mind that the claim that 100% would be "infinitely better" than e.g.
99.999% is ludicrous in itself. I assume it was made tongue-in-cheek.)

Of course, this is ridiculous, as the in-depth review also demonstrates.
Uptime figures are only useful in context, and comparing uptime claims of
systems with different purposes is not particularly meaningful. For
example, for the AXD 301, disturbance would be registered as downtime if
one network interface (of potentially hundreds) was unable to process calls
for 15 seconds (which is, BTW, what happened in the "9 nines" case: a
restart of a single device board.) Is a database cluster still "up" if it
stays responsive but has lost your data?

When claiming availability, you have to be very specific.
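
To put the "nines" above in perspective, here is a minimal sketch that
converts an availability figure into the downtime it allows per year. The
function name and the 365-day year are illustrative assumptions, not
anything taken from Ericsson's or Aerospike's own material.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def downtime_per_year(nines: int) -> float:
    """Seconds of downtime per year allowed by `nines` nines of availability.

    E.g. nines=5 means 99.999% availability, i.e. an unavailability of 1e-5.
    """
    unavailability = 10.0 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (5, 9):
        print(f"{n} nines: {downtime_per_year(n):.6f} seconds per year")
```

Five nines allows roughly 5.3 minutes of downtime per year, while nine
nines allows about 32 milliseconds, which shows how much the figure depends
on exactly what is counted as downtime.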

The article also mentions cluster sizes of "between 1 and 100 nodes".
Guaranteeing 100% uptime in a 1-node 'cluster' is simply not possible.
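
As a rough illustration of why node count matters, the sketch below uses a
deliberately simple model: nodes fail independently, and the cluster counts
as "up" while at least one node is up. Both the model and the function name
are assumptions made for illustration; real clusters have correlated
failures and stricter definitions of "up".

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one of `nodes` nodes is up.

    Assumes independent failures, which is an idealization.
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    # The cluster is down only if every node is down at the same time.
    return 1.0 - (1.0 - node_availability) ** nodes


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, cluster_availability(0.99, n))
```

With a single node, the cluster is exactly as available as that node, so
100% would require a node that never fails. Adding replicas pushes the
figure toward 100% in this model, but never reaches it as long as each node
can fail.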

Anyway, that article is two years old. Today, Aerospike claims
"demonstrated uptime of five 9s". Still very good, but I guess "infinitely
worse" than it used to be. ;-)
(http://www.aerospike.com/benefits/high-availability/)

BR,
Ulf W

PS This is not to defend the "9 nines" claim, which was never officially
made by Ericsson. It was made in a press release by British Telecom.
Ericsson doesn't divulge the actual uptime figures of its systems, but at
least at one time it was ok to claim publicly that the average recorded
field uptime of AXD 301 systems was "better than 5 nines".

2017-09-19 1:00 GMT+02:00 Paul Oliver <puzza007@REDACTED>:

> I'd recommend checking jepsen.io for testing of distributed systems.
> There's a very thorough review of Aerospike there with some results that
> may give you pause. https://aphyr.com/posts/324-jepsen-aerospike
>
> On Tue, Sep 19, 2017 at 4:54 AM Heinz N. Gies <heinz@REDACTED> wrote:
>
>> I would not put much stock in those ‘benchmarks’; they’re highly bogus, and
>> that’s treating them kindly.
>>
>> For starters, it uses default settings, and they are not even provided.
>> Redis is an in-memory store by default; is it even saving the data? How are
>> Riak or Cassandra set up? Unlike Mongo or Redis, those are built to be
>> clustered; do the default configs used for them disable useless
>> overhead? Does it mean Riak, which is storing every write on disk, perhaps
>> 3 times, is only 10x slower than a database that never writes to
>> disk and only keeps one copy?
>>
>> For your own sanity, print that benchmark, find a burn-proof area (safety
>> matters!) and set it on fire, then move on and benchmark for yourself with a
>> real use case and sensible data.
>>
>>
>> On 18. Sep 2017, at 17:34, code wiget <codewiget95@REDACTED> wrote:
>>
>> Hi,
>>
>> Thank you all for your replies.
>>
>> Nathaniel: The reads must be 'eventually' consistent, at least within a
>> second. The problem is that it updates user connection information, and
>> they will be unable to connect if our read does not get information from
>> the write. So if we update, a connection attempted before the write is fully
>> committed will fail. I suppose it is ok if they cannot connect and just
>> have to reconnect, but ideally they should be able to connect every time.
>>
>> So Riak seems like a great solution, but its speed really worries me. We
>> are trying to connect as many clients as possible per server; this is very
>> important as it saves us money. If the reads take 2-3x as long, this could
>> be very slow and bad. According to this article:
>> https://github.com/citrusbyte/redis-comparison, Riak is up to 10x slower
>> than Redis. This would really hurt our operations.
>>
>> To those who suggested redis-cluster: my problem with a cluster solution
>> is that redis-cluster seemed to be in an experimental stage. It also has
>> the problem that if all copies of a node die, the cluster loses all of
>> that data, and it is up to the user to prevent that. All of this
>> has to be handled by the user, which seems like it will get tedious when
>> there are multiple nodes, and all it would take is for one admin to mess it
>> up.
>>
>> So this is where Aerospike comes in. Reading about them on the web, they
>> come off as the perfect tool for a distributed version of Redis:
>> https://stackoverflow.com/questions/24482337/how-is-aerospike-different-from-other-key-value-nosql-databases
>> But for some reason, they don’t get as much attention as Redis.
>>
>> Does anyone have experience with Aerospike? For my application, it seems
>> like a no-brainer.
>>
>> Thank you all again,
>>
>> On Sep 15, 2017, at 2:02 PM, Nathaniel Waisbrot <nathaniel@REDACTED>
>> wrote:
>>
>> Scatter-shot reply:
>>
>> Since you're using Redis right now, have you considered Redis Cluster (
>> https://redis.io/topics/cluster-tutorial)?
>>
>> I'm using Cassandra and don't feel that it's got a small community or
>> slow pace of updates. There are a lot of NoSQL databases and they all have
>> quite different tradeoffs which tends to fragment the community, so your
>> expectations may be too high.
>>
>> Riak, ElasticSearch, EtcD, MongoDB, etc. You have many (too many!)
>> options. When you say "read speed and consistency" what sort of consistency
>> are you looking for? Is eventual consistency good, or do you require that
>> every read that takes place after a write gets the new data?
>>
>>
>>
>>
>> On Sep 15, 2017, at 12:43 PM, code wiget <codewiget95@REDACTED> wrote:
>>
>> Hello everyone,
>>
>> I am at the point where I have many Erlang nodes, and I am going to have
>> to move to a distributed database. Right now, I am using a basic setup:
>> each Erlang node has a copy of the same Redis DB, and all of those DBs are
>> slaves (non-writable copies) of a master. A big problem with this is obvious:
>> if the DB goes down, the node goes down. If the master goes down, the
>> slaves won’t get updated, so I would like to move to a distributed DB that
>> all of my nodes can read from and write to and that does not go down.
>>
>> The nodes do ~50 reads per write, and are constantly reading, so read
>> speed and consistency are my real concerns. I believe this will be the node’s
>> main speed factor.
>>
>> Another thing is that all of my data is key/key/value, so it would mimic
>> the structure of ID -> name -> “Fred”, ID -> age -> 20, so I don’t need a SQL
>> DB.
>>
>> A big thing also is that I don’t need disc copies, as I have a large
>> backup store where the values are generated from.
>>
>> I have looked at as many options as I can ->
>>
>> Voldemort: http://project-voldemort.com/
>> - looks perfect, but there are 0 resources on learning how to use it
>> outside of their docs and no Erlang driver, which is huge because I would
>> have to learn both how to write a C driver and everything about this just
>> to get it to work.
>>
>> Cassandra: http://cassandra.apache.org/
>> - looks good too, but apparently there is a small community and it
>> apparently isn’t updated often
>>
>> Scalaris: https://github.com/scalaris-team/scalaris/blob/master/user-dev-guide/main.pdf
>> - Looks very very cool, seems great, but there is 0 active community and
>> their GitHub isn’t updated often. This is a distributed all in-memory
>> database, written in Erlang.
>>
>>
>> So from my research, which consisted heavily of this blog:
>> https://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores
>> , I have narrowed it down to these three.
>>
>> BUT you are all the real experts and have built huge applications in
>> Erlang, what do you use? What do you have experience in that performs well
>> with Erlang nodes spread across multiple machines and possibly multiple
>> data centers?
>>
>> Thanks for your time.
>>
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions