[erlang-questions] Distributed application and netsplit

Andrew Stone andrew.j.stone.1@REDACTED
Thu Nov 20 21:09:55 CET 2014


Hi,

I've waited for someone to jump in and say this, but it hasn't happened.
You really, really don't want to try dealing with netsplits and application
failover in an ad-hoc, app-specific manner. It is not safe, and you will likely
lose data. You really need a consensus algorithm like Raft or Paxos to
handle this type of thing safely, or else you will end up with conflicting
data on both sides of the partition.
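To make the partition point concrete, here is a minimal sketch (Python, all names hypothetical, not taken from any library) of the majority-quorum rule that Raft and Paxos build on. A side of a netsplit may elect a master or accept writes only if it holds a strict majority of the configured cluster, so at most one side can ever qualify. A rule like "the other node stopped answering, so I take over" has no such guarantee. This is only the decision rule, not a consensus protocol.

```python
from itertools import combinations


def has_quorum(side: set[str], cluster: set[str]) -> bool:
    """True if `side` holds a strict majority of the configured cluster members."""
    return len(side & cluster) > len(cluster) // 2


def quorum_sides(partition: list[set[str]], cluster: set[str]) -> list[set[str]]:
    """Return the sides of a netsplit that may still elect a master / accept writes."""
    return [side for side in partition if has_quorum(side, cluster)]


def all_two_way_splits(cluster: set[str]):
    """Yield every ordered cut of the cluster into two non-empty sides."""
    nodes = sorted(cluster)
    for k in range(1, len(nodes)):
        for left in combinations(nodes, k):
            left_set = set(left)
            yield [left_set, cluster - left_set]


if __name__ == "__main__":
    three = {"n1", "n2", "n3"}
    # {n1, n2} | {n3}: only the two-node side keeps quorum.
    print(len(quorum_sides([{"n1", "n2"}, {"n3"}], three)))  # 1

    two = {"primary", "secondary"}
    # A 1/1 split leaves neither side with a majority, so nobody fails over.
    print(len(quorum_sides([{"primary"}, {"secondary"}], two)))  # 0

    # No two-way split of a 5-node cluster ever gives both sides quorum.
    five = {f"n{i}" for i in range(1, 6)}
    print(all(len(quorum_sides(p, five)) <= 1 for p in all_two_way_splits(five)))  # True
```

Note the two-node case: with only a primary and a secondary, a safe rule refuses to fail over at all during a split, which is why a third voter (or no automatic failover) is needed.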

It may be quite a large dependency to rely on, but Riak 2.0 has strongly
consistent keys [1] that you could use to build a lock server that points to
the active master server. Alternatively, you could use riak_ensemble [2]
directly to build a custom solution.
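The lock-server idea boils down to a conditional write on a single key: a node becomes master only if its write succeeds against the version it last read, so two nodes racing for the key cannot both win. The sketch below uses a hypothetical in-memory `ConsistentKey` class as a stand-in; it is NOT Riak's client API. In a real deployment the key itself must be replicated by a consensus protocol, which is the part Riak's strongly consistent keys are meant to provide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Versioned:
    value: Optional[str]
    version: int


class ConsistentKey:
    """Hypothetical stand-in for one strongly consistent key (not a real client API).

    The only property relied on here is a conditional write: it succeeds
    only if the caller presents the version it last read.
    """

    def __init__(self) -> None:
        self._current = Versioned(value=None, version=0)

    def get(self) -> Versioned:
        # Return a copy so callers cannot mutate the stored state directly.
        return Versioned(self._current.value, self._current.version)

    def put_if_version(self, value: str, expected_version: int) -> bool:
        if self._current.version != expected_version:
            return False  # someone else wrote first; the caller's view is stale
        self._current = Versioned(value, expected_version + 1)
        return True


def try_become_master(key: ConsistentKey, node: str) -> bool:
    """Hypothetical helper: claim mastership only if the conditional write wins."""
    seen = key.get()
    return key.put_if_version(node, seen.version)


if __name__ == "__main__":
    key = ConsistentKey()
    # Both nodes read the key before either writes, as in a failover race.
    seen_a, seen_b = key.get(), key.get()
    print(key.put_if_version("node_a", seen_a.version))  # True
    print(key.put_if_version("node_b", seen_b.version))  # False
    print(key.get().value)  # node_a
```

Handling a master that dies while holding the key (leases, expiry, fencing) is deliberately left out; that is exactly the kind of detail that is easy to get wrong in an ad-hoc protocol.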

Lastly, you can simply choose not to use failover and accept that when the
primary goes down you will be offline until it comes back up. The secondary
is just there to provide disaster recovery in case the primary is
irrecoverable. This is a much safer and simpler solution, and one
historically used by conventional databases with both asynchronous and
synchronous replication. If you must have some level of fault tolerance/HA,
you can use Paxos. If your application can handle eventual consistency and
the data types fit the model, you could try to use CRDTs [3]. That would
give you 100% availability and even allow writes to happen to both servers
at once!
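As an illustration of what "the data types fit the model" means, here is a minimal sketch of a state-based grow-only counter (G-Counter), one of the simplest CRDTs. Each replica increments only its own slot, and merging takes the element-wise maximum, so both servers can accept writes during a netsplit and still converge once they exchange state. Replica names are hypothetical, and not every data type (e.g. a uniqueness constraint) can be expressed this way.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GCounter:
    """State-based grow-only counter (G-Counter)."""

    replica: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        # Each replica only ever bumps its own slot.
        self.counts[self.replica] = self.counts.get(self.replica, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: GCounter) -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of sync order or repetition.
        for r, c in other.counts.items():
            self.counts[r] = max(self.counts.get(r, 0), c)


if __name__ == "__main__":
    a, b = GCounter("server_a"), GCounter("server_b")
    # During the netsplit both servers keep accepting writes.
    a.increment(3)
    b.increment(2)
    # After the partition heals they exchange state, in either order.
    a.merge(b)
    b.merge(a)
    print(a.value(), b.value())  # 5 5
```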

I can't stress enough how important it is to not build ad-hoc failover
protocols for this purpose. It will bite you. I've been bitten before, and
so have many other people relying on this mechanism. While it may seem
easier than using a proper distributed systems protocol at first, when you
lose customer data in production, you quickly learn that easy isn't best.

Best wishes,
Andrew

[1] http://docs.basho.com/riak/latest/dev/advanced/strong-consistency/
[2] https://github.com/basho/riak_ensemble
[3] https://en.wikipedia.org/wiki/Conflict-free_replicated_data_types

On Thu, Nov 20, 2014 at 9:56 AM, Mark Nijhof <mark.nijhof@REDACTED> wrote:

> Thank you!
>
> On Thu, Nov 20, 2014 at 3:51 PM, Imants Cekusins <imantc@REDACTED> wrote:
>
>> the code is in
>>
>> https://github.com/aminishiki/distr_netsplit.git
>>
>> any comments are welcome!
>>
>
>
>
> --
> Mark Nijhof
> t:   @MarkNijhof <https://twitter.com/MarkNijhof>
> s:  marknijhof
>