[erlang-questions] Distributed application and netsplit

Andrew Stone
Fri Nov 21 00:10:31 CET 2014

I agree that using stateless app servers with a highly available database
like Riak is probably a simpler and more robust design than trying to build
that logic yourself. If your application can be built that way, it will be
a lot easier.
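
For the CRDT option mentioned in the quoted message below, here is a
minimal sketch of a grow-only counter (G-Counter), one of the simplest
CRDTs. It is written in Python purely for illustration. The class name and
the node names "primary" and "secondary" are hypothetical, and nothing here
is taken from Riak or riak_ensemble. It only shows why both sides of a
partition can keep accepting writes and still converge once the split heals.

```python
class GCounter:
    """Grow-only counter CRDT: one non-decreasing slot per node."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node id -> highest count seen for that node

    def increment(self, amount=1):
        # A node only ever writes to its own slot, so concurrent writes
        # on different nodes can never conflict.
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


primary = GCounter("primary")
secondary = GCounter("secondary")

# During a netsplit, both servers keep accepting writes.
primary.increment(3)
secondary.increment(2)

# After the partition heals, the replicas exchange state.
primary.merge(secondary)
secondary.merge(primary)
primary.merge(secondary)  # merging again changes nothing

print(primary.value(), secondary.value())
```

This only works because the data type has a merge that never loses
updates. State that cannot be expressed this way (for example, "who is the
current master") still needs consensus.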


On Thu, Nov 20, 2014 at 3:53 PM, Michael Truog wrote:

>  On 11/20/2014 12:09 PM, Andrew Stone wrote:
>  Hi,
> I've waited for someone to jump in and say this, but it hasn't happened.
> You really, really don't want to try dealing with netsplits and application
> failover in an app-specific manner. It is not safe, and you will likely
> lose data. You really need a consensus algorithm like Raft or Paxos to
> handle this type of thing safely, or else you will end up with conflicting
> data on both sides of the partition.
>  It may be quite a large dependency to rely on, but Riak 2.0 has strongly
> consistent keys[1] that you could use to build a lock server to point to
> the active master server. Alternatively you could use riak_ensemble[2]
> directly to build a custom solution.
>  Lastly, you can simply choose to not use failover and accept that when
> the primary goes down you will be offline until it comes back up. The
> secondary is just there to provide disaster recovery in case the primary is
> irrecoverable. This is a much safer and simpler solution, and one
> historically used by conventional databases with both asynchronous and
> synchronous replication. If you must have some level of fault tolerance/HA,
> you can use Paxos. If your application can handle eventual consistency and
> the data types fit the model, you could try to use CRDTs [3]. That would
> give you 100% availability and even allow writes to happen on both servers
> at once!
>  I can't stress enough how important it is to not build ad-hoc failover
> protocols for this purpose. It will bite you. I've been bitten before, and
> so have many other people relying on this mechanism. While it may seem
> easier than using a proper distributed systems protocol at first, when you
> lose customer data in production, you quickly learn that easy isn't best.
>  Best wishes,
> Andrew
>  [1] http://docs.basho.com/riak/latest/dev/advanced/strong-consistency/
>  [2] https://github.com/basho/riak_ensemble
> [3] https://en.wikipedia.org/wiki/Conflict-free_replicated_data_types
> The perspective above (Andrew's post) is good, but it assumes that you
> are trying to maintain global state yourself.  That means you
> are trying to create your own database instead of reusing one of the many
> databases that already exist.  I believe development time is better spent
> reusing the databases that already exist, to handle replication of state as
> necessary and to deal with the latency inherent in that process.
> To pursue lower-latency fault tolerance, it is better to have master-less
> processing in the Erlang nodes (not a quorum among all instances with
> replicas for failures, but rather separate instances of the source code
> that run separately and concurrently).  Then all the source code execution
> that needs to be fault-tolerant can be replicated to separate nodes, so
> netsplits do not impact the situation (any state the source code uses is
> only temporary, due to relying completely on the database).  If necessary,
> database nodes could share the same machine as the Erlang node hosts, to
> avoid the possibility that a switch failure could cause a netsplit which
> impacts all database connections (assuming the database was one which could
> handle all the failure scenarios).
> This is the approach you can take with pg2 usage (
> http://www.erlang.org/doc/man/pg2.html) or cpg (at
> https://github.com/okeuday/cpg/) to create process groups that are
> distributed.  If you are looking for higher-level abstractions, there is a
> service abstraction provided by CloudI (http://cloudi.org) which relies
> on cpg to keep service processes available on all Erlang nodes, despite any
> netsplits, pursuing this master-less approach.
> On Thu, Nov 20, 2014 at 9:56 AM, Mark Nijhof wrote:
>> Thank you!
>> On Thu, Nov 20, 2014 at 3:51 PM, Imants Cekusins wrote:
>>> the code is in
>>> https://github.com/aminishiki/distr_netsplit.git
>>> any comments are welcome!
>>  --
>>  Mark Nijhof
>>  t:   @MarkNijhof <https://twitter.com/MarkNijhof>
>> s:  marknijhof
>> _______________________________________________
>> erlang-questions mailing list
>> http://erlang.org/mailman/listinfo/erlang-questions