[erlang-questions] Distributed application and netsplit

Felix Gallo <>
Fri Nov 21 18:04:12 CET 2014


With all due respect to the Basho guys and the wars from which their unique
and terrible scars were born, the question here was recovering a single
gen_server from split-brain.

Disregarding for a moment the incongruity of installing a Riak cluster
to try to handle that case, there's nothing Riak could do to help, even if
both progressing gen_servers were able to instantly save state in a known
shared key.  Either it's configured as last-write-wins ("lose data") or
it's configured to return siblings ("split-brain, YOU handle it").
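
A toy illustration of those two outcomes, assuming each side stamps its
write with a wall-clock time.  This is a Python sketch of the general idea
only, not Riak's API; Write, resolve_lww and resolve_siblings are
hypothetical names.

```python
from typing import NamedTuple


class Write(NamedTuple):
    timestamp: float   # wall-clock time assigned by the writer (assumed)
    value: frozenset   # state saved by one side of the split


def resolve_lww(writes):
    """Last-write-wins: keep the newest write, silently drop the rest."""
    return max(writes, key=lambda w: w.timestamp).value


def resolve_siblings(writes):
    """Sibling mode: return every conflicting value for the caller to merge."""
    return [w.value for w in writes]


# Two gen_servers saved diverged state under the same key during the split.
writes = [
    Write(timestamp=100.0, value=frozenset({"a", "b"})),
    Write(timestamp=101.0, value=frozenset({"a", "c"})),
]

lww = resolve_lww(writes)            # "b" is gone: lost data
siblings = resolve_siblings(writes)  # both values survive, unresolved
```

Neither result is a merged state: the first has dropped one side's
progress, and the second has only handed the conflict back to the caller.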

The underlying problem is that you need conflict resolution, which Riak
*does* provide with their recent CRDT work, but that's a function of CRDTs
as a concept and organizing your data properly.  And while I wholeheartedly
agree with the Basho guys that you should go to the literature and proven
algorithms rather than inventing your own, there's a very large class of
gen_servers for which conflicting progress state is trivially composable
(e.g. 'union', 'greater_of').
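
To make "trivially composable" concrete, here is a minimal sketch (in
Python for brevity, not Erlang, and not anything Riak ships) of merging
the states of two gen_servers that both made progress during a split.  The
state layout and the names State, merge_state, seen_ids and high_water are
made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical server state: a set of seen ids and a high-water mark."""
    seen_ids: frozenset
    high_water: int


def merge_state(a: State, b: State) -> State:
    """Combine two diverged states field by field.

    'union' for the set and 'greater_of' for the counter are both
    commutative, associative and idempotent, so the result does not
    depend on which side of the split is merged into which.
    """
    return State(
        seen_ids=a.seen_ids | b.seen_ids,            # 'union'
        high_water=max(a.high_water, b.high_water),  # 'greater_of'
    )


# Both sides progressed independently while partitioned.
left = State(frozenset({1, 2, 3}), high_water=3)
right = State(frozenset({1, 2, 7, 8}), high_water=8)

merged = merge_state(left, right)
```

This only works when every field has a merge like that.  State where one
side's progress can invalidate the other's (a balance with withdrawals, a
lock owner) is not in this class.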

CRDT paper:
http://highscalability.com/blog/2010/12/23/paper-crdts-consistency-without-concurrency-control.html
great discussion of the issues:
http://aphyr.com/posts/285-call-me-maybe-riak


On Thu, Nov 20, 2014 at 10:28 PM, Scott Lystig Fritchie <> wrote:

> Andrew Stone <> wrote:
>
> ajs> Hi, I've waited for someone to jump in and say this, but it hasn't
> ajs> happened. You really, really don't want to try dealing with
> ajs> netsplits and application failover in an app specific manner. It is
> ajs> not safe, and you will likely lose data.
>
> Agreed, 106%.
>
> As Joe Armstrong, benevolent co-leader of the concurrent world, is fond
> of saying, it takes at least two machines to detect and recover from a
> fault in one.  In the larger case of split brain management in a
> distributed, asynchronous, message-passing environment (like the
> Universe), there are fundamental limits and impossibilities that nobody
> is immune from.  Good luck with two machines figuring that out.
>
> -Scott