[erlang-questions] Help with design of distributed fault-tolerant systems

Thu Oct 8 09:23:52 CEST 2015

On Thu, Oct 8, 2015 at 3:53 AM, Martin Karlsson
<martin@REDACTED> wrote:
> I struggle a lot with how to design erlang systems. Everything is easy and
> very powerful as long as you stay on one node. Supervision tree, and
> processes and all that.
>
> However, to be fault tolerant you need at least three servers and here is
> where my problem comes in. All of a sudden the nice design is not so nice
> any longer.
>
> gen_server is all about state. And if you want to be fault-tolerant this
> state must somehow be shared, or at least it is my assumption that it has to
> be shared. If not I'd be happy to hear about alternative approaches.

Many reliable* distributed applications do not need to share state
consistently across nodes. Perhaps only 1%** do, and that is the
difficult case; most applications are in the other 99%. Maybe yours
is there too?

Think about:
* What is your state? Do you really need it to be always available
and consistent, and if so, why?
* How do you handle updates to the state? Often it's possible to push
the state consistency problem away from your service -- e.g. to the
client (multi-homing or sending full batches) or to somewhere downstream.
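
A rough sketch of the multi-homing idea, not a definitive design: the
client sends the same full batch to every node, and each node stores
it on its own, keyed by batch id, so the nodes never have to share
state with each other. The module name `ingest`, the
`{batch, BatchId, Items}` message, and the 5 second timeout are all
made up for illustration.

```erlang
-module(ingest).
-behaviour(gen_server).
-export([start_link/0, send_batch/3]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Server side (hypothetical): runs independently on every node.
%% Nothing is shared or replicated between nodes.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    {ok, #{}}.

%% Storing by BatchId makes a resend of the same batch harmless.
handle_call({batch, BatchId, Items}, _From, Batches) ->
    {reply, ok, maps:put(BatchId, Items, Batches)}.

handle_cast(_Msg, Batches) ->
    {noreply, Batches}.

%% Client side: multi-homing. Send the full batch to every node and
%% succeed if at least one of them acknowledged it.
send_batch(Nodes, BatchId, Items) ->
    Msg = {batch, BatchId, Items},
    case [N || N <- Nodes, try_node(N, Msg) =:= ok] of
        []    -> {error, no_node_reachable};
        Acked -> {ok, Acked}
    end.

%% gen_server:call/3 exits if the node is down, the server is not
%% running, or the call times out; treat all of those as "no ack".
try_node(Node, Msg) ->
    try
        gen_server:call({?MODULE, Node}, Msg, 5000)
    catch
        exit:_Reason -> error
    end.
```

With three nodes, a call such as
`ingest:send_batch(['a@host1', 'b@host2', 'c@host3'], 42, Items)`
still succeeds when any single server is down, because the surviving
nodes each hold the whole batch.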

If you told us a bit more about the application you're building, you
would very likely receive more to-the-point and helpful responses. :-)

Also, a book with The Right Questions (not necessarily for Erlang)
would be interesting. Often it's about making small compromises in the
system you're building (which, it turns out, don't matter to the users)
in exchange for a simpler design and implementation (e.g. making it
non-shared-state).

[*]: that is, able to handle the failure of any single server.
[**]: number made up, of course, but my feeling is that it's really small.