[erlang-questions] Help with design of distributed fault-tolerant systems

Thu Oct 8 06:18:27 CEST 2015

On Thu, Oct 8, 2015 at 7:23 AM, Martin Karlsson
<martin@REDACTED> wrote:
> Perhaps I am being too strict in my requirements and that the system doesn't actually have to be always consistent and always running etc.

One thing I see too often in my industry is great lengths being taken
to make a single service interface instance highly available when the
clients are perfectly prepared to handle failures with retries and
failover to other service instances.  A client which is multi-homed to
two or more servers may fail over to another server if informed that a
problem happened at its first choice server (e.g. resource
unavailable or process crash).  That ends up being a more robust and
cheaper end-to-end solution than having a single IP address for the
service and moving it between active and standby servers while sharing
all state.  I see solutions built using load balancers for services
using SCTP and have to ask the question: did anyone actually analyze
the requirements?
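
To make the client-side failover concrete, here is a minimal sketch of
a client that walks a preference-ordered server list.  It is written in
Python only to keep it short; the names ServerUnavailable,
call_with_failover and fake_send, and the server names, are all
hypothetical and stand in for whatever transport and error reporting
your real client has.

```python
class ServerUnavailable(Exception):
    """Raised by the (hypothetical) transport when a server reports a failure."""


def call_with_failover(servers, request, send):
    """Try each server in preference order; return (server, reply) on first success."""
    last_error = None
    for server in servers:
        try:
            return server, send(server, request)
        except ServerUnavailable as err:
            # Remember why this server failed, then fall through to the next one.
            last_error = err
    raise ServerUnavailable(f"all servers failed, last error: {last_error}")


def fake_send(server, request):
    """Stand-in transport: the first-choice server is down, the others answer."""
    if server == "srv-a":
        raise ServerUnavailable("process crash")
    return f"{server} handled {request}"


if __name__ == "__main__":
    print(call_with_failover(["srv-a", "srv-b"], "ping", fake_send))
```

The point of the sketch is that the servers need no shared state or
floating IP address for this to work; the client only needs to be told
that its first choice failed.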

For example, if your gen_fsm handles connection state for long-lived
sessions, do you need to share all the states involved in setup and
teardown, or just the connected state?  The latter is cheaper and
easier, and the client may well be robust enough that everyone
remains happy.
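
As a rough illustration of sharing only the connected state, the toy
state machine below copies a session to a replica when it enters
"connected" and removes it when it leaves; the setup and teardown
states are never replicated.  This is a Python stand-in for a gen_fsm,
not Erlang code, and the state names, event names and the dict used as
the replica are assumptions made for the example.

```python
class Session:
    """Toy session state machine: idle -> setup -> connected -> teardown -> idle."""

    TRANSITIONS = {
        ("idle", "open"): "setup",
        ("setup", "accepted"): "connected",
        ("connected", "close"): "teardown",
        ("teardown", "released"): "idle",
    }

    def __init__(self, session_id, replica):
        self.session_id = session_id
        self.state = "idle"
        # Hypothetical stand-in for the standby node's session table.
        self.replica = replica

    def handle(self, event):
        new_state = self.TRANSITIONS.get((self.state, event))
        if new_state is None:
            raise ValueError(f"event {event!r} not valid in state {self.state!r}")
        if new_state == "connected":
            # Entering the stable state: the only point where we replicate.
            self.replica[self.session_id] = "connected"
        elif self.state == "connected":
            # Leaving the stable state: drop the replicated entry.
            self.replica.pop(self.session_id, None)
        self.state = new_state
        return new_state
```

A session that dies during setup or teardown is simply lost on the
standby, and the client is expected to retry; only established
sessions survive a switch.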

-- 
     -Vance