[erlang-questions] Newbie question, finite state Machine failover

Ulf Wiger ulf.wiger@REDACTED
Wed Sep 7 09:51:00 CEST 2011

On 6 Sep 2011, at 23:09, Jon Watte wrote:

> Stateful, as in the fail-over needs to be "hot" and "online" and replicating the state of the first application faithfully?
> The danger with such approaches is that, if the state becomes corrupt through some chain of events, then the replicated copy may also be corrupt, and the "slave" crashes when the "master" crashes. It still works great in case of hardware failure on the master instance, of course.

You are right. One way to mitigate this is to put some effort into designing a replication format that does not simply mirror the internal state. Not only will this reduce the likelihood of propagating corrupted state; it will also simplify potential future upgrades and extensions, and make it easier to analyse the traffic flowing between nodes.
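
As a rough illustration of a replication format kept separate from the internal state, here is a minimal sketch. It is written in Python purely for brevity, not in Erlang, and every name in it (SessionState, to_replica, from_replica, the field names, the version number) is a hypothetical choice made for this example, not something taken from a real system. The active side exports only a small versioned record, and the standby validates it and rebuilds its own internal state from scratch.

```python
from dataclasses import dataclass, field
from typing import Optional

FORMAT_VERSION = 1                     # hypothetical wire-format version
VALID_PHASES = {"idle", "connected"}   # hypothetical phases worth replicating


@dataclass
class SessionState:
    """Internal state of the active instance (hypothetical example)."""
    session_id: str
    phase: str
    peer: Optional[str] = None
    retry_count: int = 0                        # volatile, never replicated
    cache: dict = field(default_factory=dict)   # volatile, never replicated


def to_replica(state: SessionState) -> dict:
    """Export only what a standby needs, in an explicit versioned format."""
    return {
        "v": FORMAT_VERSION,
        "id": state.session_id,
        "phase": state.phase,
        "peer": state.peer,
    }


def from_replica(msg: dict) -> SessionState:
    """Validate a replication message and rebuild fresh internal state."""
    if msg.get("v") != FORMAT_VERSION:
        raise ValueError("unsupported replication format version")
    session_id = msg.get("id")
    if not isinstance(session_id, str) or not session_id:
        raise ValueError("missing or invalid session id")
    phase = msg.get("phase")
    if phase not in VALID_PHASES:
        raise ValueError("unknown phase")
    peer = msg.get("peer")
    if (phase == "connected") != isinstance(peer, str):
        raise ValueError("peer must be set exactly when connected")
    # Volatile fields start from their defaults on the standby.
    return SessionState(session_id=session_id, phase=phase, peer=peer)
```

Because the standby never receives the counters or caches of the active instance, corruption in those parts cannot travel across, and an inconsistent message is rejected at the boundary instead of being installed as state. The version field is one possible way to leave room for later format changes.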

One should also think through at which points it is meaningful to replicate at all. I like to refer to "stable-state replication", a term which says nothing about the frequency of updates, but rather highlights that there are usually discrete points where recovery from error is meaningful. The transition states between these points tend to be volatile, and replicating them may serve little purpose.
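
A minimal sketch of stable-state replication might look like the following. Again it is in Python only for illustration, and the state names, events, and the ActiveFsm and Standby classes are hypothetical, loosely modelled on a call setup. The state machine passes through a transient "dialing" state, but only the stable states are pushed to the standby.

```python
# Hypothetical states: "idle" and "connected" are stable, "dialing" is transient.
STABLE_STATES = {"idle", "connected"}

# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("idle", "dial"): "dialing",
    ("dialing", "answer"): "connected",
    ("dialing", "timeout"): "idle",
    ("connected", "hangup"): "idle",
}


class Standby:
    """Holds the last stable state received from the active instance."""

    def __init__(self):
        self.last_stable = "idle"

    def replicate(self, state):
        self.last_stable = state


class ActiveFsm:
    """State machine that replicates only on entering a stable state."""

    def __init__(self, standby):
        self.state = "idle"
        self.standby = standby

    def handle(self, event):
        # An unknown (state, event) pair raises KeyError in this sketch.
        self.state = TRANSITIONS[(self.state, event)]
        if self.state in STABLE_STATES:
            self.standby.replicate(self.state)
```

If the active instance fails while in "dialing", the standby still holds "idle" and can resume from there, which is a point where recovery makes sense. Note that replication here is driven by the transitions themselves, not by a timer.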

Ulf W

Ulf Wiger, CTO, Erlang Solutions, Ltd.
