[erlang-questions] Newbie question, finite state Machine failover

Wed Sep 7 23:21:23 CEST 2011

On Wed, Sep 7, 2011 at 16:42, Ulf Wiger <ulf.wiger@REDACTED> wrote:
>
> I should also say that some protocols allow for a "recovery window", which further simplifies stable-state replication. I know that some people will overlook this and try for hottest possible redundancy, but in several of the products I've been involved in, this property has been crucial.
>

On another note, a smaller stable state is often easier to verify for
correctness. If you have some way to check that the state is stable,
then by all means check it! You can often export this state to another
process for periodic verification as well. That does not _mend_ an
error, but it does _detect_ it.

Think about a system managing money. Like a certain friar back in the
day, Luca Pacioli, your system can do double-entry bookkeeping by
having two processes, each playing the role of a ledger account. You
can then check the system for a stable state by cross-querying the
ledgers. The stable state is small, namely the end balance of each
account, and it is enough for verification.
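
A minimal sketch of the ledger idea, under some assumptions of my own:
the module name ledger_sketch, the message shapes, and the invariant
(two accounts that both start at zero, so the balances must sum to
zero) are all made up for illustration.

```erlang
-module(ledger_sketch).
-export([start/0, transfer/3, balance/1, consistent/2]).

%% Each ledger account is a process. Its only stable state is the
%% balance, which starts at zero (an assumption of this sketch).
start() ->
    spawn(fun() -> loop(0) end).

loop(Balance) ->
    receive
        {post, Amount} ->
            loop(Balance + Amount);
        {balance, From, Ref} ->
            From ! {Ref, Balance},
            loop(Balance)
    end.

%% Double entry: every transfer is posted twice, once as a debit on
%% one account and once as a credit on the other.
transfer(From, To, Amount) ->
    From ! {post, -Amount},
    To ! {post, Amount},
    ok.

%% Ask an account for its current balance.
balance(Account) ->
    Ref = make_ref(),
    Account ! {balance, self(), Ref},
    receive
        {Ref, Balance} -> Balance
    after 1000 ->
        exit(timeout)
    end.

%% Cross query: with only these two accounts in the system, the
%% balances must cancel out.
consistent(A, B) ->
    balance(A) + balance(B) =:= 0.
```

The cross query is not atomic, so it is only meaningful when no
transfer from another process is in flight. It detects a lost or
duplicated posting; it does not repair one.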

The basic idea is that your process has an internal state s(), and
your system defines several projections of the form

-spec projectionX(s()) -> t().

where t() is the type of the projection image. You use these
projections for verification, for state marshalling, for system
inspection and so on. You may need several different projections for
different purposes, but often they can be coalesced into a few. As Ulf
mentioned, simple t()s mean easier upgrade paths as well.
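
To make the projection idea concrete, here is a sketch. Everything in
it is hypothetical: the #state{} record, the field names, the function
names to_stable/from_stable/check, and the non-negative balance
invariant are assumptions for illustration, not part of any real
system.

```erlang
-module(projection_sketch).
-export([new/2, to_stable/1, from_stable/1, check/1]).

%% Internal state s(). The last two fields are scratch data that is
%% not worth preserving across a crash or a failover.
-record(state, {owner        :: binary(),
                balance      :: integer(),
                pending = [] :: [term()],             % scratch pad
                cache = []   :: [{term(), term()}]}). % derived data

-type s() :: #state{}.
-type t() :: {account, binary(), integer()}.

-spec new(binary(), integer()) -> s().
new(Owner, Balance) ->
    #state{owner = Owner, balance = Balance}.

%% The projection: keep only the small stable part of the state.
-spec to_stable(s()) -> t().
to_stable(#state{owner = Owner, balance = Balance}) ->
    {account, Owner, Balance}.

%% Rebuild a fresh internal state from the projection image. The
%% scratch fields come back empty.
-spec from_stable(t()) -> s().
from_stable({account, Owner, Balance}) ->
    new(Owner, Balance).

%% Verification works on the projection image only. The invariant
%% (balance never negative) is an assumption of this sketch.
-spec check(t()) -> boolean().
check({account, Owner, Balance}) ->
    is_binary(Owner) andalso is_integer(Balance) andalso Balance >= 0.
```

The same t() serves here for marshalling and for verification, which
is the coalescing mentioned above. A plain tagged tuple also has no
dependency on the record definition, so it survives a change to
#state{}.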

Notice that not all state is equal. A lot of the internal state of a
process is no longer valid once something goes wrong, so there is
little reason to keep it around. Sometimes you keep state that acts as
a scratch pad for your calculations, most often on the stack. That is
not important either when things crash. Most crashes are due to state
inconsistencies anyway, so holding on to the scratch pad will
definitely make your life worse.

Sometimes you are lucky and the data are self-verifying. Sometimes not.
But thinking about what properties your data should have is good for
several reasons. For one, it makes it easier to write QuickCheck/PropEr
tests. It is also why there is no easy recipe for fault tolerance. If
you want to make a system fault tolerant, you need two machines as a
start. But then you need to make sure that the right kind of
information flows between the two machines. Erlang makes it easy to
transfer information. But no language can verify the properties of the
information that flows. At least not easily.
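
As a small illustration of self-verifying data, the sketch below wraps
a term in an envelope carrying a CRC32 of its serialized form. The
module name envelope_sketch and the {sealed, Crc, Bin} shape are
assumptions for illustration. Note that this only catches corruption
in transit or storage; it says nothing about whether the term makes
sense, which is the hard part.

```erlang
-module(envelope_sketch).
-export([seal/1, open/1]).

%% Serialize the term and attach a checksum of the bytes.
seal(Term) ->
    Bin = term_to_binary(Term),
    {sealed, erlang:crc32(Bin), Bin}.

%% Recompute the checksum and decode only if it matches.
open({sealed, Crc, Bin}) when is_integer(Crc), is_binary(Bin) ->
    case erlang:crc32(Bin) of
        Crc -> {ok, binary_to_term(Bin)};
        _   -> {error, checksum_mismatch}
    end;
open(_) ->
    {error, bad_envelope}.
```

If the envelope can come from an untrusted peer, binary_to_term/2 with
the safe option is probably the better choice.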

-- 
J.