[erlang-questions] clueless performance question

Thu Jun 12 13:18:47 CEST 2008

On Thu, Jun 12, 2008 at 10:56 AM, Ulf Wiger (TN/EAB)
<ulf.wiger@REDACTED> wrote:
> recover quickly enough. Obviously, with a "9 nines" target
> (3.1 ms/year), one 30 second outage uses up the entire
> budget for the next ten thousand years, so in this case, it's

When I first got interested in Erlang, it was through texts that
focused more on the mean-time-to-recover (from failure) than on the
mean-time-between-failures. It is an interesting number, and it has a
bigger impact on the system design. I'm not sure if this is Joe
Armstrong's contribution, or if he is channeling what the telecom
industry as a whole already knew. Erlang/OTP certainly focuses on
recovering from errors, treating them as inevitable.

If you want a very low mean-time-to-recover, then humans can't be
involved. The system needs to fail over automatically, monitor
itself, and restart itself. If you need a human to recover from an
error, your system is unlikely to make its owner happy.

I can see many applications where it is MUCH better to have a
mean-time-between-failures of 24 hours with a mean-time-to-recover of
around a minute than a mean-time-between-failures of 1 year with a
mean-time-to-recover of around 365 minutes.
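To put numbers on that comparison: using the usual steady-state formula A = MTBF / (MTBF + MTTR), the two scenarios come out with exactly the same availability. The sketch below is only an illustration under that assumption (a 365-day year, all durations in minutes); `availability` and `downtime_per_year` are hypothetical helper names, not from any library.

```python
# Steady-state availability: the fraction of time the system is up,
# A = MTBF / (MTBF + MTTR). All durations are in minutes.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600, ignoring leap years


def availability(mtbf_min: float, mttr_min: float) -> float:
    """Fraction of time the system is up."""
    return mtbf_min / (mtbf_min + mttr_min)


def downtime_per_year(mtbf_min: float, mttr_min: float) -> float:
    """Expected minutes of downtime per year."""
    return MINUTES_PER_YEAR * (1 - availability(mtbf_min, mttr_min))


# Scenario 1: fails once a day, recovers in about a minute.
daily = (24 * 60, 1)
# Scenario 2: fails once a year, recovers in about 365 minutes.
yearly = (MINUTES_PER_YEAR, 365)

for name, (mtbf, mttr) in [("daily", daily), ("yearly", yearly)]:
    print(f"{name}: A = {availability(mtbf, mttr):.6f}, "
          f"downtime = {downtime_per_year(mtbf, mttr):.1f} min/year")
```

Both work out to 1440/1441, about 99.93% availability or roughly 365 minutes of downtime a year, so the availability figure alone does not tell the two systems apart. What differs is that each outage in the first scenario is short enough that users may barely notice it.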