Making reliable distributed systems in the presence of software errors
Joe Armstrong
joe@REDACTED
Thu Nov 13 11:01:58 CET 2003
On 12 Nov 2003, Luke Gorrie wrote:
> Bjarne Däcker <bjarne@REDACTED> writes:
>
> > http://www.sics.se/~joe/thesis/spikblad.html
>
> Nice typography! :-)
>
Thanks -
Shameless plug follows.
Hello all Erlangers ...
You might like to *read*
http://www.sics.se/~joe/thesis/armstrong_thesis_2003.pdf
The central problem of this thesis is "How to make reliable systems in
the presence of software errors".
We know how to make reliable systems in the presence of *hardware*
errors (answer: replicate) - but what about *software* errors? Here
replication does not help at all - replicating faulty software just
makes matters worse - instead of one failing program we have two
failing programs, both of which fail for exactly the same reason.
Since most things fail because of software errors, this problem
seems much more interesting than the "hardware" fault-tolerance
problem.
Erlang is part of the story - the thesis contains (among other
things)
- A philosophy of programming (called Concurrency Oriented Programming)
- A description of Erlang
- Examples of how to program in Erlang
- A method for programming fault-tolerant systems
- A description of an implementation of this method
(i.e. a description of the major OTP behaviours)
- Examples of how to program with the OTP behaviours
- Case studies to see if the method works (I claim it does)
- A method for specifying the interaction between components (UBF)
Much of the material in the thesis can be viewed as "the missing
Erlang documentation", since it records not "how things are done" but,
more importantly, "why things were done".
Have a good read
/Joe