[erlang-questions] let it crash erlang/ada [was: Time for OTP to be Renamed?]

Mon Feb 17 13:44:16 CET 2014

Interesting.

What I find particularly notable about the Space Shuttle design is the 
notion of separate development of hardware and software - by different 
teams at different vendors - as a fault checking mechanism.

I guess the core mathematics has to be the same (e.g., ballistic 
calculations), but beyond that it is different code running on different 
hardware, and both have to arrive at the same results, or something is wrong.

More copies of the same hardware/software just mean more copies of any 
bugs!
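
To make that concrete, here is a minimal Python sketch of the idea (not how
the Shuttle software actually worked). It assumes a toy drag-free ballistic
range calculation, and every name in it (range_closed_form, range_stepped,
range_buggy, cross_check, Disagreement) is hypothetical, chosen only for
illustration. Two independently written implementations must agree within a
tolerance, and a deliberately buggy third one shows why two copies of the
same code cannot catch their shared bug.

```python
import math


class Disagreement(Exception):
    """Raised when the independent implementations do not agree."""


def range_closed_form(speed, angle_deg, g=9.81):
    """Implementation A: closed-form range of a drag-free projectile."""
    theta = math.radians(angle_deg)
    return speed ** 2 * math.sin(2 * theta) / g


def range_stepped(speed, angle_deg, g=9.81, dt=1e-3):
    """Implementation B: step the trajectory until it reaches the ground."""
    theta = math.radians(angle_deg)
    vx, vy = speed * math.cos(theta), speed * math.sin(theta)
    x, y = 0.0, 0.0
    while True:
        y_next = y + vy * dt - 0.5 * g * dt * dt
        if y_next < 0.0:
            # Interpolate linearly to the ground crossing inside this step.
            frac = y / (y - y_next)
            return x + frac * vx * dt
        x += vx * dt
        y = y_next
        vy -= g * dt


def range_buggy(speed, angle_deg, g=9.81):
    """Deliberately wrong: treats the angle in degrees as radians."""
    return speed ** 2 * math.sin(2 * angle_deg) / g


def cross_check(implementations, *args, tolerance=1e-2):
    """Run every implementation and insist that the results agree."""
    results = [impl(*args) for impl in implementations]
    if max(results) - min(results) > tolerance:
        raise Disagreement(results)
    return results[0]
```

With these definitions, cross_check([range_buggy, range_buggy], 100.0, 45.0)
returns a wrong range without complaint, because both copies share the
degrees/radians bug, while cross_check([range_buggy, range_stepped], 100.0,
45.0) raises Disagreement.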

Cheers,

Miles

João Neves wrote:
> SpaceX also does it, and it is a central part of their design:
>
> "Q: So, these flight computers on Dragon – there are three on board, 
> and that's for redundancy?
>
> A: There are actually six computers. They operate in pairs, so there 
> are three computer units, each of which have two computers checking on 
> each other. The reason we have three is when operating in proximity of 
> ISS, we have to always have two computer strings voting on something 
> on critical actions. We have three so we can tolerate a failure and 
> still have two voting on each other. And that has nothing to do with 
> radiation, that has to do with ensuring that we're safe when we're 
> flying our vehicle in the proximity of the space station.
>
> I went into the lab earlier today, and we have 18 different processing 
> units with computers in them. We have three main computers, but 18 
> units that have a computer of some kind, and all of them are triple 
> computers – everything is three processors. So we have like 54 
> processors on the spacecraft. It's a highly distributed design and 
> very fault-tolerant and very robust."
>
> (http://www.aviationweek.com/Blogs.aspx?plckBlogId=Blog:04ce340e-4b63-4d23-9695-d49ab661f385&plckPostId=Blog:04ce340e-4b63-4d23-9695-d49ab661f385Post:a8b87703-93f9-4cdf-885f-9429605e14df)
>
>
> --
> João Neves
>
>
> 2014-02-17 13:29 GMT+01:00 Miles Fidelman <mfidelman@REDACTED>:
>
>     Jesper Louis Andersen wrote:
>
>
>         On Sun, Feb 16, 2014 at 10:11 PM, Miles Fidelman
>         <mfidelman@REDACTED> wrote:
>
>             Good point.  "Let it crash" does take on a whole different
>             meaning when dealing with aircraft and such.
>
>
>         This is a different point as well! You have two axes:
>
>         * soft vs hard realtime. Some systems require hard realtime
>         and then your tools are limited to languages where you have
>         explicit memory control, enabling you to avoid allocating
>         memory and triggering garbage collection. In soft realtime
>         systems, you have more leeway, and if built the way of the
>         Erlang runtime system, you get really good soft realtime
>         capability.
>
>         * Proactive vs reactive error handling. The idea of "let it
>         crash" is definitely reactive, whereas static type systems,
>         proofs, model checking, etc. are means of proactive error handling.
>
>         My claim, however, is that you need "let it crash" in aircraft
>         as well if you want to have a stable aircraft. The model where
>         you blindly attempt to eradicate every error from a program is
>         bound to fail sooner or later. Usually "let it crash" in those
>         situations is implemented in hardware by having multiple
>         redundant systems. But systems are rarely exempt from failure,
>         even in a highly controlled environment.
>
>
>     We've really strayed off-topic here, but....
>
>     My all-time favorite design for seriously mission-critical systems
>     was the flight control system for the Space Shuttle.  I'm not sure
>     this is true of the later versions, but originally:
>     - the flight control software ran on 5 parallel computers that
>     voted on results
>     - 4 of the computers came from one contractor (hardware and software)
>     - the 5th machine just ran mission-critical code, with a
>     completely separate design (both hardware and software)
>     - I don't remember how the tie-breaking algorithm worked
>
>     Cheers,
>
>     Miles
>

-- 
In theory, there is no difference between theory and practice.
In practice, there is.   .... Yogi Berra