[erlang-questions] let it crash erlang/ada [was: Time for OTP to be Renamed?]

João Neves sevenjp@REDACTED
Mon Feb 17 13:39:07 CET 2014

SpaceX also does it, and it is a central part of their design:

"Q: So, these flight computers on Dragon – there are three on board, and
that's for redundancy?

A: There are actually six computers. They operate in pairs, so there are
three computer units, each of which have two computers checking on each
other. The reason we have three is when operating in proximity of ISS, we
have to always have two computer strings voting on something on critical
actions. We have three so we can tolerate a failure and still have two
voting on each other. And that has nothing to do with radiation, that has
to do with ensuring that we're safe when we're flying our vehicle in the
proximity of the space station.

I went into the lab earlier today, and we have 18 different processing
units with computers in them. We have three main computers, but 18 units
that have a computer of some kind, and all of them are triple computers –
everything is three processors. So we have like 54 processors on the
spacecraft. It's a highly distributed design and very fault-tolerant and
very robust."
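
A rough sketch of the voting idea described in the quote, in Python. This is
not SpaceX's actual logic: the function name `vote`, the `quorum` parameter,
and the use of None to model a failed string are all assumptions made for
illustration.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote(outputs: Sequence[Optional[Hashable]], quorum: int = 2) -> Optional[Hashable]:
    """Return the value that at least `quorum` healthy strings agree on.

    A failed string is modelled as None (an assumption) and is left out of
    the vote. If no value reaches the quorum, the result is None, meaning
    "do not act".
    """
    healthy = [o for o in outputs if o is not None]
    if not healthy:
        return None
    # Pick the most common output among the healthy strings.
    value, count = Counter(healthy).most_common(1)[0]
    return value if count >= quorum else None


# Three strings, one failed: the remaining two can still vote on each other.
decision = vote(["fire_thruster", None, "fire_thruster"])
```

With three strings and a quorum of two, one failure is tolerated; a second
failure, or a disagreement between the two survivors, yields no decision.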


João Neves

2014-02-17 13:29 GMT+01:00 Miles Fidelman <mfidelman@REDACTED>:

> Jesper Louis Andersen wrote:
>> On Sun, Feb 16, 2014 at 10:11 PM, Miles Fidelman
>> <mfidelman@REDACTED> wrote:
>>     Good point.  "Let it crash" does take on a whole different meaning
>>     when dealing with aircraft and such.
>> This is a different point as well! You have two axes:
>> * soft vs hard realtime. Some systems require hard realtime and then your
>> tools are limited to languages where you have explicit memory control,
>> enabling you to avoid allocating memory and triggering garbage collection.
>> In soft realtime systems, you have more leeway, and if built the way of the
>> Erlang runtime system, you get really good soft realtime capability.
>> * Proactive vs Reactive error handling. The idea of "let it crash" is
>> definitely reactive, whereas static type systems, proofs, model checking,
>> etc. are means of proactive error handling.
>> My claim, however, is that you need "let it crash" in aircraft as well if
>> you want to have a stable aircraft. The model where you blindly attempt to
>> eradicate every error from a program is bound to fail sooner or later.
>> Usually "let it crash" in those situations is implemented in hardware by
>> having multiple redundant systems. But systems are rarely exempt from
>> failure, even in a highly controlled environment.
> We've really strayed off-topic here, but....
> My all-time favorite design for seriously mission-critical systems was the
> flight control system for the Space Shuttle.  I'm not sure this is true of
> the later versions, but originally:
> - the flight control software ran on 5 parallel computers that voted on
> results
> - 4 of the computers came from one contractor (hardware and software)
> - the 5th machine ran just the mission-critical code, with a completely
> separate design (both hardware and software)
> - I don't remember how the tie-breaking algorithm worked
> Cheers,
> Miles
> --
> In theory, there is no difference between theory and practice.
> In practice, there is.   .... Yogi Berra