[erlang-questions] let it crash erlang/ada [was: Time for OTP to be Renamed?]

João Neves
Mon Feb 17 14:01:24 CET 2014


Of course.

Sadly the article is very light on technical detail (I suppose the main
audience is non-technical people), so I can't really tell whether they're
doing the whole shebang with separate hardware and software stacks. That
incurs extra cost and complexity per launch (you may not get the same bug
on both systems, but you're sure as hell going to have different bugs), and
maybe the benefits aren't that great compared to having one exhaustively
tested, well-understood software and hardware stack, so the cost-benefit
trade-off is just not worth it - especially since one of SpaceX's stated
goals is to provide "cheap" access to space.
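For what it's worth, the voting arrangement described in the interview
quoted below (three computer strings, at least two of which must agree on a
critical action) can be sketched roughly like this. It is only an
illustration in Python, not anything SpaceX has published: the names `vote`
and `quorum`, and the use of `None` to mark a failed string, are my own
assumptions.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote(outputs: Sequence[Optional[Hashable]], quorum: int = 2) -> Optional[Hashable]:
    """Return the value that at least `quorum` strings agree on, else None.

    `outputs` holds one entry per computer string; None marks a string that
    has failed and produced no answer (an assumption made for this sketch).
    """
    healthy = [o for o in outputs if o is not None]
    if not healthy:
        return None  # every string is down: no decision possible
    value, count = Counter(healthy).most_common(1)[0]
    # Act only when enough independent strings produced the same answer.
    return value if count >= quorum else None


if __name__ == "__main__":
    print(vote(["fire", "fire", "hold"]))  # one dissenting string is outvoted
    print(vote(["fire", None, "fire"]))    # one failed string is tolerated
    print(vote(["fire", None, "hold"]))    # no agreement: refuse to act
```

With three strings and a quorum of two, one string can fail or disagree and
the remaining pair can still vote, which is the property the interview
describes. Note that this says nothing about common-mode bugs: three copies
of the same software will happily agree on the same wrong answer.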


--
João Neves


2014-02-17 13:44 GMT+01:00 Miles Fidelman:

> Interesting.
>
> What I find particularly notable about the Space Shuttle design is the
> notion of separate development of hardware and software - by different
> teams at different vendors - as a fault checking mechanism.
>
> I guess the core mathematics has to be the same (e.g., ballistic
> calculations), but beyond that, different code, running on different
> hardware, but they have to get to the same results, or something is wrong.
>
> More copies of the same hardware/software just means more copies of any
> bugs!
>
> Cheers,
>
> Miles
>
>
> João Neves wrote:
>
>> SpaceX also does it, and it is a central part of their design:
>>
>> "Q: So, these flight computers on Dragon – there are three on board, and
>> that's for redundancy?
>>
>> A: There are actually six computers. They operate in pairs, so there are
>> three computer units, each of which have two computers checking on each
>> other. The reason we have three is when operating in proximity of ISS, we
>> have to always have two computer strings voting on something on critical
>> actions. We have three so we can tolerate a failure and still have two
>> voting on each other. And that has nothing to do with radiation, that has
>> to do with ensuring that we're safe when we're flying our vehicle in the
>> proximity of the space station.
>>
>> I went into the lab earlier today, and we have 18 different processing
>> units with computers in them. We have three main computers, but 18 units
>> that have a computer of some kind, and all of them are triple computers –
>> everything is three processors. So we have like 54 processors on the
>> spacecraft. It's a highly distributed design and very fault-tolerant and
>> very robust."
>>
>> (http://www.aviationweek.com/Blogs.aspx?plckBlogId=Blog:04ce340e-4b63-4d23-9695-d49ab661f385&plckPostId=Blog:04ce340e-4b63-4d23-9695-d49ab661f385Post:a8b87703-93f9-4cdf-885f-9429605e14df)
>>
>>
>> --
>> João Neves
>>
>>
>> 2014-02-17 13:29 GMT+01:00 Miles Fidelman:
>>
>>
>>     Jesper Louis Andersen wrote:
>>
>>
>>         On Sun, Feb 16, 2014 at 10:11 PM, Miles Fidelman wrote:
>>
>>             Good point.  "Let it crash" does take on a whole different
>>             meaning when dealing with aircraft and such.
>>
>>
>>         This is a different point as well! You have two axes:
>>
>>         * Soft vs hard realtime. Some systems require hard realtime,
>>         and then your tools are limited to languages where you have
>>         explicit memory control, enabling you to avoid allocating
>>         memory and triggering garbage collection. In soft realtime
>>         systems you have more leeway, and if the system is built the
>>         way the Erlang runtime system is, you get really good soft
>>         realtime capability.
>>
>>         * Proactive vs reactive error handling. The idea of "let it
>>         crash" is definitely reactive, whereas static type systems,
>>         proofs, model checking, etc. are means of proactive error handling.
>>
>>         My claim, however, is that you need "Let it crash" in aircraft
>>         as well if you want to have a stable aircraft. The model where
>>         you blindly attempt to eradicate every error from a program is
>>         bound to fail sooner or later. Usually "let it crash" in those
>>         situations is implemented in hardware by having multiple
>>         redundant systems. But rarely are systems exempt from failure,
>>         even in a highly controlled environment.
>>
>>
>>     We've really strayed off-topic here, but....
>>
>>     My all-time favorite design for seriously mission-critical systems
>>     was the flight control system for the Space Shuttle.  I'm not sure
>>     this is true of the later versions, but originally:
>>     - the flight control software ran on 5 parallel computers that
>>     voted on results
>>     - 4 of the computers came from one contractor (hardware and software)
>>     - the 5th machine ran only mission-critical code, with a
>>     completely separate design (both hardware and software)
>>     - I don't remember how the tie-breaking algorithm worked
>>
>>     Cheers,
>>
>>     Miles
>>
>>
>>
>>
>>     --
>>     In theory, there is no difference between theory and practice.
>>     In practice, there is.   .... Yogi Berra
>>
>>     _______________________________________________
>>     erlang-questions mailing list
>>     http://erlang.org/mailman/listinfo/erlang-questions
>>
>>
>>
>
> --
> In theory, there is no difference between theory and practice.
> In practice, there is.   .... Yogi Berra
>