[erlang-questions] Never let it fail!

Fri Nov 3 04:57:46 CET 2006

Jay Nelson <jay@REDACTED> drew our attention to

	http://www.cs.utexas.edu/users/wcook/Drafts/2006/RinardOOPSLA06.pdf

It takes guts to stand up in public and say
    "Can achieve previously inconceivable levels of cluelessness (and
     therefore functionality) in successful deployed systems"
and
    "Software should be [...] not correct"
and
    "If you want to reduce cost and difficulty of producing
     acceptable software
     * Make more errors acceptable
     * Leave more errors in system"

(Or, "Everything would be fine if users would learn to love the lash.")

Hey, I'm using some software like that right now.  I was trying to
copy and paste that text from the PDF file, viewing it in Acrobat
Reader, to this plain old terminal.  My keyboard has a "Copy" key;
this kind of computer has had such a key for, what, 20 years?  And
Acrobat DOES NOT UNDERSTAND IT and doesn't understand that it
doesn't understand it.  Unrecognised keypresses are quietly ignored.
It's only when you paste and get something from half an hour ago that
you discover the cluelessness of Acrobat Reader.  10 minutes of irritation
on my part, several times a week, added up over however many users.
No, we are *NOT* "doing ... Great".  We are creating oceans of low-level
misery that we don't need to, and it's people saying "never mind the
quality, read the feature list" who deserve the blame.

Or take this advice:
[Drat.  That's the FOURTH time I've been caught by Acrobat.  Why is that
the ONLY program I use that doesn't understand the Copy key?]

    "Perform dynamic bounds checks
     Discard out of bounds writes
     Manufacture values for out of bounds reads"

Telling people that their programs should be allowed to go around
hallucinating certainly takes guts.  I'm not sure it takes a brain.

Their advice further on about hiding memory leak problems reminds me
of a science fiction story I read in Analog several years ago.  In outline,
someone had developed a general medical treatment box -- didn't do surgery
or handle broken bones, but did handle infectious diseases, cancer, and
auto-immune type stuff.  (Basically by using gobble-de-good to stimulate
the body's own defences.)  Wonderful, everyone is fine, but now there is
this new social problem:  Box addicts, who find they have to take a
treatment every day or they don't feel well any more.  Punchline:  Smallpox
has made a comeback, the Box addicts weren't addicts at all, they were
genuinely sick, and the Box was masking their symptoms and thereby helping
Smallpox to spread.  I think you can see the relevance of this to the idea
of *hiding* memory leaks (or any other kind of problem) instead of logging
and restarting so that someone KNOWS about the problem and can fix it.

These slides are starting to scare me, people.

What kinds of errors do they talk about?

    - reading outside the bounds of an array.
      Indexing outside a tuple is caught in Erlang.

    - writing outside the bounds of an array.
      There is no such thing as writing into an array in Erlang;
      this kind of error cannot happen.

    - memory leaks due to failing to call free.
      There is no free() in Erlang; this can't happen quite as simply.
      (You _can_ get stuff retained longer than you expected, though.)
      If one process starts stealing all of memory, you can just kill it.
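To make the first two points concrete, here is a minimal sketch.  The
module and function names (bounds_demo, read/2, update/3) are made up
for illustration; element/2 and setelement/3 are the ordinary BIFs.

```erlang
-module(bounds_demo).
-export([read/2, update/3]).

%% element/2 raises error:badarg when Index is outside 1..tuple_size(Tuple)
%% (and also when the arguments have the wrong type).  No value is
%% manufactured for an out-of-bounds read.
read(Index, Tuple) ->
    try element(Index, Tuple) of
        Value -> {ok, Value}
    catch
        error:badarg -> {error, badarg}
    end.

%% setelement/3 builds and returns a NEW tuple.  The tuple passed in is
%% not modified, so there is nothing to overwrite out of bounds.
update(Index, Tuple, New) ->
    {Tuple, setelement(Index, Tuple, New)}.
```

In ordinary code you would not wrap element/2 in a try at all; you
would let the badarg kill the process.  The try is only there so the
result can be inspected.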

Their laudable aim is SURVIVABLE software; software where the presence of
a fault doesn't mean the loss of a service.  But Erlang gives you that in
a much simpler, much safer, and agreeably biological way:  apoptosis plus
the OTP behaviours.  Apoptosis is when a cell that gets into trouble
kills itself.  The organism can grow another one.  In the same way, an
Erlang process that starts going insane ISN'T told "that's wonderful,
hallucinate and survive", it is killed, and the supervisor or whatever
can grow another one.
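As a rough sketch of what that looks like (not a production design):
the module name apoptosis_sup, the registered name demo_worker, and the
message format are all invented for this example, and the worker is a
bare spawned process rather than a proper gen_server, to keep it short.

```erlang
-module(apoptosis_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% one_for_one: when the worker dies, restart just that worker.
%% Give up if it dies more than 5 times in 10 seconds.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Child = #{id => worker,
              start => {?MODULE, start_worker, []},
              restart => permanent},
    {ok, {SupFlags, [Child]}}.

%% Called by the supervisor, so the new process is linked to it.
start_worker() ->
    Pid = spawn_link(fun worker_loop/0),
    register(demo_worker, Pid),
    {ok, Pid}.

%% No defensive code: a bad Index makes element/2 raise badarg, the
%% process dies, and the supervisor starts a fresh one.
worker_loop() ->
    receive
        {get, From, Index, Tuple} ->
            From ! {value, element(Index, Tuple)},
            worker_loop()
    end.
```

Sending demo_worker the message {get, self(), 4, {a,b,c}} kills the
worker with badarg; a crash report is logged, and shortly afterwards
whereis(demo_worker) returns a different pid.  The fault is visible to
whoever reads the log, and the service is still there.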

To twist a famous quotation, on reading this idea about trying to build
robust large scale systems in an unsafe language originally designed for
systems about as powerful as a modern wristwatch, by means of catching
what few errors the language _can_ catch and pretending they never happened,
my reaction is

    It's magnificent, but it isn't sane.

Oh, oh!  The woodpeckers are coming!