"defensive programming" (Was: Re: How nice should I be on exit?)

Wed Mar 5 12:25:08 CET 2003

On 5 Mar 2003, Luke Gorrie wrote:

... cut ...

> Hope that clears things up for someone else who learned the other
> definition than the Erlang guys :-)

Yes :-)

I have two rules of thumb:

Check inputs where they are "untrusted":

	- at a human interface
	- at the interface to a foreign language program

Or when you want a better error diagnostic than the default one -
in this case just exit with the better diagnostic.

For example if I'm parsing an integer I'd write

	I = list_to_integer(L)

or

	case (catch list_to_integer(L)) of
	    {'EXIT', _} ->
		exit(["Most honored user I regrettably have to inform you
		      that your input on line", Ln, "was not an integer
		      in fact it was ",L, "which IMHO is wrong
		      have a nice day
					Mr. C. Computer"]);
	    I ->
		I
	end.

The latter is "an industrial quality" error message :-)

Note (important) the semantics of both are to raise an exception in the
event of an error. 
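As a rough sketch, the "better diagnostic" variant can also be written
with try ... catch, which later Erlang releases provide. The module
name parse_demo, the function parse_int/2 and the shape of the exit
term are made up for illustration; only list_to_integer/1 comes from
the example above.

```erlang
-module(parse_demo).
-export([parse_int/2]).

%% parse_int(L, Ln): convert the string L, read from input line Ln,
%% to an integer. On bad input we still raise an exception, only with
%% a more descriptive reason than the default badarg.
parse_int(L, Ln) ->
    try
        list_to_integer(L)
    catch
        error:badarg ->
            %% Hypothetical diagnostic term; pick whatever suits the caller.
            exit({not_an_integer, [{line, Ln}, {input, L}]})
    end.
```

Either way the caller sees an exception on bad input; the only thing
that changes is how much the reason tells you.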

Aside: I once saw code like this:

	x(a) -> 1;
	x(b) -> 2;
	x(X) ->
		%% what do I do now
		io:format("expecting a or b").

The programmer had actually added a comment (What do I do now) -
of course they had done the wrong thing.

The program:

	x(a) -> 1;
	x(b) -> 2.

Is correct.

Evaluating x(c) generates an exception as required.

In their modified program x(c) evaluates to the atom 'ok' (i.e. the return
value of io:format) - which is incorrect.

If they had wanted a better diagnostic they should have written:

	x(a) -> 1;
	x(b) -> 2;
	x(X) ->	exit({x,expects,argument,'a or b'}).

If you do *nothing* to your code you get a good diagnostic anyway:

If x is in the module m and you call it in the shell
you'd get:

(catch m:x(c)).
{'EXIT',{function_clause,[{m,x,[c]},
                          {erl_eval,expr,3},
                          {erl_eval,exprs,4},
                          {shell,eval_loop,2}]}}

function_clause means you couldn't match a function head.

[{m,x,[c]}, ...

means you were calling function x with argument c

So in this case NOT programming the error case results in

	1) shorter code
	2) clearer code
	3) no chance of accidentally violating the spec
	   by introducing ad hoc "out of spec" code to correct the
	   error
	4) perfectly acceptable error diagnostic
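To make point 3) concrete, here is a small sketch. The module name and
the function names strict/1, lenient/1, double_strict/1 and
double_lenient/1 are invented for this illustration. strict/1 is the
two-clause x/1 above, lenient/1 is the version with the extra
io:format clause.

```erlang
-module(let_it_crash_demo).
-export([strict/1, lenient/1, double_strict/1, double_lenient/1]).

%% The "correct" program: anything but a or b raises function_clause.
strict(a) -> 1;
strict(b) -> 2.

%% The "defensive" program: the extra clause returns the atom ok
%% (the return value of io:format/1), which is outside the spec.
lenient(a) -> 1;
lenient(b) -> 2;
lenient(_Other) -> io:format("expecting a or b~n").

%% Two callers that expect an integer back.
double_strict(X) -> 2 * strict(X).
double_lenient(X) -> 2 * lenient(X).
```

Calling double_strict(c) fails with function_clause and points
straight at the bad argument. Calling double_lenient(c) prints the
message, then fails with badarith in the caller when it evaluates
2 * ok, so the error shows up away from the place that caused it.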

  IMHO 3) is a big gain - specifications always say what to do if
everything works - but never what to do if the input conditions are
not met. The usual answer is "something sensible" - but what? You're
the programmer. In C etc. you have to write *something* if you detect
an error - in Erlang it's easy - don't even bother to write code that
checks for errors - "just let it crash".

  Then write an *independent* process that observes the crashes (a
linked process) - the independent process should try to correct the
error; if it can't correct the error it should crash (same principle)
- each monitor should try a simpler error recovery strategy - until
finally the error is fixed (this is the principle behind the error
recovery tree behaviour).
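A minimal sketch of such an observer, assuming that "correcting the
error" just means running the job again a limited number of times. The
module watch_demo, the function watch/2 and the worker_failed reason
are made up for the example; a real system would more likely use the
OTP supervisor behaviour.

```erlang
-module(watch_demo).
-export([watch/2]).

%% watch(Job, Retries): run the fun Job in a separate, linked worker
%% process. The worker contains no error handling at all. The calling
%% process only observes: it traps exits, restarts the worker up to
%% Retries times, and crashes itself if that does not help.
%% Note: this sets trap_exit in the calling process and leaves it set.
watch(Job, Retries) ->
    process_flag(trap_exit, true),
    Self = self(),
    Pid = spawn_link(fun() -> Self ! {done, self(), Job()} end),
    receive
        {done, Pid, Result} ->
            %% The worker finished; consume its normal exit signal.
            receive {'EXIT', Pid, normal} -> ok end,
            {ok, Result};
        {'EXIT', Pid, _Reason} when Retries > 0 ->
            %% Simplest recovery strategy: start a fresh worker.
            watch(Job, Retries - 1);
        {'EXIT', Pid, Reason} ->
            %% Can't fix it here: crash and let our own observer decide.
            exit({worker_failed, Reason})
    end.
```

For example, watch_demo:watch(fun() -> m:x(c) end, 2) would run the
job three times in total and then exit with {worker_failed, ...},
passing the problem one level up.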

  Why was error handling designed like this?

  Easy - to make fault-tolerant systems you need TWO processors. You
can never ever make a fault-tolerant system using just one processor -
because if that processor crashes you are scomblonked.

  One physical processor does the job - another, separate physical
processor watches the first processor and fixes errors if the first
processor crashes - this is the simplest possible way of making a
fault-tolerant system.

  This principle is mirrored exactly in the Erlang process structure -
this is because we want to have "location transparency" of processes -
in other  words at a  certain level of  abstraction we do not  wish to
know which physical processor an individual Erlang process runs on.

  This is the fundamental reason why we use "remote error recovery"
(i.e. handling the error in a different process from the one in which
the error occurred) - it turns out that this has beneficial
implications for the design of a system, mainly because there is a
clean separation between doing a job, observing if the job was done
and fixing an error if an error has occurred.

  This organization corresponds nicely to an idealized human
organization of bosses and workers - bosses say what is to be done,
workers do stuff. Bosses do quality control and check that things get
done; if not, they fire people, re-organize and tell other people to do
the stuff. If they fail (the bosses) they get sacked etc. <<note I
said idealized organization, usually if projects fail the bosses get
promoted and given more workers for their next project>>

  /Joe