Erlang/OTP R10B-10 has been released

Wed Mar 22 04:20:19 CET 2006

Matthias Lang <matthias@REDACTED> wrote:
	Another thing. A major use of the preprocessor is conditional
	compilation, e.g. 

	  -ifdef(POLITICALLY_CORRECT)
	  pi() -> 3.0.
	  -else.
	  pi() -> 3.14159.
	  -endif.

This is one of the things that got a lot of discussion when Ada
was being designed, because "No preprocessor!" was one of the key
design requirements for Ada.  Part of the answer had to wait for
Ada 95, namely child packages.  (This is *not* the same thing as
the single-flat-namespaces-but-with-dots-in-the-names-because-Java-
is-so-kewl module names, but something that lets modules have other
modules inside them.)

However, this is one of the easy cases.
The "conditional compilation" thing here has two aspects:

(1) some code is selected and other code is rejected.
    Well, we have 'if' and 'case' for that.

(2) information for the test is supplied from "outside" the
    compilation unit.

These are actually separable concepts.  You could have conditional
compilation where the condition is set inside the module.  And you
could have a literal set outside the module used inside for purposes
other than conditional compilation.

So let's separate.
Let's add a "feature" pseudo-function (strictly speaking it should be
an abstract pattern, but here I am not assuming abstract patterns)
with one argument which must at compile time be an atom; its values
are limited to numbers, atoms, and strings.  Compiled using erlc,
we might have -D<feature>=<value>, ....  Compiled using a function
call, we might pass the features through the options list somehow.

Now,
    The first thing the compiler does is to generate feature/1 (or
    #feature/1) as a real function (or abstract pattern)
    inside the module, so that tools can easily *find out* exactly
    which features were set to what when the module was compiled.

    The second thing is that the compiler may allow the pseudo-function
    in patterns and guards (this would be nothing special if it was an
    abstract pattern, but remember, I am not assuming abstract patterns
    here, although I *was* when I wrote about the preprocessor being
    expendable).  And this pseudo-function (...) is inlined without any
    further ado.

So we get

    pi() ->
	case feature(politically_correct)
	  of true  -> 3.0
	   ; false -> 3.14159 % also wrong
	 end.

There are a few things we *can't* do.

We cannot make a choice between alternatives some of which our
compiler can read and some of which it can't.  (Lisp systems processing
#+ and #- have to be careful not to try to convert things that look like
numbers into numeric form, because the point of conditionalising might
be to choose between 80-bit numbers and 64-bit numbers, and a 64-bit
system shouldn't do anything with the 80-bit numbers.)

We cannot conditionally export things.  For that we would need new
syntax,

    -export([...]) when <guard>.

We cannot make the *existence* of a function conditional, but that's
as it should be.  If a function is called or exported, it should exist.
If it is not called and not exported, then dead code elimination should
get rid of it anyway.

One nice thing about using feature/1 is that it should be possible to
compile a module so that the debugger or profiler or test coverage
analyser or whatever can "see" feature tests just exactly like any other
function calls.

	The standard advice is to put the variant code in separate
	modules.  Doing that has quite a disruptive effect on a program's
	organisation---the classic example is that you have a working system
	on hardware X and now you want to extend it to work on hardware
	Y.  Your choices are now

I generally find real examples better than abstract discussions.

Let me offer you a concrete example.

This year my 3rd year software students are supposed to be maintaining
a version of AWK.  I've been doing everything I've told them to do, to
make sure that I'm not asking anything unreasonable.  The source code
is just 12,255 SLOC.  This is not a big program.  But there is a lot
of conditional compilation.  Let's agree that having

	#ifndef FOOBAR_H_
	#define FOOBAR_H_ 19990412
	...
	#endif/*FOOBAR_H_*/

as protection so that foobar.h can safely be included more than once
is entirely benign.  (And a small AWK script checks that these things
occur in and only in headers with matching names.)

But there are still no fewer than EIGHTY-FOUR (84) different
compile-time flags that conditional compilation depends on.

That means that there are 2**84 = 19,342,813,113,834,066,795,298,816 
different versions of this program that need checking.

By deciding that I am no longer the slightest bit interested in having
this program run on anything that doesn't conform to C89, and by
redesign in a couple of cases, I have got this down to TWENTY-SEVEN (27)
different flags.  Of these, I know for sure that I can get rid of one of
them.  Another 3 of them (at least) could be eliminated by using
fdlibm -- they are there to deal with buggy strtod() implementations.
But leave it at the 27 figure.

There are 2**27 = 134,217,728
different versions of this program that still need checking.
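
To put numbers on that: each independent on/off flag doubles the number
of configurations.  A minimal sketch of the arithmetic in C follows.  The
function names are made up for illustration, and the build rate passed to
days_to_build_all() is purely an assumption, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct configurations for n independent on/off flags.
   Exact, but only usable for n < 64: larger counts overflow 64 bits. */
uint64_t config_count(unsigned n)
{
    assert(n < 64);
    return (uint64_t)1 << n;
}

/* The same count as a double, for flag counts such as 84 that do not
   fit in a uint64_t.  Repeated doubling of 1.0 is exact in binary
   floating point until the exponent range runs out. */
double config_count_approx(unsigned n)
{
    double count = 1.0;
    while (n-- > 0)
        count *= 2.0;
    return count;
}

/* Days needed to compile every configuration once, given a
   hypothetical cost in seconds per build. */
double days_to_build_all(unsigned n, double seconds_per_build)
{
    return config_count_approx(n) * seconds_per_build / 86400.0;
}
```

At an assumed one build per second, days_to_build_all(27, 1.0) comes to
roughly 1,553 days, i.e. more than four years of compiling, before a
single run-time test.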

When I say "need checking", I don't mean "need run time testing".
What I mean is that without extensive examination, it isn't even
certain that every combination will *compile* cleanly.  (In fact it
is quite certain that many combinations *won't* compile cleanly, I
just don't know ahead of time *which*.)

This is the kind of thing conditional compilation can buy you.
255 tests of 105 flags in just 12,255 SLOC, reduced, after much labour,
to 132 tests of 28 flags.  That's still one test every 93 SLOC or so.

Time for a little honesty here.  I actually *added* two of the remaining
flags, accounting for 7 tests.  That's because I wanted to tell GCC and
the SunPro C compiler that certain functions do not return, so that I
could get better data flow checks.  Fortunately, I have three C compilers,
gcc, cc, and lcc, so I can check all three conditions.

This is *precisely* the kind of patching around differences in the
languages accepted by different compilers that *does* warrant using a
preprocessor.
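
For readers who have not seen this kind of patching, here is a rough
sketch of what it can look like.  It is not the code from the AWK
source: NORETURN, fatal() and checked_div() are hypothetical names, and
the sketch assumes GCC's noreturn attribute and the Sun compiler's
does_not_return pragma.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* GCC (and compilers that imitate it) spell "does not return" as a
   function attribute. */
#if defined(__GNUC__)
#define NORETURN __attribute__((noreturn))
#else
#define NORETURN /* no annotation available */
#endif

/* fatal() is a hypothetical helper, not a function from the AWK source. */
NORETURN void fatal(const char *msg);

/* Sun's compiler instead takes a pragma that names the function. */
#if defined(__SUNPRO_C)
#pragma does_not_return(fatal)
#endif

void fatal(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* With the annotation in place, the compiler's data flow analysis
   knows that the path through fatal() never falls off the end of
   this function. */
int checked_div(int a, int b)
{
    if (b != 0)
        return a / b;
    fatal("division by zero");
}
```

A compiler that understands neither spelling still accepts the code; it
just cannot use the extra information and may warn about checked_div().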

	  a) Use conditional compilation and sprinkle changes throughout
	     the code. Ugly because it uses the preprocessor and because
	     code gets twice as long in many separate places. Nice because
	     it's fairly easy to convince yourself that you haven't broken
	     the system for X.

After one change, yes.  By the time you have 28 flags (let alone 105), no.

	The implied question is: is there another way to achieve similar
	effects to conditional compilation?

Let's take a look at some of the remaining flags in the AWK program my
students are working with.  (More precisely, in my copy.  Theirs still
has all 105.  Gosh, I'm cruel.)

Two of them are there to patch around compiler language differences.
I could deal with that by bending the syntax of C to introduce a
'noreturn' keyword for the 'result type' of a function that does not
return.  Then a little "translator" could rewrite this to the dialect
of C accepted by the supported compilers.  (I have in fact done this.
It took 18 SLOC of AWK.  I am reluctant to use it.)

One of them is a library integration issue:  the regular expression
library defines a certain function, the rest of the program defines
another function by the same name, and when this library is used in
this program the library's version loses.  Since the library is not
separately documented and I have no intention of ever using it outside
this program, I could just rip out the library version. When I
have a better idea *why* the versions are different, I'll do this.

Four of them have to do with integer sizes.  If I were willing to
switch to C99 and use <inttypes.h>, they could disappear.
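
As an illustration of why those flags become unnecessary under C99, a
small sketch: the exact-width types and the matching printf macros from
<inttypes.h> replace any per-platform "how big is an int here" switch.
The names cell_t and format_cell() are hypothetical and do not come from
the AWK source.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* cell_t is a hypothetical name for "an integer of exactly 32 bits".
   No compile-time flag is needed to pick between int and long. */
typedef int32_t cell_t;

/* Format a cell as decimal text.  PRId32 expands to the correct
   printf conversion for int32_t on the platform being compiled for.
   Returns what snprintf() returns: the length of the formatted text. */
int format_cell(char *buf, size_t size, cell_t value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%" PRId32, value);
}
```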

One of them was added by me and asks whether the program should
support ISO Latin 1 or just ASCII.  This is only checked in case
conversion, and is clearly WRONG:  case conversion should be
sensitive to the current locale.  Wait a minute... ripped that out,
bug fixed.  One more compile-time flag GONE!
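
A sketch of what locale-sensitive case conversion looks like in
standard C, assuming nothing beyond <ctype.h>.  This is not the actual
fix made to the AWK source, and upcase() is a hypothetical name.  The
point is that the character set is chosen at run time through the
locale, so no compile-time flag is involved.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Upper-case a string in place using the C library's toupper(), which
   follows the LC_CTYPE category of the current locale.  The program
   picks the locale at run time, e.g. with setlocale(LC_CTYPE, ""). */
void upcase(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* The cast matters: passing a negative char value to
           toupper() is undefined behaviour. */
        *s = (char)toupper((unsigned char)*s);
    }
}
```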

Two of them ask whether the environment has real pipes or fake pipes or
neither.  It's not clear whether the "neither" case is supposed to work
or not, although there are hints that it once was.  It's certainly the
case that the code in its current form will not compile cleanly unless
you say there are real pipes.  Fake pipe support was for MSDOS, and my
students have been told to rip out MSDOS support.  Wait a bit ... out
they go!  (The old code DID do the "two versions of a module" thing,
just rather badly.)

One flag concerns executable scripts on OS/2.  It might be nice if this
was still a live issue.  In any case, it could have been done as a
perfectly ordinary conditional, it didn't _have_ to be #ifdef.

One of them is a DEBUG flag.  Quite a few of the tests this controls
are so cheap they should probably always be enabled; some of them could
be asserts.  Some of them control the existence of variables, but if
the code that uses those variables were controlled by ordinary conditionals,
dead variable elimination should get rid of them.  There's nothing here
that couldn't be handled by ordinary conditionals, simple inlining, dead
code and dead variable elimination.
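
A minimal sketch of that approach, keeping the two aspects separated
earlier apart: the value still arrives from outside the compilation
unit (for instance with -DDEBUG=1 on the compiler command line), but
the selection itself is an ordinary 'if'.  The names debug_enabled and
get_field() are hypothetical, and the sketch assumes DEBUG, when
defined, expands to an integer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* The value comes from outside, but it is turned into an ordinary
   constant exactly once, here. */
#ifndef DEBUG
#define DEBUG 0
#endif
enum { debug_enabled = (DEBUG) != 0 };

/* get_field() is a hypothetical example, not code from the AWK source. */
int get_field(const int *fields, int count, int i)
{
    if (debug_enabled) {
        /* This branch is parsed and type-checked whatever DEBUG is
           set to; an optimising compiler is free to drop it as dead
           code when debug_enabled is 0. */
        if (i < 0 || i >= count) {
            fprintf(stderr, "get_field: index %d not in 0..%d\n",
                    i, count - 1);
            abort();
        }
    }
    return fields[i];
}
```

Because both branches are always seen by the compiler, a flag handled
this way cannot produce a combination that fails to compile.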

All of the remaining flags have to do with floating point exception
handling (including different variants on matherr) and working around
some strtod() bugs.
Three of them are specific to NetBSD 1.0A.
One of them is specific to 4.3BSD/VAX.
One of them appears to be specific to Solaris.
That's 15 flags relating to floating point exception handling.
The one that is tested most often is said to be "specific to V7 and XNX23A";
UNIX V7 is dead and I've never heard of XNX23A, so that could probably go too.
In any case, every single use of this flag (tested 25 times) could be
replaced by ordinary conditionals.

None of these 15 flags (more than half of the total) would have been needed
if C had had a consistent model for floating point exception handling.

I've analysed one particularly nasty C case in some detail.
I think a real Erlang example should be even more instructive.