[erlang-questions] comma vs andalso

Mon Jul 20 17:42:29 CEST 2009

Sorry for dropping my end of this; here we go.

----- Original Message ----
> From: Richard Carlsson <richardc@REDACTED>
> 
> Thomas Lindgren wrote:
> > Guards are an awful mess in Erlang. I'd be quite surprised if the
> > above difference in behaviour was intentional. If so, what possible
> > purpose does it serve?
> 
> It is intentional in the respect that 'andalso'/'orelse' behave just
> like any other operator in a guard test. As for purpose, all Erlang's
> operators are allowed in guard expressions, so why make an exception?

First, note that guards as such are already a limited form of boolean formulae, where comma is interpreted as conjunction/"and", and semicolon is interpreted as disjunction/"or". Evaluation of guards was, I would suppose, inspired by logic programming: if the formula could be satisfied, the guard succeeded; if not, it simply failed. Thus, a guard like 1/X > Z/Y would not throw an exception when X or Y was zero, but would simply fail (because the test is not true).
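
To make the classic semantics concrete, here is a minimal sketch. The module and function names are made up for illustration. The guard is the 1/X > Z/Y test from above: when X or Y is zero, the division fails inside the guard and the clause is simply skipped rather than crashing.

```erlang
-module(guard_fail).
-export([compare/3]).

%% Hypothetical example. The guard divides by X and by Y. If either is
%% zero, the arithmetic error makes the guard fail (it does not escape
%% as an exception), and the next clause is tried.
compare(X, Y, Z) when 1/X > Z/Y -> greater;
compare(_, _, _) -> not_greater.
```

With this definition, compare(2, 4, 1) selects the first clause, while compare(0, 4, 1) and compare(2, 0, 1) quietly fall through to the second.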

For reasons unknown (except to some of the very senior members of this mailing list), the initial form of guards only permitted conjunctions of expressions, G = E1,E2,E3 ... Some time later, a limited form of disjunction was added, permitting us to write (G1 ; G2 ; ... ; Gn) where the Gi were conjunctions, but still not permitting arbitrary boolean formulae as guards: in particular, no disjunctions inside conjunctions. As I recall, this was because of parser problems. When and/or were being added to guards, I do recall suggesting that general boolean guards should be permitted, to be written using nested and/or. (This was discussed sometime in the 1997-1999 timeframe, right?) But they weren't permitted.

> > At one point, you could also use "and"/"or", the boolean operators
> > manque, in guards. (I haven't checked recently whether this is still
> > the case.) So we then have three very similar sets of guard
> > operators.
> 
> 'and'/'or' have always (as far back as I can remember, anyway) been
> allowed in guards, again, probably simply by virtue of being general
> operators in the language. And they don't behave like ','/';' either.

Well, to be precise, they have only been allowed since some time after and/or were introduced into the language (at the time when lots of stuff was added, such as funs, records and so on). I mention this because I think the mess arises from stuff having been added incrementally, with different intents, by various people over the years.

Moving on, no, indeed they don't behave the same, and I do consider that a problem. As a consequence, we have three subtly different ways to write boolean formulae in guards. (Sorry about the jargon, dear readers; I'm trying to stay away from "boolean expression" here since we are already using "expression" in another sense.) And unfortunately, at this point, _none_ of the three ways actually permits writing full boolean formulae. I consider this a failure in how guards are designed, which is why I'm always complaining about it.
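
A small sketch of how the three notations differ when one of the subtests fails. The module and function names are hypothetical, and the behaviour described is my understanding of the documented semantics, so do check it against your own release. Each function is meant to be called with the atom foo, for which hd(X) fails.

```erlang
-module(guard_forms).
-export([semi/1, strict_or/1, orelse_hd_first/1, orelse_atom_first/1]).

%% ';' separates alternative guards: a failure in hd(X) only fails the
%% first alternative, and the second one is still tried.
semi(X) when hd(X) =:= a; is_atom(X) -> yes;
semi(_) -> no.

%% 'or' evaluates both operands, so a failure in either one fails the
%% whole guard, even though is_atom(X) alone would be true.
%% (Parentheses are needed: 'or' binds tighter than '=:='.)
strict_or(X) when is_atom(X) or (hd(X) =:= a) -> yes;
strict_or(_) -> no.

%% 'orelse' evaluates left to right. A failure on the left fails the
%% whole guard before the right operand is looked at ...
orelse_hd_first(X) when (hd(X) =:= a) orelse is_atom(X) -> yes;
orelse_hd_first(_) -> no.

%% ... but if the left operand is true, the right one is never evaluated.
orelse_atom_first(X) when is_atom(X) orelse (hd(X) =:= a) -> yes;
orelse_atom_first(_) -> no.
```

Called with foo, semi/1 and orelse_atom_first/1 answer yes, while strict_or/1 and orelse_hd_first/1 answer no. That is three notations for "A or B" with three different outcomes, depending on which operand fails and in which position it sits.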

> > Not to mention the twin set of type tests, introduced, as far as I
> > know, to get rid of the lone double entendre of float/1 (as a test or
> > conversion not the most frequent of operations).
> 
> That was one detail, yes, but the main reason was to make it possible
> to refactor a piece of code without being forced to change the names
> of the function tests if you moved an expression from within a guard
> to an ordinary expression and vice versa. Recall that before the is_...
> versions, there was no way of saying e.g., "Bool = (is_integer(X) and
> is_atom(Y))" without writing a case/if or a separate predicate function.

I seem to recall boolean BIFs for old-style type tests (atom/1, integer/1, ...) in older Erlangs, which would permit one to write (atom(X) and atom(Y)) in clause bodies. Perhaps I misremember; still, there is no technical reason to avoid them in favour of the longer new names, _except_ possibly the clash with float/1. In that regard, I would argue that it would have been far easier to rename the lone float conversion operation than to rename all the type tests.

> The old type tests didn't correspond to any built-in functions, so you
> had to treat them as "primops" inside the compiler, you couldn't refer
> to them and pass them around, etc. But the old type test names "atom(X)"
> and so forth could not simply be made to work outside guards because
> there would be name clashes (with the float(X) bif and with any existing
> code that defined a function such as list(X) or integer(X)), hence the
> is_... prefix for the generally usable versions that are proper built-in
> functions (defined in the 'erlang' module along with all the others).

But this doesn't solve the problem -- it merely shifts name clashes to another part of the name space. Nor is there anything inherently impossible about defining and providing the BIF erlang:atom/1 instead of erlang:is_atom/1.
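
For reference, this is roughly what the is_... versions provide as proper BIFs: they can be used as ordinary boolean expressions in a clause body and passed around as funs. A sketch with made-up module and function names follows; a BIF named erlang:atom/1, had it been provided, could presumably have been used in just the same way.

```erlang
-module(type_tests).
-export([both/2, atoms_only/1]).

%% A type test used as a plain boolean expression in a body,
%% with no case/if or helper predicate around it.
both(X, Y) ->
    Bool = is_integer(X) and is_atom(Y),
    Bool.

%% The same BIF referred to as a value and handed to lists:filter/2.
atoms_only(List) ->
    lists:filter(fun erlang:is_atom/1, List).
```

For instance, both(1, a) is true, both(1, 2) is false, and atoms_only([a, 1, b, "c"]) keeps only a and b.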

So, to conclude: it seems to me that keeping the short names would have worked just about the same as the current approach, except for saving three characters per type test.

> > And now, for our convenience, the shorter form of these tests is being
> > deprecated.
> 
> Hold your horses - nobody is deprecating the use of ',' and ';'. 

(I was talking about the short type tests here.)

> This fail-to-false
> behaviour was in my opinion a mistake, because it occasionally hides
> a bug in the guard and turns what should have been an observable runtime
> crash into a silent "well, we take the next clause then". Some people
> like to use this as a trick to write more compact guards, but that
> makes it hard for someone reading the code to tell whether the
> crash-jumps-to-the-next-case is intentional or not.

See above for what I would think is the reasoning behind the classic semantics. The main drawback of explicit crashes in guards is that, as a guard-writer, you don't have a lot of opportunities to catch them. To catch and hide explicit crashes, you may then have to turn clause guards into case-expressions. (In this context, I'm tempted to instead make an argument for full-blown expressions in guards, but let's leave that little hairball for another day.)
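
To illustrate the kind of rewriting I mean, here is a sketch (hypothetical names throughout). The first function relies on the classic fail-to-false behaviour; the second shows what one ends up writing by hand if a failing test had to be trapped explicitly.

```erlang
-module(guard_to_case).
-export([classic/1, explicit/1]).

%% Classic semantics: length/1 fails on a non-list, the guard fails,
%% and the next clause is chosen.
classic(X) when length(X) > 2 -> long;
classic(_) -> other.

%% The same decision with the failure trapped by hand: the test moves
%% out of the clause head into a try, followed by a case.
explicit(X) ->
    Long = try length(X) > 2
           catch error:_ -> false
           end,
    case Long of
        true -> long;
        false -> other
    end.
```

Both functions give long for [1, 2, 3] and other for foo, but the second version has lost the clause-head structure, which is the cost I am pointing at.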

> ... I'm not sure I have any real arguments against nesting of ','/';',
> but I fear the grammar could get rather messy, and that such nested
> guards could be quite difficult to read in practice.

Well, let me then register my strong vote for actually, finally implementing full boolean formulae in guards. Rather than becoming harder to read, the code will become easier to read: there is no need to code around the issue when you actually need to compose disjunctions, and the tests themselves can be hidden in well-formed macros, which as a bonus can be composed fairly nicely without obscure parsing errors. And, in contrast with using and/or/andalso/orelse, the composed guards will still behave like classic guards.
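
As a sketch of where things stand today (the macro and function names are invented for this example): conjunctions can be packaged as macros and combined with ';', but as soon as a disjunction is needed inside a conjunction, it has to be distributed by hand.

```erlang
-module(guard_macros).
-export([ident_char/1, allowed/2]).

%% Each macro expands to a conjunction of classic guard tests.
-define(IS_DIGIT(C), C >= $0, C =< $9).
-define(IS_LOWER(C), C >= $a, C =< $z).

%% A disjunction of conjunctions is accepted as it stands.
ident_char(C) when ?IS_DIGIT(C); ?IS_LOWER(C); C =:= $_ -> true;
ident_char(_) -> false.

%% "(digit or lower) and not Banned" cannot be written in nested form,
%% i.e. (?IS_DIGIT(C); ?IS_LOWER(C)), C =/= Banned is a syntax error,
%% so the extra test is repeated in every alternative.
allowed(C, Banned) when ?IS_DIGIT(C), C =/= Banned;
                        ?IS_LOWER(C), C =/= Banned -> true;
allowed(_, _) -> false.
```

With nested ','/';' the second guard could be written once, as the composition of the two macros and the extra test, and it would keep the classic fail-to-false behaviour.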

Finally, a question regarding the grammar issue: this seems superficially like adding two more operators to the expression operator precedence grammar. Is there more and worse than that?

Best,
Thomas