[eeps] Multi-Parameter Typechecking BIFs

Wed Feb 25 02:35:11 CET 2009

On 24 Feb 2009, at 10:14 pm, Kenneth Lundin wrote:
> We have the current way of writing:

Proposal 0) -- no change
>
>
> foo(_,{A,B,C},_) when is_integer(A), is_integer(B), is_integer(C) -> ...
>
> Proposal 1)
> We have a proposal from James that it should be possible to write
> like this:
>
> foo(_,{A,B,C},_) when is_integer(A, B, C) -> ...
>
> The reasoning behind this is to make the code shorter and clearer by
> reducing the number of times the type guard
> has to be repeated.
>
> Proposal 2)
>
> We also have the counter-proposal from Mats, which introduces the
> possibility of embedding the type-checking guards within the pattern.
> Note that it is not suggested that everything that can be expressed
> in guards should be moved into the pattern. I myself suggested the
> same thing, with slightly different syntax, a couple of years ago.
>
> The reasoning behind this is to make code shorter and clearer by
> reducing the number of times a variable name has to be repeated,
> and to encourage the use of type guards by making them easy to write.
>
> foo(_,{A::integer, B::integer, C::integer},_) -> ...

We ALSO have Proposal 3, which is to use a macro
with an intention-revealing name, such as

	-define(i3_coords(A, B, C),
		(is_integer(A) andalso is_integer(B) andalso is_integer(C))).
	foo(_, {A,B,C}, _) when ?i3_coords(A, B, C) -> ...

The reasoning behind this is that intention-revealing names
that express the high level concept we are checking for are
much more readable than low level tests and save every bit
as much repeated typing.  Proposal 1 is dead.  It has no
advantages compared with Proposal 3.

We ALSO have Proposal 4, which is to use abstract patterns,
and after a morning's work there is now an abstract patterns EEP.

By the way, there's an error above:  it *IS* suggested that
everything should be moved into the pattern; what else could
the claim that the invention of 'when' was a mistake mean?

Proposal 4 has all the advantages of proposal 2, without the
readability disaster.  Unlike proposal 2, it really _does_
make the code shorter and clearer.
>

> Now, the examples above illustrated the different syntaxes with
> one-letter variable names. Mats's point, which seems to be
> misunderstood, is that "real programs" tend to have mnemonic names
> for their variables, commonly 5 to 10 characters long.

It isn't misunderstood at all.
I just ran a program over the R12B-5 sources.
This is an overcount, because it includes ?MACRO
names as variables.

1,905,709 occurrences of
    32,743 distinct variable names
         1 was the length of the shortest variable name
       255 was the length of the longest variable name
         8.26 was the average length.

Yup.  8.26.

> Let's take the examples again with realistic variable names.
>
> Current situation)
Proposal 0)
>
>
> foo(_,{Width,Height,Depth},_) when is_integer(Width),
> is_integer(Height), is_integer(Depth) -> ...
>
> Proposal 1)
>
> foo(_,{Width,Height,Depth},_) when is_integer(Width, Height, Depth) -> ...
>
> Proposal 2)
>
> foo(_,{Width::integer, Height::integer, Depth::integer},_) -> ...

Proposal 3)

foo(_, {Width,Height,Depth}, _)
   when ?i3_coords(Width, Height, Depth) -> ...

Proposal 4)

foo(_, #i3_point(Width,Height,Depth), _) -> ...

If the goal is to reduce repeated typing,
why is ::integer ::integer ::integer thought to be a good thing?

> It may very well be that there are weak points in proposal 2, but I
> don't buy the arguments that Richard has against it. It seems to me
> that he has locked himself into one view and refuses to see the
> problem from another angle.

It's a poor mind that can't see five sides to every question.
In this case, by golly, we have FIVE proposals on the table, not two.

> Maybe there is also a misunderstanding that everything currently
> expressed in the separate guards should be moved into the pattern;
> that is NOT what is suggested.

That is *exactly* what Mats Cronqvist suggested.
In his message of 23-Feb-2009, he wrote:

	Separating the pattern match from the condition is bad.
	The 'when' keyword should have never been introduced.

If you have (... X ...) when X > 0, that is separating the
pattern match from the condition, which is said to be "bad".
If there is no 'when', the condition X > 0 _has_ to go inside
the pattern, thus avoiding separating the pattern match from
the condition.

I can see the advantages of this, yes.

But there is a simple thing that I can see, and apparently
Mats Cronqvist and Kenneth Lundin either don't see it, or
deny its importance.

	Big complicated patterns are hard to read.
	Bigger more complicated patterns are harder to read.
	Anything that makes a pattern bigger makes it
	*harder* for me to read, not easier.

If Mats didn't really mean it, and only wants to move type
tests into patterns, or if someone else doesn't dislike
'when' as Mats does and just wants to move type tests into
patterns, then *that* is a weakness in proposal 2.

The big weakness of proposals 1 and 2 is that they *claim*
to reduce typing, but they don't reduce it very much.
Why?  Because they only reduce typing *within a clause*.
They don't do anything to reduce typing *across* clauses,
as proposals 3 and 4 do.
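To make the cross-clause point concrete, here is a minimal sketch of
proposal 3.  The module name i3_demo and the functions volume/1 and
scale/2 are hypothetical, invented only for illustration.  The guard
macro is written once and then reused by every clause that needs it,
so the saving grows with the number of clauses instead of stopping
at one.

```erlang
-module(i3_demo).
-export([volume/1, scale/2]).

%% Hypothetical intention-revealing guard macro (proposal 3).
%% Defined once; every clause below reuses it.
-define(i3_coords(X, Y, Z),
        (is_integer(X) andalso is_integer(Y) andalso is_integer(Z))).

%% Volume of a box whose three sides are all integers.
volume({W, H, D}) when ?i3_coords(W, H, D) ->
    W * H * D.

%% Scale an integer 3D point by an integer factor.
scale({X, Y, Z}, K) when ?i3_coords(X, Y, Z), is_integer(K) ->
    {X * K, Y * K, Z * K}.
```

With this sketch, i3_demo:volume({2, 3, 4}) returns 24, while a tuple
containing a float, such as {2.0, 3, 4}, matches no clause and raises
function_clause.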

More fairly, proposal 2 *plus* macros gives you a clunky
special case of proposal 4.  If the proposal were to write

	-define(i3_point(X,Y,Z),
		{(X)::integer, (Y)::integer, (Z)::integer}).
	...
	foo(_, ?i3_point(Width,Height,Depth), _) -> ...

then that really would make an interesting and useful reduction
in typing.  (But proposal 4 is still better.)  But oddly enough,
the people supporting proposal 2 don't seem to mind repeating
::integer over and over again.

> One very valid reason behind Proposal 2, apart from the short
> notation, is that it is more important to avoid repeating variable
> names, since they are just "variable", and so more difficult to
> remember and spell than the well-known "fixed" names of the type
> guards, which can be seen as part of the language.
> Because of this it is more important to avoid repeating variables
> than it is to avoid repeating type guards.

But the limited number of type tests is precisely part of
the problem.  If you want to communicate with human beings
about your program, you don't want to say "this is a triple
with 3 integers in it" but "this is a date" or "this is a
3d point" or "this is a time of day" or "these are the
lower and upper bounds and median of the length distribution"
or whatever.  This is Software Engineering 101.

It is important to note that since Erlang variable names are
limited in scope to single clauses, information about the
meaning of a variable need not come exclusively from the name
of the variable.  If you have intention-revealing type names,
you don't _need_ long variable names.  (Except in amazingly
long clauses like the 150-liner I just noticed.  Nothing will
save you from long variable names there.)  In most cases,
having some kind of named intention-revealing pattern means
that identifiers can, and for readability should, be shorter.

> Another reason for Proposal 2 is that today some types can easily
> be expressed in the pattern while others cannot.
> It is, for example, easy to match
> a list like this: [_|_]
> a 2-tuple like this: {_,_}
> a 3-element list like this: [_,_,_], etc., directly in the pattern,
> but it is not possible to match any integer or any float (only
> specific ones).

That has always been obvious.  But the version of Proposal 2
that Kenneth Lundin supports (the one where the only thing
that moves into patterns is type tests) STILL has exactly the
same limit.  You'll be able to match "any integer" but not
"any positive integer" or "any odd integer".  By actual
measurement, range tests are much more common than is_integer/1
or integer/1 tests.
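For comparison, the macro approach (proposal 3) packages a range test
as easily as a bare type test.  A minimal sketch; the module
range_demo, the macros positive_int/1 and odd_int/1, and the function
classify/1 are all hypothetical:

```erlang
-module(range_demo).
-export([classify/1]).

%% Hypothetical guard macros that name range tests, not just type tests.
-define(positive_int(N), (is_integer(N) andalso (N) > 0)).
%% rem keeps the sign of N, so test =/= 0 rather than =:= 1
%% to treat negative odd integers as odd too.
-define(odd_int(N), (is_integer(N) andalso (N) rem 2 =/= 0)).

classify(N) when ?positive_int(N), ?odd_int(N) -> positive_odd;
classify(N) when ?positive_int(N) -> positive_even;
classify(N) when is_integer(N) -> non_positive;
classify(_) -> not_an_integer.
```

An in-pattern N::integer annotation could replace only the
is_integer(N) part of these guards; the ranges would still need 'when'.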

Mats Cronqvist can say that his proposal (do not separate pattern
and condition) satisfies this goal, but nobody who wants to stop
halfway can say that.

> I think that Proposal 2 is far better than Proposal 1 from James and
> we should either go for Proposal 2 or not
> take any of these suggestions.

Remember, we have five proposals on the table.
If proposal 2 were better than any credible alternative,
we'd have to put up with its ugliness and accept it,
still relying on macros for abstraction.

But abstract patterns accomplish everything that in-line
type tests do and much much more.  (Including, at stage
3, the ability to pass a "pattern" as a parameter.)
>
>
> A third proposal could perhaps be to take advantage of the newly
> introduced -spec syntax and make it possible to tell the compiler
> to generate runtime guards according to the -spec, but that is just
> a wild idea that surely has its own problems

It is conventional to use type declarations to enable
a compiler to *omit* run-time checks.  (Lisp, ML, CAML, F#,
Haskell, Clean, ...)  Using them to cause the compiler to
*emit* run-time checks is certainly original.  Not the least
of the problems is that code that worked as its author
intended (but was incorrectly annotated) would suddenly break.

> . But I think it is in the right direction since
> we want to avoid writing/expressing the same thing
> many times and in different ways.

Why yes, that's one of the bad things about Proposal 2!