[eeps] Multi-Parameter Typechecking BIFs

Thu Feb 19 23:40:29 CET 2009

On 19 Feb 2009, at 11:32 pm, Vlad Dumitrescu wrote:
>
> Yes, this works in this case, but one may have something like
>    func({_,X,_,Z}, [{_,Y,A} | _]) when is_float(X,Y,Z,A) ->...
> so IMHO this shortcut is still useful.

One *MAY* have anything one pleases.
The question is, HOW OFTEN will one have things like this
where the configuration of type tests is UNIQUE?
Frankly, I just don't believe that this is common enough
to worry about.

Cases where the configuration of type tests is often
repeated obviously DO call for help, but those are
precisely the cases that abstract patterns help with.

The original proposer provided an example of real code,
and _that_ provided the key insight:  if you are dealing
with 3d vectors represented as {X,Y,Z} triples of floating
point numbers, then you are going to need three type tests
*often* (indeed, practically every time you look at a
vector), precisely because Erlang doesn't currently let
you express "3-element floating-point vector" as a named thing.

Let's take an example from the OTP sources.

coords({X,Y}) when is_number(X),is_number(Y) ->
     [gstk:to_ascii(X), " ", gstk:to_ascii(Y), " "];
coords([{X,Y} | R]) when is_number(X),is_number(Y) ->
     [gstk:to_ascii(X), " ", gstk:to_ascii(Y), " ", coords(R)];
coords({{X1,Y1},{X2,Y2}})
   when is_number(X1),is_number(Y1),is_number(X2),is_number(Y2) ->
     [gstk:to_ascii(X1), " ", gstk:to_ascii(Y1)," ",
      gstk:to_ascii(X2), " ", gstk:to_ascii(Y2)];
coords([_]) ->
     invalid;
coords([]) ->
     [].

With abstract patterns,

#point2(X, Y) when is_number(X), is_number(Y) -> {X, Y}.

coords(#point2(X,Y)) ->
     [gstk:to_ascii(X), " ", gstk:to_ascii(Y), " "];
coords([#point2(X,Y)|R]) ->
     [gstk:to_ascii(X), " ", gstk:to_ascii(Y), " ", coords(R)];
coords({#point2(X1,Y1),#point2(X2,Y2)}) ->
     [gstk:to_ascii(X1), " ", gstk:to_ascii(Y1), " ",
      gstk:to_ascii(X2), " ", gstk:to_ascii(Y2)];
coords([_]) ->
     invalid;
coords([]) ->
     [].
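
Abstract patterns are only a proposal, so the version above does
not compile today.  As a rough stand-in, here is a minimal sketch
that names the same test with a guard macro.  The module name, the
?is_point2 macro, and the local to_ascii/1 helper (standing in for
gstk:to_ascii/1) are all assumptions made for illustration; they
are not taken from the OTP sources.

```erlang
-module(coords_sketch).
-export([coords/1]).

%% Hypothetical guard macro: "X and Y are the coordinates of a 2-d point".
-define(is_point2(X, Y), is_number(X), is_number(Y)).

%% Same clause structure as the OTP example, with the repeated
%% is_number/1 tests replaced by the named macro.
coords({X, Y}) when ?is_point2(X, Y) ->
    [to_ascii(X), " ", to_ascii(Y), " "];
coords([{X, Y} | R]) when ?is_point2(X, Y) ->
    [to_ascii(X), " ", to_ascii(Y), " ", coords(R)];
coords({{X1, Y1}, {X2, Y2}}) when ?is_point2(X1, Y1), ?is_point2(X2, Y2) ->
    [to_ascii(X1), " ", to_ascii(Y1), " ",
     to_ascii(X2), " ", to_ascii(Y2)];
coords([_]) ->
    invalid;
coords([]) ->
    [].

%% Local stand-in for gstk:to_ascii/1 (an assumption, numbers only).
to_ascii(N) when is_integer(N) -> integer_to_list(N);
to_ascii(N) when is_float(N)   -> float_to_list(N).
```

Unlike an abstract pattern, the macro can only appear in the guard,
so the {X,Y} shape still has to be spelled out in each head.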

In a quick trawl through the R12B-5 sources,
230 cases of ... integer( ... integer(
   2 cases of ... float(   ... float(
   6 cases of ... number(  ... number(
350 cases of ... list(    ... list(
175 cases of ... atom(    ... atom(
--- ----- --     --------     ---------
763 cases of ... Test(    ... Test(
turned up, or roughly one per 1200 SLOC, where you
should imagine an optional "is_" in front of each test name.

It would be rather more effort to analyse all of these than I
have time for right now.  Many of them are certainly bogus.
Some of those that are not bogus would be made harder to
grasp using multi-argument type tests:

	f(X, Y, Z) when is_integer(X), is_integer(Y) -> ...;
	f(X, Y, Z) when is_integer(X), is_atom(Y)    -> ...;
	f(X, Y, Z) when is_atom(X),    is_integer(Y) -> ...;
	f(X, Y, Z) when is_atom(X),    is_atom(Y)    -> ...;
	...

is an outline of one pattern I noticed.

There are three main limitations of the proposal before us.
(1) It only addresses *type tests*.
(2) It can only repeat the *same* type test; it cannot
     group in one form a common configuration of mixed type tests.
(3) Common test configurations *still* have to be repeated
     every time they are needed; they cannot be named.

Heck, even macros are better than that.
Seeing that atom...atom repeats were common, I went looking.
One of the first things I found was

conv_mfa({M,F,A}) when is_atom(M), is_atom(F), is_integer(A) ->
   hipe_arm:mk_mfa(M, F, A).

This is tailor-made for an abstract pattern.

     #mfa(M, F, A)
     when is_atom(M), is_atom(F), is_integer(A), A >= 0
     -> {M, F, A}.

Note that the type tests were only an approximation.
{foo,bar,-27} is _not_ a plausible function name.

conv_mfa(#mfa(M, F, A)) -> hipe_arm:mk_mfa(M, F, A).

We also turn out to need

     #mfl(M, F, L)
     when is_atom(M), is_atom(F), is_list(L)
     -> {M, F, L}.

It turns out that the vast majority of atom...atom type tests
are checking for a module name and a function symbol, where
the important thing is NOT "these two things are, separately,
atoms", but "these two things are all-but-the-arity of a
function name".  The ones that cannot be handled by the two
abstract patterns above are ones where the elements have
already been unpacked as separate arguments.

Now we *already* have a good way to deal with this that is
free of the limitations listed above.  I never expected to
have a good word to say for macros, but here it is.

     -define(mod_func(M, F),
	is_atom(M), is_atom(F)).
     -define(mod_func_arity(M, F, A),
	?mod_func(M, F), is_integer(A), A >= 0).
     -define(mod_func_args(M, F, L),
	?mod_func(M, F), is_list(L)).
     #mf(M, F) when ?mod_func(M, F) -> {M,F}.
     #mfa(M, F, A) when ?mod_func_arity(M, F, A) -> {M, F, A}.
     #mfl(M, F, L) when ?mod_func_args(M, F, L) -> {M, F, L}.

Now we can rewrite

apply_after(Method, M, F, A, Time) when atom(M), atom(F), list(A) ->
     if
         Time == infinity ->
             apply_after_infinity;
         integer(Time) ->
             Msg = {apply_after, Method, M, F, A},
             Ref = erlang:send_after(Time, whereis(?SERVER), Msg),
             {apply_after, Ref}
     end.

as

apply_after(Method, M, F, A, Time) when ?mod_func_args(M, F, A) ->
     if Time == infinity ->
         apply_after_infinity
      ; is_integer(Time) ->
	Msg = {apply_after, Method, M, F, A},
	Ref = erlang:send_after(Time, whereis(?SERVER), Msg),
	{apply_after, Ref}
     end.
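
For anyone who wants to try the macro half of this without abstract
patterns, the following is a minimal sketch that should compile as
it stands.  The module name and the two exported functions are
hypothetical; only the three macros come from the definitions above.
The tagged tuples returned here merely stand in for whatever the
real code would build (hipe_arm:mk_mfa/3, for example).

```erlang
-module(mfa_guards).
-export([conv_mfa/1, check_call/3]).

%% Named configurations of guard tests.  A macro body may contain
%% commas, so each macro expands to a conjunction of guard tests.
-define(mod_func(M, F),
        is_atom(M), is_atom(F)).
-define(mod_func_arity(M, F, A),
        ?mod_func(M, F), is_integer(A), A >= 0).
-define(mod_func_args(M, F, L),
        ?mod_func(M, F), is_list(L)).

%% Accept only plausible {Module, Function, Arity} triples.
conv_mfa({M, F, A}) when ?mod_func_arity(M, F, A) ->
    {mfa, M, F, A};
conv_mfa(_) ->
    invalid.

%% Accept a module, a function name and an argument list that have
%% already been unpacked as separate arguments.
check_call(M, F, Args) when ?mod_func_args(M, F, Args) ->
    {call, M, F, Args};
check_call(_, _, _) ->
    invalid.
```

Note that {foo,bar,-27} is rejected by the A >= 0 test, which the
bare type tests alone would have let through.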

The more I look into this, the more I begin to wonder whether
we ought to recommend as part of good Erlang style that "bare"
type tests should be avoided.

For example, there is a common configuration

	when atom(Module), atom(Pre), atom(Post)

But these aren't just any old atoms, they are atoms to be
interpreted a particular way.  One of them is a module name
and the other two are not.  I would prefer to see

	-define(is_pre(Pre), is_atom(Pre)).
	-define(is_post(Post), is_atom(Post)).
	-define(is_mod(Module), is_atom(Module)).
	-define(mod_pre_post(Module, Pre, Post),
	    ?is_mod(Module), ?is_pre(Pre), ?is_post(Post)).
	...
	when ?mod_pre_post(Module, Pre, Post)

Amongst other things, if "pre" and "post", whatever they
are, should eventually be changed to something other than
atoms, this would be easier to maintain.
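
To make the maintenance point concrete, here is a small sketch
that should compile as written.  The module name hook_guards and
the function register_hook/3 are invented for illustration, as is
the shape of the return values; only the macro names follow the
definitions above.

```erlang
-module(hook_guards).
-export([register_hook/3]).

%% One macro per role, so each test says what the atom is *for*.
-define(is_mod(Module), is_atom(Module)).
-define(is_pre(Pre),    is_atom(Pre)).
-define(is_post(Post),  is_atom(Post)).

%% The common configuration, named once.
-define(mod_pre_post(Module, Pre, Post),
        ?is_mod(Module), ?is_pre(Pre), ?is_post(Post)).

%% Hypothetical caller: the guard reads as intent, not as three
%% anonymous atom tests.
register_hook(Module, Pre, Post) when ?mod_pre_post(Module, Pre, Post) ->
    {ok, {Module, Pre, Post}};
register_hook(_, _, _) ->
    {error, badarg}.
```

If "pre" later stopped being an atom, only the one-line definition
of ?is_pre would change, and every guard written with
?mod_pre_post would follow along.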

I remain no admirer of macros, but the advantage of giving
common configurations of type tests intention-revealing names
is so great that I think I *have* to regard anything that would
reduce the pressure on programmers to use intention-revealing
names as a step in the wrong direction.

It so happens that you *can* use abstract patterns in a slightly
perverse way to express tests on several variables; you just
can't use such "patterns" as patterns.

	#pre(B)  when is_atom(B) -> B.
	#post(A) when is_atom(A) -> A.
	#mod(M)  when is_atom(M) -> M.
	#mod_pre_post(#mod(M), #pre(B), #post(A)) -> true.
	...
	when #mod_pre_post(Module, Pre, Post)

#mod_pre_post/3 either rewrites to true (which succeeds) or
it fails.  It can't be used as a pattern because there are
variables in the head that aren't in the body.  This is not
the original intended use of abstract patterns, but it's
handy.

Anyway, the important thing is that RIGHT NOW we can write
*clearer*, more maintainable, programs using macros than we could
using multi-parameter type tests, so we are better off NOT
having multi-parameter type tests.