[eeps] Multi-Parameter Typechecking BIFs

Thu Feb 26 00:52:51 CET 2009

Let's take a look at e3d_vec.erl

Since all the vectors we're dealing with are 3d vectors,
there's no point in giving them any name but #vector.

We'll also use the name #scalar for the scalars.

#scalar(X) when is_float(X) -> X.

Since
#vector(X, Y, Z) when #scalar(X), #scalar(Y), #scalar(Z) -> {X,Y,Z}.
#vector({X,Y,Z}) -> #vector(X, Y, Z).

#zero() -> #vector(0.0, 0.0, 0.0).

zero() -> #zero().

is_zero(#zero()) -> true;
is_zero(_)       -> false.

Now we have a puzzle.  add/2 checks that the
elements of the first triple are floats, but
does not check that the elements of the second
triple are floats.  I cannot tell whether this
is an oversight or not.  I've assumed it is.

add(#vector(X1,Y1,Z1), #vector(X2,Y2,Z2)) ->
     #vector(X1+X2, Y1+Y2, Z1+Z2).

Another puzzle.  add_prod/3 checks the number
we're scaling by, but not the triples.  Again
I'm assuming that's an oversight.

add_prod(#vector(X1,Y1,Z1), #vector(X1,Y2,Z2), #scalar(S)) ->
     #vector(S*X2+X1, S*Y2+Y1, S*Z2+Z1).

add([#vector(X1,Y1,Z1)|Vectors]) ->
     add(Vectors, X1, Y1, Z1);
add([]) -> #e3d_zero().

%  add/4 is a vector list sum unrolled by 3.

add([#vector(X1, Y1, Z1)
     ,#vector(X2, Y2, Z2)
     ,#vector(X3, Y3, Z3)
     |Vectors],
     #scalar(X0), #scalar(Y0), #scalar(Z0)
) ->
     add(Vectors, X0+X1+X2+X3, Y0+Y1+Y2+Y3, Z0+Z1+Z2+Z3);
add([#vector(X1, Y1, Z1), #vector(X2, Y2, Z2)],
     #scalar(X0), #scalar(Y0), #scalar(Z0)
) ->
     #vector(X0+X1+X2, Y0+Y1+Y2, Z0+Z1+Z2);
add([#vector(X1, Y1, Z1)],
     #scalar(X0), #scalar(Y0), #scalar(Z0)
) ->
     #vector(X0+X1, Y0+Y1, Z0+Z1);
add([],
     #scalar(X0), #scalar(Y0), #scalar(Z0)
) ->
     #vector(X0, Y0, Z0).

Another puzzle.  add/2 did some checking,
but sub/2 did none.  Again I assume an oversight.

sub(#vector(X1,Y1,Z1), #vector(X2,Y2,Z2)) ->
     #vector(X1-X2, Y1-Y2, Z1-Z2).

It's not clear to me why there is a sub/1.
As I've read the code, it does the Lispy
(first thing - sum of everything else)
bit, not the APLish alternating sum.  So

sub([Vector|Vectors]) ->
     sub(Vector, add(Vectors)).

Here's another fact about abstract patterns.
An abstract pattern can faithfully describe the
concept "vector", but not the concept "list of
vectors", because the latter requires recursion.

I've eliminated norm/4 as it doesn't seem to be
exported, and every call to it repeats work that
is already there in norm/3.

norm(#vector(X,Y,Z)) ->
     norm(X, Y, Z).

norm(#scalar(X), #scalar(Y), #scalar(Z)) ->
     Squared_Length = X*X + Y*Y + Z*Z,
     if Squared_Length >= 1.0e-16 ->
        L = math:sqrt(Squared_Length),
        try #vector(X/L, Y/L, Z/L)
        catch error:badarith -> #e3d_zero()
        end
      ; Squared_Length < 1.0e-16 -> #e3d_zero()
     end.

Another puzzle.  norm_sub/2 checks the elements
of the first triple, but not the second triple.

norm_sub(#vector(X1,Y1,Z1), #vector(X2,Y2,Z2)) ->
     norm(X1-X2, Y1-Y2, Z1-Z2).

Puzzle time again.  mul/2 checks the scale factor
but not the vector.

mul(#vector(X,Y,Z), #scalar(S)) ->
     #vector(X*S, Y*S, Z*S).

There's a numeric puzzle in divide/2.  It goes
out of its way to incur twice the roundoff error
that it has to incur.  I'll reproduce that, but
it's hard to see the point.

divide(#vector(X,Y,Z), #scalar(S)) ->
     R = 1.0/S, % (R)eciprocal of S
     #vector(X*S, Y*S, Z*S).

dot(#vector(X1,Y1,Z1), #vector(X2,Y2,Z2)) ->
     X1*X2 + Y1*Y2 + Z1*Z2.

Puzzle: dot didn't check its arguments at all.
cross checked them thoroughly.

cross(#vector(X1,Y1,Z2), #vector(X2,Y2,Z2)) ->
     #vector(Y1*Z2 - Z1*Y2, Z1*X2 - X1*Z2, X1*Y2 - Y1*X2).

len(#vector(X,Y,Z)) ->
     math:sqrt(X*X + Y*Y + Z*Z).

dist(#vector(X1,Y1,Z1), #vector(X2,Y2,Z2)) ->
     X = X1 - X2,
     Y = Y1 - Y2,
     Z = Z1 - Z2,
     math:sqrt(X*X + Y*Y + Z*Z).

normal/3
    At this point my ability to read the existing code gave out.
    The reason is not the is_float tests; they don't bother me
    one teeny tiny bit.  The reason is the naming convention.
    X1   Y1   Z1   X2  Y2  Z2   X3  Y3  Z3   are called
    V10  V11  V12  V20 V21 V22  V30 V31 V32
    and the use of 0 origin for one subscript (x,y,z)
    and 1 origin for the other (vector 1, vector 2, vector 3)
    in the same variable names blows fuses I didn't know I had.
    However, the thing _starts_ with

    normal(#vector(X1,Y1,Z1), #vector(X2,Y2,Z2), #vector(X3,Y3,Z3))

    and ends with me noticing that the end looks a whole lot like

         norm(Xn, Yn, Zn)

    except that it doesn't include the comparison against 1.0e-16.

area/3 runs into the same variable name problem.  It's not just
    me that finds X, Y, Z intuitive; some of the code in this module
    already uses X, Y, Z rather than 0, 1, 2.

normal/3 uses yet another convention.  Instead of
    X1, Y2, Z3 (my preferred choice) or
    V10, V21, V32 (used in area/3),
    normal/3 uses Ax, By, Cz.
    All I can really grasp is that the code would have been shorter
    and no less efficient with calls to norm/3 instead of norm/4.

In average/1, what is the reason for
     V0 = if
              V10 =:= V20 -> V10;
              is_float(V10) -> 0.5*(V10+V20)
          end,
rather than
     V0 = (V10+V20)*0.5
The spec says that we are dealing exclusively with triples
of floats, so in IEEE arithmetic we _must_ get the same
answer either way, except in the case where V10+V20 overflows.
But the code elsewhere takes no precautions against overflow.

bounding_box([#vector(X,Y,Z)|Vectors]) ->
     bounding_box_loop(Vectors, X, X, Y, Y, Z, Z).

bounding_box_loop(L=[#vector(X,_,_)|_], Xlo,Xhi, Ylo,Yhi, Zlo,Zhi)
   when X < Xlo -> bounding_box_loop(L,  X,  Xhi, Ylo,Yhi, Zlo,Zhi);
bounding_box_loop(L=[#vector(X,_,_)|_], Xlo,Xhi, Ylo,Yhi, Zlo,Zhi)
   when X > Xhi -> bounding_box_loop(L,  Xlo,X,   Ylo,Yhi, Zlo,Zhi);
bounding_box_loop(L=[#vector(_,Y,_)|_], Xlo,Xhi, Ylo,Yhi, Zlo,Zhi)
   when Y < Ylo -> bounding_box_loop(L,  Xlo,Xhi, Y,  Yhi, Zlo,Zhi);
bounding_box_loop(L=[#vector(_,Y,_)|_], Xlo,Xhi, Ylo,Yhi, Zlo,Zhi)
   when Y > Yhi -> bounding_box_loop(L,  Xlo,Xhi, Ylo,Y,   Zlo,Zhi);
bounding_box_loop(L=[#vector(_,Y,_)|_], Xlo,Xhi, Ylo,Yhi, Zlo,Zhi)
   when Z < Zlo -> bounding_box_loop(L,  Xlo,Xhi, Ylo,Yhi, Zlo,Zhi);
bounding_box_loop(L=[#vector(_,Y,_)|_], Xlo,Xhi, Ylo,Yhi, Z,  Zhi)
   when Z > Zhi -> bounding_box_loop(L,  Xlo,Xhi, Ylo,Yhi, Zlo,Z  );
bounding_box_loop([_|L],                Xlo,Xhi, Ylo,Yhi, Zlo,Zhi)
                -> bounding_box_loop(L,  Xlo,Xhi, Ylo,Yhi, Zlo,Zhi);
bounding_box_loop([],                   Xlo,Xhi, Ylo,Yhi, Zlo,Zhi) ->
     {#vector(Xlo,Ylo,Zlo), #vector(Xhi,Yhi,Zhi)}.

What we really need here is maximum and minimum operators built
into core Erlang.  The OTP sources are littered with them; there
are at least 20 copies of max/2 and nearly as many of min/2.
One reason why it matters is that max(X,Y) where X and Y are
both floats can be implemented in just two instructions on most
modern machines:
	[having got X into R1 and Y into R2]
	compare R1 with R2			[compare]
	if R1 was less than R2 move R2 to R1	[conditional move]

Wouldn't it be nice to write

bounding_box_loop([#vector(X,Y,Z)|Vectors], Xlo,Xhi, Ylo,Yhi, Zlo,Zhi)  
->
     bounding_box_loop(Vectors,
	X /\ Xlo, X \/ Xhi,
	Y /\ Ylo, Y \/ Yhi,
	Z /\ Zlo, Z \/ Zhi);
bounding_box_loop([], Xlo,Xhi, Ylo,Yhi, Zlo,Zhi) ->
     {#vector(Xlo,Ylo,Zlo), #vector(Xhi,Yhi,Zhi)}.

and know it was efficient?

Note that the code above uses #vector and #scalar patterns.
	#vector(X, Y, Z)
is much less repetitive than
	{X::float, Y::float, Z::float}
and
	#scalar(S)
is more intention-revealing than
	S::float
(What matters is not that S is a floating point number but
  that it is an element of the scalars that the vectors are
  vectors over.)

Note also that in
	{{A,B,C},{D,E,F}}
we have curly braces being used for two different things,
whereas in
	{#vector(A,B,C), #vector(D,E,F)}
we have braces meaning "pair" and #vector where we mean vector.

Note further that this satisfies Mats Cronqvist's desire for a
notation that puts types in the patterns.  If you really want
to write
	#float(X) when float(X) -> X.
then nothing stops you writing
	#float(X)
if you really want to.  The big difference is that _anything_
that can be expressed by some combination of pattern matching
and guard tests can be given a name and used in a pattern.

This is where we want to go.
Why take any detours on the way?