[eeps] Multi-Parameter Typechecking BIFs

Thu Feb 26 04:11:43 CET 2009

On 26 Feb 2009, at 12:49 am, Andras Georgy Bekes wrote:
>
>> In this case, by golly, we have FIVE proposals on the table, not two.
> Please also consider a proposal #5 written by somebody, which is the
> ability to use #2 syntax with user defined types.

At the time I wrote that, I had not seen an example of
proposal 5) so couldn't comment.

> Example:
>
> -type(i3_point() :: {integer(),integer(),integer()}).
>
> foo(_, {Width,Height,Depth}::i3_point, _)
>
> I think either #4 or #5 is the best.
> Could you help comparing them?
>
> #5 user-defined types used as type guard:
> + you don't have to write any extra declaration. You have written the
> type declaration for dialyzer, now you use that for one more thing.
> - As in the above example, if you want to match the parts of the  
> whole,
> you have to write (know!) the structure, i.e. that i3_point is a
> 3-tuple and the order of elements is some fixed order.

Let's see what these can do.
I'm using EEP 8 as my source for what a -type can be.
[Drat, cut and paste mangled the layout.]
any()
none()
pid()
port()
ref()
[]		nil, not any list
atom()
specific atom
binary()
<<>>
<<_:#>>		in these three, # is an integer
<<_:_*#>>
<<_:#*#>>
integer()
specific integer
specific integer .. specific integer
float()
list(Type)	abbreviated to [Type]
		[Type,...] is nonempty proper lists of Type
improper_list(Type1, Type2)
maybe_improper_list(Type1, Type2)
tuple()
{}		empty tuple
{Types}
fun()
fun((...) -> Type)
fun(() -> Type)
fun((Types) -> Type)
Type1 | Type2	union
term()		= any()
bool()		false | true
byte()		0..255
char()		0..16#10ffff
non_neg_integer()	0..
pos_integer()	1..
neg_integer()	..-1
number()	integer() | float()
list()		list(any())
maybe_improper_list()	maybe_improper_list(any(),any())
maybe_improper_list(T)	maybe_improper_list(T, any())
string()	[char()]
nonempty_string() [char(),...]
iolist()	phew!
module()	atom()
mfa()		{atom(),atom(),byte()}
node()		atom()
timeout()	infinity | non_negative_integer()
no_return()	none()
#Atom{}
#Atom{Fields}

Although list.. and iolist are recursive,
"general recursive types cannot be defined".
Also, user-defined type declarations can be parameterised.

What follows from this?

First, that allowing all of these and user-defined types as
guard tests would be really cool.  Especially, allowing
user-defined type tests in guards would completely answer
my objection about type tests being very very low level.

Second, that the type language is not yet sufficiently
expressive for me to express things that I take for granted
in SML and Clean and Haskell, even though it is in some
ways more powerful.  If I want to mention particular numbers
in Haskell types, I have to encode numbers *as* types.

Third, that there are three things that this type language
can do that abstract patterns cannot do.

	A) list() and its relatives are not expressible as
	   abstract patterns, because abstract patterns do
	   not allow recursion.

	   OK, I lied.
	   #list(L) when length(L) >= 0 -> L.
	   precisely expresses list().  But that's because
	   of length(), which I'd like to see removed from
	   guards, not because of abstract patterns per se.

	   Absent length(), any abstract pattern I can write
	   will take constant time, which is precisely what
	   I want for something that's safe to use in guards,
	   especially in 'receive'.

	B) Abstract patterns do not have type or pattern
	   parameters (although if the double parameterisation
	   scheme in the last stage were adopted, they would
	   accept pattern parameters).

	C) They can classify functions more finely.
	   Abstract patterns can only use the information that
	   is available from pattern matching and guards;
	   while function arity is available, argument and
	   result types are not.

What do we conclude from this?

	A') Power comes at a price.  ::type tests could be
	    extremely expensive, just as length(_) calculations
	    in guards can be extremely expensive.  The thought
	    of a function that keeps on saying L :: iolist()
	    gives me the performance tar pit horrors.

	B') I like this very much.  It's enough to make me think
	    kindly of double parameterisation for abstract patterns.

	C') I'm afraid this makes user-defined type tests
	    unimplementable, at least ones that mention function
	    types.

Note that testing function types could involve us in the problem
of calculating, at run time, whether one type is a subset of another.

What can abstract patterns do that these types cannot?

	D) They can be used to generate values as well as to
	   recognise them.

	E) They can represent floating point constants and ranges.

	F) They can recognise lists and tuples of variable but
	   bounded size:

	   #one_to_fifty(T)
	   when is_tuple(T), tuple_size(T) >= 1, tuple_size(T) =< 50
	   -> T.

	   You _can_ write that as a user-defined type, but do you
	   really want to?

	G) They can do moderate amounts of arithmetic.

	H) They can refer to self().

	I) They can take a term apart and return the pieces;
	   they are not limited to simply recognising something.
>
>
> #4 abstract patterns:
> - You write extra declaration.
> Abstract patterns are new to the language
> and complementary to the current type declarations. You must write  
> both
> the abstract pattern and the type declaration.

I think there is a misconception here, and it may well
strike to the heart of the thread.

-type declarations are designed to express *TYPES*.
That is, those constraints on arguments that should *NOT*
be checked at run time.

Abstract patterns are designed to express *PATTERNS*.
That is, those constraints on arguments that *should*
be checked at run time.

The fact that -type declarations can express *some*
recursively defined sets of terms is just perfect for
the job of declaring types.  It's not good for making
run-time decisions.

The fact that abstract patterns can't do recursion at
all makes them a good choice for something to use in
selecting which of several clauses to use.  It's not
good for defining *all* the legal arguments of some
function, not always.

You would NOT expect to write two declarations, one
-type declaration and one abstract pattern.  If a
-type declaration contains N union operators (|)
you would expect to write N+2 declarations.

Here's what I mean.

	-type rbtree(K,V) :: empty
                            | {red,K,V,rbtree(K,V),rbtree(K,V)}
		           | {black,K,V,rbtree(K,V),rbtree(K,V)}.

By the way, if you read "::" as "has type", which is
how you read it in a -record declaration or a -spec,
this is absurd.  rbtree(K,V) doesn't "have type" the
right hand side, it "names" or "is" the right hand
side.  As I understand it this isn't a legal -type
declaration anyway because of the recursion.  But let's
pretend.

So this there correspond three abstract patterns:

	#empty() -> empty.
	#red(K,V,L,R) -> {red,K,V,L,R}.
	#black(K,V,L,R) -> {black,K,V,L,R}.

and maybe some more

Code using the abstract patterns would look like

	-spec lookup(Key :: K, Dict :: rbtree(K,V), Dfl :: V) -> V.

	lookup(K, #empty(),       D) -> D;
	lookup(K, #red(K1,V,L,R), D) ->
	    if K < K1 -> lookup(K, L, D)
	     ; K > K1 -> lookup(K, R, D)
	     ; true   -> V
	    end;
	lookup(K, #black(K1,V,L,R), D) ->
	    if K < K1 -> lookup(K, L, D)
	     ; K > K1 -> lookup(K, R, D)
	     ; true   -> V
	    end.

Leave aside the question of whether the code could be
improved.  I know it could.

The thing here is that the type and the abstract patterns
are doing completely different jobs.  The type says what
_all_ the possibilities are, the patterns say what the
_currently relevant_ possibilities are.

Now suppose we broke down the types.

	-type rb_empty(K,V) :: 'empty'.
	-type rb_red(K,V)   :: {'red',  K,V,rb_tree(K,V),rb_tree(K,V)}.
	-type rb_black(K,V) :: {'black',K,V,rb_tree(K,V),rb_tree(K,V)}.
	-type rb_tree(K,V)  :: rb_empty(K,V) | rb_red(K,V) | rb_black(K,V).
000000000011111111112222222222333333333344444444445555555555666666666677777

Then we might imagine doing

	lookup(K, empty :: rb_empty(_,_), D) -> D;
	lookup(K, {_,K1,V,L,R} :: rb_red(_,_), D) ->
	    ... as before
	lookup(K, {_,K1,V,L,R} :: rb_black(_,_), D) ->
	    ... as before

The type tests do something for us, but in order to
take the matching terms apart, WE HAVE TO DO THE PATTERN MATCHING
OURSELVES ANYWAY.

The confusion between types and patterns, to which I confess
that some of my examples have contributed, arises when we
consider functions with only one clause, due to what would
have been algebraic data types with only one constructor.

The 3d vectors and matrices in Wings3D are precisely such
cases.  If there is only one shape that values of a type
may have, then -types and #abstract patterns *do* resemble
one another strongly.

Just as -type declarations currently support records, so
they could be made to support (some) abstract patterns.
We could, for example, imagine defining

	#empty() -> empty.
	#red(  K,V,L,R) -> {red,  K,V,L,R}.
	#black(K,V,L,R) -> {black,K,V,L,R}.

	-type rbtree(K,V)
	   :: #empty()
             | #red(  K, V, rbtree(K,V), rbtree(K,V))
	    | #black(K, V, rbtree(K,V), rbtree(K,V)).

Here the type and the patterns are each playing their
proper role:  the type describes ALL the possibilities,
and the patterns say how to classify them into cases.

(I remind you once again that rbtree(K,V) is a recursively
defined type and EEP 8 says that such types aren't yet allowed.
This is just for the sake of exposition.)

>
> However, abstract patterns might be automatically generated from type
> declarations or the other way round. This is a step in a bad  
> direction.
> We should not have two different things for describing the same.

But they AREN'T the same.
At the moment, types are *exclusively* a compile-time concept,
and abstract patterns a run-time concept.

The overlap comes only from the idea of using user-defined
types as type *tests*, turning them into a run-time concept.

But user-defined types as run-time type tests just don't
work very well.  Their main advantage over abstract patterns
in that role, that they can have type parameters, doesn't
actually materialise, because Erlang functions don't have
type parameters that you can pass along.  Look at this
example again:

	lookup(K, empty        :: rb_empty(_,_), D) -> D;
	lookup(K, {_,K1,V,L,R} :: rb_red(_,_),   D) ->
	    ... as before
	lookup(K, {_,K1,V,L,R} :: rb_black(_,_), D) ->
	    ... as before

Why did I put underscores in the type tests?  Because there
is nothing else I could possibly put there.
>

> We also have a #6 proposal:
>
>>> take advantage of the newly
>>> introduced -spec syntax and making it possible
>>> to tell the compiler to generate runtime guards according to the
>>> -spec

That's basically the same as number 5.

Let me flesh out a point I made earlier.

-spec flatten(L :: iolist()) -> string().
-spec flatten(L :: iolist(), S :: string()) -> string().

flatten(L) -> flatten(L, "").

flatten([], S) -> S;
flatten([H|T], S) -> flatten(H, flatten(T, S));
flatten(C, S) when is_integer(C) -> [C|S];
flatten(B, S) when is_binary(B) -> binary_to_list(B) ++ S.

As a compile time specification, those are perfect specs
for those functions.  They DON'T eliminate the need for
is_integer and is_binary to check which clause to use.

Now suppose that the compiler is told to generate runtime tests.
This is equivalent to getting the compiler to write

flatten(L :: iolist()) -> flatten(L, "").

flatten([] :: iolist(), S :: string()) -> S;
flatten([H|T] :: iolist(), S :: string()) ->
	flatten(H, flatten(T, S));
flatten(C :: iolist(), S :: string()) when is_integer(C) ->
	[C | S];
flatten(B :: iolist(), S :: string()) when is_binary(B) ->
	binary_to_list(B) ++ S.

We can imagine a smart compiler being able to optimise
Term :: Type when it can see the form of Term or there is
a type check handy in a guard.  So this would simplify to

flatten(L :: iolist()) -> flatten(L, "").

flatten([], S :: string()) -> S;
flatten([H :: iolist() | T :: iolist()], S :: string()) ->
	flatten(H, flatten(T, S));
flatten(C, S :: string()) when is_integer(C), C >= 0, C =< 16#10FFFF ->
	[C|S];
flatten(B, S :: string()) when is_binary(B) ->
	binary_to_list(B) ++ S.

We can even imagine a smart compiler optimising the type checking
of S, if flatten/2 happens not to be exported.  So we're down to

flatten(L :: iolist()) -> flatten(L, "").

flatten([], S) -> S;
flatten([H :: iolist() | T :: iolist()], S) ->
	flatten(H, flatten(T, S));
flatten(C, S) when is_integer(C), C >= 0, C =< 16#10FFFF ->
	[C|S];
flatten(B, S) when is_binary(B) ->
	binary_to_list(B) ++ S.

Optimisation can go no further without changing the semantics.
A call to flatten([1,2,3,4,5,x]) would, without the compiler-
inserted type checks, crash when the programmed recursion
reached the 'x'.  But *with* the compiler-inserted type checks,
it must crash at the very first call.

The result is that instead of flatten being linear in the size
of the iolist, it's at least quadratic.

To put it mildly, this is not a good idea.

> Richard's opinion is:
>> It is conventional to use type declarations to enable
>> a compiler to *omit* run-time checks.  (Lisp, ML, CAML, F#,
>> Haskell, Clean, ...)  Using them to cause the compiler to
>> *emit* run-time checks is certainly original.
>
> If I specify the type of a function, I want it to be executed only if
> the type is correct. I don't care if the compiler *emits* extra type
> checks or it *omits* the type checks beacuse it is able to deduce that
> my type declaration is never violated.

Since the run time type checks take time,
and can take a very great deal of time,
you OUGHT to care whether the compiler is emitting them
or omitting them.

-type and -spec declarations were invented to
document, and check at compile time the behaviour of a whole function.

abstract patterns were invented to simplify the
run time selection of one of many clauses.

>
>
>> Not the least
>> of the problems is that code that worked as its author
>> intended (but was incorrectly annotated) would suddenly break.
> Yes, this is a real problem, but note that we're talking about code  
> that
> dialyzer told you will break. If you didn't use dialyzer before, you
> certainly didn't write the spec declarations. If you did use dialyzer,
> you most probably did read what it tells you.

Actually, no.  Remember, the Dialyzer is not complete.
(For excellent and clearly explained reasons.)
Turning -specs into run time checks CAN raise exceptions
that the Dialyzer would not have warned about.

Here's another example of performance cost.

-type matrix = list(list(float()).
-spec nrows(M :: matrix()) -> non_neg_integer().
nrows(M) -> length(M).

Suppose I have an PxQ matrix.
Computing nrows of it takes O(P) time.
Checking that it is a list of list of floats takes O(PxQ) time.

What makes a good *type* doesn't necessarily make a good *test*.