Use-defined guard (was: RE: Enhanced type guard syntax])

Fredrik Linder fredrik.linder@REDACTED
Tue Sep 23 14:52:58 CEST 2003


Yes, this is exactly what I wish for. :-)

Erlangers, is there any possibility that this will be added?

Cheers
/Fredrik

> -----Original Message-----
> From: Sven-Olof Nystr|m [mailto:svenolof@REDACTED]
> Sent: den 23 september 2003 11:18
> To: Sven-Olof Nystr|m
> Cc: Sean Hinde; Richard A. O'Keefe; erlang-questions@REDACTED;
> fredrik.linder@REDACTED
> Subject: Re: Use-defined guard (was: RE: Enhanced type guard syntax])
>
>
>  >  > I can't seem to find this proposal online. Link anyone?
>  >
>  > I have a copy. I could post it on the list, if Richard does not have
>  > any objections.
>
> See below.
>
> Sven-Olof
>
>
>
>
>
> Pattern Abstraction
>
> Richard O'Keefe
>
>
> There is one serious defect which Prolog has never overcome, although
> successor languages such as Mercury have addressed it.  It is a defect
> which languages such as SML, Miranda, Haskell, and Clean do not have,
> although it is shared by Scheme and Lisp generally.
>
> The defect is that Prolog has no support for abstract data types.  In
> fact, Prolog pretty well goes out of its way to discourage abstract data
> types.  Everything in Prolog is a term, and a term is either a variable
> or a functor applied to arguments which are themselves terms.  Not only
> are there primitive operations (var/1, nonvar/1, functor/3, arg/3) which
> can be used to compose and decompose any data structure whatever, so
> that abstraction cannot be enforced, Prolog's normal mode of dealing
> with data is pattern matching, which means that abstraction is
> penalised.
>
> The classic example is list processing.  One can quite easily set up an
> abstract data type list(T) with operations
>
>     null([]).
>
>     pair([H|T], H, T).
>
> and then write
>
>     append(X, Y, Y) :- null(X).
>     append(X, Y, Z) :- pair(X, H,U), pair(Z, H,V), append(U, Y, V).
>
> instead of
>
>     append([], Y, Y).
>     append([H|X1], Y, [H|Z1]) :- append(X1, Y, Z1).
>
> but then one has not merely the overhead of calling extra predicates,
> but the rather disastrous loss of first-argument indexing.  All Prolog
> really needs to solve this problem is a form of predicate definition
> that is guaranteed to be expanded inline, but this is not one of the
> matters that the ISO committee chose to address (not the least of the
> reasons why I have harsh words to say about that committee and that
> standard).  Another thing that would restore indexing would be to use
> negative clauses such as
>
>     <- null(L), pair(L, _, _).
>
> to declare that null/1 and pair/3 are mutually exclusive.  SB-Prolog
> has for a long time used such knowledge about built-in predicates
> (notably arithmetic comparison and term comparison) to allow a programmer
> to write e.g.
>
>     max(X, Y, X) :- X > Y.
>     max(X, Y, Y) :- X =< Y.
>
> and get mutual exclusion of clauses without cuts.  However, as Erlang
> has the typical functional ``commit to first successful match'' rule,
> it doesn't need negative clauses to get exclusion.
>
> Erlang has Prolog-like problems with abstract data types.  First, all
> Erlang data structures are terms, built from variables, constants,
> pairs, and tuples, and there are primitive features of the language for
> composing and decomposing terms, so it is quite impossible to enforce
> encapsulation, at least, not without imposing a type system.  Second,
> Erlang too normally operates on data structures by means of patterns,
> which literally make the structure of data visible.  Unlike Prolog,
> Erlang does not rely on first-argument (or any other) indexing, but it
> does not (and should not) allow general user-defined functions in
> guards, so something like
>
>     append(X, Y) when empty(X) -> Y;
>     append(X, Y) when {H,T} = pair(X) -> make_pair(H, append(T,Y)).
>
> would not be allowed.  There are good reasons for this, and I do not
> want to overturn this feature of Erlang.  But we do need something to
> put in its place if we are to support even weak data abstraction, and we
> need data abstraction very much if we are going to have 10MSLOC programs
> that we can hope to trust.  (I note that a large chunk of New Zealand
> was without emergency telephone service for several days in the week
> just ending, due to a software problem.  Probably written in C. (:-))
>
> Note that the disallowed append/2 code I showed above has a rather
> unpleasant property, which the inlined-simple-predicates approach in
> Prolog is thankfully free of:  composing a pair and decomposing a pair
> use different forms.  This increases the burden on the programmer.
> There is no symmetry at all that can be exploited to make the
> correspondence easier to remember.
>
> There is another major property which a good solution to the problem
> should have:  it should be fully compatible with an entire absence of
> any preprocessor.  At the end of this note I shall explain what is badly
> wrong with records in Erlang.
>
> The pattern abstraction proposal is quite simple.  There are three ways
> an abstract pattern can be used:
> - to compose a term
> - to decompose a term
> - to update a term (that is, compose a new term similar to an old one with
>    some differences.
> A pattern definition is basically a restricted kind of function
> definition, which is rewritten to define a composition function, a
> decomposition function, and an update function.  Each of the three ways
> of using an abstract pattern is defined by rewriting to a call to one of
> the three derived functions.
>
> Abstract patterns begin with a sharp, like records.  Unlike records,
> they use round parentheses instead of curly braces.  Also unlike
> records, they are executable code, not a kind of macro, so they may come
> from modules other than the lexically enclosing one.  A pattern
> definition is a restricted function definition whose heads are decorated
> with a sharp and whose bodies are patterns (possibly involving other
> abstract patterns).
>
>     definition +:= patclause {';' patclause}... '.'
>
>     patclause ::= '#' head ['when' guard] '->' pattern
>
>     pattern +:= '#' [module ':'] name '(' [patterns.] ')'
>
>     primary +:= '#' [module ':'] name '(' [expressions] ')'
>              |  primary '#' [module ':'] name '(' [expressions] ')'
>
>
> Here's append/2 using abstract patterns.
>
>     #empty() -> [].
>     #pair(H,T) -> [H|T].
>     #head(H) -> [H|_].
>     #tail(T) -> [_|T].
>
>     append1(#empty(), Y) -> Y;
>     append1(#pair(H,T), Y) -> #pair(H,append1(T,Y)).
>
>     append2(#empty(), Y) -> Y;
>     append2(#head(H)=#tail(T), Y) -> #pair(H,append2(T,Y)).
>
> What does this mean?  A pattern definition basically defines three
> functions:  a function to use for composition, a function to use for
> decomposition, and a function to use for update.  For the sake of
> exposition, let's name the composition function for #x/n '#x'/n, the
> decomposition function 'n=x'/1, and the update function '!x'/n+1.
>
>     '#empty'() -> [].      % composition function
>     '0=empty'([]) -> {}.   % decomposition function
>     '!empty'([]) -> [].    % update function
>
>     '#pair'(H,T) -> [H|T].
>     '2=pair'([H|T]) -> {H,T}.
>     '!pair'([_|_], H,T) -> [H|T].
>
>     '#head'(H) -> error(). % useless composition function
>     '1=head'([H|_]) -> H.
>     '!head([_|T], H) -> [H|T].
>
>     '#tail'(T) -> error(). % useless composition function
>     '1=tail'([_|T]) -> T.
>     '!tail'([H|_], T) -> [H|T].
>
>     append1(X, Y) when {} = '0=empty'(X) -> Y;
>     append1(X, Y) when {H,T} = '2=pair'(X) ->
>         '#pair'(H, append1(T,Y)).
>
>     append2(X, Y) when {} = '0=empty'(X) -> Y;
>     append2(X, Y) when H = '1=head'(X), T = '1=tail'(Y) ->
>         '#pair'(H, append2(T,Y)).
>
> After inline expansion, these two definitions would be identical to each
> other and to the usual definition.
>
> Suppose we are given a pattern definition
>
>     #p(P11,...,P1n) when G1 -> Q1;
>     ...
>     #p(Pk1,...,Pkn) when Gk -> Qk.
>
> The composition function is obtained very simply by turning #p into '#p'.
>
> The decomposition function basically switches the left and right sides:
>
>     'n=p'(Q1) when G1 -> {P11,...,P1n};
>     ...
>     'n=p'(Qk) when Gk -> {Pk1,...,Pkn}.
>
> However, for the sake of field selection notation, and for efficiency,
> the translation is simplified when n=1:
>
>     '1=p'(Q1) when G1 -> P11;
>     ...
>     '1=p'(Qk) when Gk -> Pk1.
>
> The update function has to copy the bits that aren't mentioned and
> replace the bits that are.  Define Q'i to be Qi with anonymous variables
> replaced by new variables, Q"i and G"i to be Q'i and Gi with the
> original named variables consistently replaced by new variables.  Then
> we get
>
>     '!p'(Q"1,P11,...,P1n) when G"1 -> Q'1;
>     ...
>     '!p'(Q"k,Pk1,...,Pkn) when G"k -> Q'k.
>
> All of these translations have to make sense.  In particular:
> 1.  every variable used in a guard must be defined in both the
>     argument patterns and the replacement pattern corresponding.
> 2a. If the composition function is to be usable, Qi must be fully
>     determined by Pi1...Pin and Gi, that is, have no variable,
>     anonymous or otherwise, that is not assigned a value by them.
> 2b. On the other hand, if an update function is to be useful, there
>     should be some anonymous variables in Qi, or else why not
>     just use a composition function?
> 3.  Pij must be fully determined by Qi and Gi, that is, have no variable,
>     anonymous or otherwise, that is not assigned a value by them.
>
> This actually turns out to be too strict.  There is no point in having
> an abstract pattern that can only be used for composition; we already
> have perfectly good functions for that.  But there is some sense in
> having an abstract pattern that can only be used for decomposition.
> Accordingly, if condition 2a is violated by the ith clause, that clause
> of the composition function becomes
>
>     '#p'(Pi1,...,Pin) when Gi -> error(...)
>
> When an abstract pattern is invoked as part of a pattern,
>
>     #[m:]p(P1,...,Pn) = E
>
> it is as if
>
>     {P1,...,Pn} = [m:]'n=p'(E)
>
> were invoked instead, if n~=1, or as if
>
>     P1 = [m:]'1=p'(E)
>
> were invoked, if n=1.  Since abstract patterns are state independent and
> have no side effects, it doesn't matter whether the translation
> preserves left to right order or not; suffice it that pattern match
> translation can preserve left to right order if we want.  If #p/1 is an
> abstract pattern, it may also be invoked as a field reference:
>
>     E0#p
>
> is as if
>
>     '1=p'(E0)
>
> had appeared instead.  This applies to guard expressions as well as
> general expressions.  Note that E#p is different from #p(E).
>
> When an abstract pattern is invoked as part of an expression,
>
>     #[m:]p(E1,...,En)
>
> it is as if
>
>     [m:]'#p'(E1,...,En)
>
> were invoked instead.  This clearly preserves the order and embedding of
> expressions, so left to right execution order is not at all disturbed.
>
> When an abstract pattern is invoked for update,
>
>     V = E0#[m:]p(E1,...,En)
>
> it is as if
>
>     V = [m:]'!p'(E0, E1,...,En)
>
> were invoked instead.  The only way this could perturb left to right
> ordering is if the module prefix involved an expression other than a
> variable or atom, and in that case the translation
>
>     (ET = E0, V = m:'!p'(ET, E1, ..., En)
>
> can be used (assuming that Erlang does or will allow arbitrary primaries
> as module prefixes) where ET is a new variable.  The preservation of
> left to right order is why the `base datum' argument of an update
> function is the first rather than the last.
>
> There is one further condition.  At the moment, if P is any pattern, and
> P = P makes sense, (mainly, P contains no unbound variables), we expect
> it to be true.  But consider
>
>     #p(1) -> [];
>     #p(2) -> [].
>
>     Y = #p(2), #p(2) = Y
>
> will fail, because it turns into
>
>     '#p'(1) -> []; '#p'(2) -> [].
>     '1=p'([]) -> 1; '1=p'([]) -> 2.
>     '!p'([],1) -> []; '!p'([],2) -> [].
>
>     Y = '#p'(2), 2 = '1=p'(Y)
>
> The conditions we need are easy to state
> 4. There does not exist any term T such that
>     Qi = T, Gi ==> true
>     Qj = T, Gj ==> true
>     i ~= j.
> 5. There do not exist any terms T1,...,Tn such that
>     Pi1 = T1, ..., Pin = Tn, Gi ==> true
>     Pj1 = T1, ..., Pjn = Tn, Gj ==> true
>     i ~= j
> but are less easy to check.  The example clearly violates condition 4.
> The difficulty of checking these conditions (a decision procedure for
> arithmetic appears to be required) would at first sight appear to be a
> problem, but in fact there are good reasons not to impose them,
>
> Now let's look at an example.
>
> ----------------------------------------------------------------
>                     Queue package example.
>
> We'll implement queues twice.
>
>     -module(queue).
>     -export([#empty/0,#nonempty/1,push/2,pop/1]).
>
>     #empty() -> [].
>
>     #nonempty() -> [_|_].  % violates condition 2
>
>     push(Q, X) -> append(Q, [X]).
>
>     pop([X|Q]) -> {X,Q}.
>
>     %end.
>
>
>     -module(queue).
>     -export([#empty/0,#nonempty/1,push/2,pop/2]).
>
>     #empty() -> {[],[]}.
>
>     #nonempty() -> {[_|_],_};   % violates condition 2
>     #nonempty() -> {[],[_|_]}.  % violates condition 2
>
>     push({L,R}, X) -> {L,[X|R]}.
>
>     pop({[X|L],R}) -> {X,{L,R}};
>     pop({[],R}) -> [X|L] = reverse(R), {X,L}.
>
>     %end.
>
>
>     -module(bfs).
>     -export(bfs/1).
>     -import(problem, [solved/1,children/1]).
>     -import(queue, [#empty/0,#nonempty/1,push/2,pop/1]).
>
>     bfs(Start) ->
>        loop(push(#empty(), Start)).
>
>     loop(#empty()) -> undefined;
>     loop(Q = #nonempty()) ->
>        {X,P} = top(Q),
>        case solved(X) of
>          true  -> X;
>          false -> loop(pushall(P, children(X)))
>        end.
>
>     pushall(Q, []) -> Q;
>     pushall(Q, [H|T]) -> pushall(push(Q,H), T).
>
>     %end.
>
>
> Here's the translation of the bfs module:
>
>     -module(bfs).
>     -export(bfs/1).
>
>     bfs(Start) ->
>        loop(queue:push(queue:'#empty'(), Start)).
>
>     loop(Q) when {} = queue:'0=empty'(Q) ->
>         undefined;
>     loop(Q) when {} = queue:'0=nonempty'(Q) ->
>        {X,P} = queue:pop(Q),
>        case problem:solved(X) of
>          true  -> X;
>          false -> loop(pushall(P, problem:children(X)))
>        end.
>
>     pushall(Q, []) -> Q;
>     pushall(Q, [H|T]) -> pushall(queue:push(Q,H), T).
>
>     %end.
>
> I have argued elsewhere that there should be two directives
>
>     -use_early(module, [used functors]).
>     -use_late(module, [used functors]).
>
> which indicate that specified functors from a specified module are used,
> the former indicating early binding, neither of them indicating that a
> name is imported into the local scope.  One reason for that was to
> capture the good aspect of the preprocessor, which is precisely early
> binding.  But note that if we used the preprocessor to hide the
> implementation of queues from the programmer while revealing it to the
> compiler, we would have to use early binding, while with abstract
> patterns, that's an independent choice.  The bfs module can use #empty()
> and #nonempty() as patterns without requiring early binding
>
> Note that replacing one version of the queue module by another while
> bfs/1 was running would not be a good idea.  This seems to be quite
> typical, and I do not understand how Erlang avoids serious trouble.  The
> problem is that once the bfs/1 function has received a data structure
> from the queue module, it depends on the queue module still accepting
> the same data structure, so the process running bfs/1 depends on the
> queue module even when it is not calling a function from that module.
> There really needs to be some sort of locking facility so that bfs/1
> could be written
>
>     bfs(start) ->
>        with_module_lock(queue,
>           loop(push(#empty(), Start))).
>
> However, if we are aware that there are long-lived processes that have
> data structures of the old form that need to be supported for a while,
> we might choose to write a queue module that supports both forms, while
> preferring the new one, and it is for this reason that abstract patterns
> may have more than one clause.  (Although the second version of the
> queue modules needs this anyway.)  Here goes:
>
>     -module(queue).
>     -export([#empty/0,#nonempty/1,push/2,pop/1]).
>
>     #empty() -> {[],[]};
>     #empty() -> [].        % violates condition 5
>
>     #nonempty() -> {[_|_],_};
>     #nonempty() -> {[],[_|_]};
>     #nonempty() -> [_|_].  % violates condition 5
>
>     push({L,R}, X) -> {L,[X|R]};
>     push(Q, X) -> append(Q, [X]).
>
>     pop({[X|L],R}) -> {X,{L,R}};
>     pop({[],R}) -> [X|L] = reverse(R), {X,L};
>     pop([X|Q]) -> {X,Q}.
>
>     %end.
>
> Try doing that with macros!
>
> Of course, macros provide early binding and inlining in one easy step,
> which may (but need not) be good for efficiency.  If an abstract pattern
> is defined in the module that uses it, or if -use_early is defined, then
> early binding occurs, and inlining is a possibility.  The option of
> inlining may be better than the certainty of inlining.  Why?
>
> The first, and perhaps most obvious, reason is that abstract patterns
> are named code forms with well defined invocation and contents, just
> like functions.  The fact that they can be used without requiring
> inlining is good news for debuggers and interpreters, not to mention
> test coverage analysers and profilers.
>
> The second is that inlining doesn't always help efficiency.  There's no
> point in executing 10% fewer instructions if, thanks to cache effects,
> it takes 20% longer to do so.  Back in the late 80s at Quintus, a
> certain competitor claimed that their native-code Prolog system
> generated code that ran twice as fast as ours.  So it did, on naive
> reverse.  On a two-page program (which didn't fit into the 68020's
> on-chip instruction cache), it did no better than our threaded-code
> system.  On a real program, it was twice as slow.  I have heard that the
> same phenomenon occurred with BEAM -vs- JAM.  Inlining can have similar
> effects, though less severe.
>
> One extension to Erlang would make it easier to inline abstract
> patterns, and that is disjunctive guards.  Currently, we have
>
>     guard ::= ... '(' guard ')'
>
> replace that by
>
>     guard ::= ... '(' guard {'|' guard}... ')'
>
> with the obvious semantics.  This would be useful anyway.
>
> ----------------------------------------------------------------
>                 What's wrong with records?
>
> Records tangle up several things:
> 1 the possibility of naming a pattern
> 2 field and  update notation.
> 3 keyword and default arguments
> 4 a very specific data structure
> 5 a global name space
> 6 the possibility of unchecked conflicting definitions in that namespace
> 7 a requirement to use the preprocessor to get weak abstraction
> 8 syntax that includes what LOOK like pattern matches but aren't.
>
> Some of these are good (1, 2, 3), some of them are less good (4), some
> of them are bad (5 and 8) and some of them are very bad (6 and 7).
> Pattern abstraction provides (1 and 2) without any of the others.
>
> What's wrong with (4)?  First, it means that records actually give us no
> abstraction at all; in the Erlang Standard specification, to be a record
> of type T is precisely to be a tuple of a particular arity whose first
> element is the atom T. Second, if we want to give a name to some other
> structure, we are sunk, as records buy us nothing.  Here is a realistic
> example.  Suppose we want to manipulate timestamps.  For compatibility
> with current Erlang, we might want these timestamps to contain embedded
> date and time values.  So
>
>     #timestamp()            -> {_,      _}.
>     #timestamp(DT,TM)       -> {DT,     TM}.
>     #timestamp(Y,M,D,H,I,S) -> {{Y,M,D},{H,I,S}}.
>
>     #time_date(DT)    -> {DT,_}.
>     #time_date(Y,M,D) -> {{Y,M,D},_}.
>
>     #time_time(TM)    -> {_,TM}.
>     #time_time(H,I,S) -> {_,{H,I,S}}.
>
>     #time_year(Y)   -> {{Y,_,_},_}.
>     #time_month(M)  -> {{_,M,_},_}.
>     #time_day(D)    -> {{_,_,D},_}.
>     #time_hour(H)   -> {_,{H,_,_}}.
>     #time_minute(I) -> {_,{_,I,_}}.
>     #time_second(S) -> {_,{_,_,S}}.
>
> Eventually, we purge our code of explicit references to the patterns,
> and can switch to a more space-efficient representation:
>
>     #timestamp()                 -> {_,_,_,_,_,_}.
>     #timestamp({Y,M,D},{H,I,S}}) -> {Y,M,D,H,I,S}.
>     #timestamp(Y,M,D,H,I,S)      -> {Y,M,D,H,I,S}.
>
>     #time_date({Y,M,D}) -> {Y,M,D,_,_,_}.
>     #time_date(Y, M, D) -> {Y,M,D,_,_,_}.
>
>     #time_time({H,I,S}) -> {_,_,_,H,I,S}.
>     #time_time(H, I, S) -> {_,_,_,H,I,S}.
>
>     #time_year(Y)   -> {Y,_,_,_,_,_}.
>     #time_month(M)  -> {_,M,_,_,_,_}.
>     #time_day(D)    -> {_,_,D,_,_,_}.
>     #time_hour(H)   -> {_,_,_,H,_,_}.
>     #time_minute(I) -> {_,_,_,_,I,_}.
>     #time_second(S) -> {_,_,_,_,_,S}.
>
> What do we have here?  A change of representation.  What is it that
> records do not allow you to vary?  The representation (except within
> very narrow limits).
>
> Just to resume a topic that may not have been clear before, here are the
> functions we get for the first of these definition sets.
>
>     '#timestamp'() -> error().
>     '0=timestamp'({_,_}) -> {}.
>     '!timestamp'({X1,X2}) -> {X1,X2}.
>
>     '#timestamp'(DT, TM) -> {DT, TM}.
>     '2=timestamp'({DT,TM}) -> {DT, TM}.
>     '!timestamp'({X1,X2}, DT, TM) -> {DT,TM}.
>
>     '#timestamp'(Y,M,D,H,I,S) -> {{Y,M,D},{H,I,S}}.
>     '6=timestamp'({{Y,M,D},{H,I,S}}) -> {Y,M,D,H,I,S}.
>     '!timestamp'({{X1,X2,X3},{X4,X5,X6}},Y,D,M,H,I,S) ->
> {{Y,M,D},{H,I,S}}.
>
>     '#time_date'(DT) -> error().
>     '1=time_date'({DT,_}) -> DT.
>     '!time_date'({X2,X1}, DT) -> {DT,X1}.
>
>     '#time_date'(Y,M,D) -> {{Y,M,D},_}.
>     '3=time_date'({{Y,M,D},_}) -> {Y,M,D}.
>     '!time_date'({X2,X1}, Y, M, D) -> {{Y,M,D},X1}.
>
>     '#time_time'(TM) -> error().
>     '1=time_time'({_,TM}) -> TM.
>     '!time_time'({X1,X2}, TM) -> {X1,TM}.
>
>     #time_time(H,I,S) -> error().
>     '3=time_time'({_,{H,I,S}}) -> {H,I,S}.
>     '!time_time'({X1,X2}, H, I, S) -> {X1,{H,I,S}}.
>
>     '#time_year'(Y) -> error().
>     '1=time_year'({{Y,_,_},_}) -> Y.
>     '!time_year'({X4,X1,X2},X3}) -> {{Y,X1,X2},X3}.
>
>     '#time_month'(M) -> error().
>     '1=time_month'({{_,M,_},_}) -> M.
>     '!time_month'({X1,X4,X2},X3}, M) -> {{X1,M,X2},X3}.
>
>     '#time_day'(D) -> error().
>     '1=time_day'({{_,_,D},_}) -> D.
>     '!time_day'({X1,X2,X4},X3}, D) -> {{X1,X2,D},X3}.
>
>     '#time_hour'(H) -> error().
>     '1=time_hour'({_,{H,_,_}}) -> H.
>     '!time_hour'({X1,{X4,X2,X3}}, H) -> {X1,{H,X2,X3}}.
>
>     '#time_minute'(I) -> error().
>     '1=time_minute'({_,{_,I,_}}) -> I.
>     '!time_minute'({X1,{X2,X4,X3}}, I) -> {X1,{X2,I,X3}}.
>
>     '#time_second'(S) -> error().
>     '1=time_second'({_,{_,_,S}}) -> S.
>     '!time_second'({X1,{X2,X3,X4}}, S) -> {X1,{X2,X3,S}}.
>
> So if, for example, we wanted to refer to noon today, or to find what
> year it is, we could write
>
>     Noon_Today = timestamp:now()#time_time(12,0,0),
>     This_Year = timestamp:now()#time_year
>
> and these would work whether a timestamp was stored as one tuple or
> three.  The -record approach cannot be used with the three tuple
> version.  (It can't be used with the one tuple version either, because
> there must be an atom serving as a tag, and it must be the first element
> of the tuple, and it must be the same as the record name.)
>
> What about keyword and default arguments (3)?  Are they not a benefit of
> using records?  Yes, they are.  But they should be available for all
> functions and abstract patterns, not just records.  Perhaps it is unfair
> to say ``ask an Ada programmer'', because Ada doesn't use data the way
> Erlang does.  But Ada programs do have lots of procedure and function
> calls, just as Erlang does, and Erlang functions need more arguments
> than Ada functions, for the same reasons as Prolog, so need keyword and
> default arguments more than Ada does.  I intend that abstract patterns
> should use whatever method is used to provide that feature for general
> functions.
>
> Do records really imply a global namespace?  Yes, because the record
> name appears inside the record.  Consider:
>
>     -module(a).
>     -record(r, {X,Y,Z}).
>     ...
>     f(..R..) when record(R, r) ->
>
>     -module(b).
>     -record(r, {X,Y}).
>     g(R) when record(R, r) -> f(R).
>     ... g(#r{1,2}) ...
>
> It is not clear exactly what the record/2 guard test should do.  One
> manual says "This test checks that R is a tuple of arity N+1, where N is
> the number of fields in the record, and the first element in the tuple
> is the" atom naming the record.  This means that g/1 WILL believe that
> record(R, r) is true, but f/1 will NOT.  That does mean that f/1 will not
> be tricked into working with something that doesn't match its idea of
> what an `r' record looks like, but programmers will find it very
> confusing that record(R,r) can succeed in one module and fail in another.
> It is also a nuisance that we cannot tell whether something "is a record
> of type r" without having a -record declaration in scope.  That doesn't
> apply to any other guard test.
>
> The real problem comes when a and b agree on the arity of r but disagree
> on the argument order or interpretation.  Consider again,
>
>     -module(a).
>     -record(r, {X,Y,Z}).
>     ...
>     f(...#r(X,Y,Z)...) -> ...
>
>     -module(b).
>     -record(r, {Z,Y,X}).
>     ...
>        f(...#r(Z,Y,X)...)
>
> Either a global namespace for records is enforced (so that the record/2
> guard test works consistently), or serious confusion may result.  The
> snag here is that a record name ought to be `owned' by some module, but
> in fact it is `owned' by a preprocessor file.
>
> How do pattern abstractions address this problem?  In part, it must be
> admitted, they don't. They are weak abstractions; there is no defence
> against forgery, and little defence against accident other than adding a
> tag, just like records.  The advantage comes from the fact that an
> abstract pattern is clearly owned by a specific module; there is one
> declaration, not multiple copies of a declaration.  Abstract pattern
> names are module qualified, just like other function names.
>
> There is something that could be done that would make accident unlikely
> and forgery slightly harder.  Records differ from tuples in that they
> are tagged.  The tags cannot be module-specific, because the same
> declaration may be copied into many modules, with no clue as to which is
> the owner.  All we do is take the idea of tagging, and bend the existing
> module-specific literal syntax (like ?VERSION and ?MODULE) a little bit.
> The directive
>
>     -unique(name).
>
> would mean that at load time a new ref() would be created and bound to
> name, in such a way that the name can be recovered from the ref() for
> debugging purposes, and at compile time each occurrence of ?name would
> refer to that ref(). (This is like the classic Lisp hack of using a
> unique cons cell or function closure as a tag.)  Then we could write
>
>     -unique(ts).
>
>     #timestamp()            -> {?ts,_,      _}.
>     #timestamp(DT,TM)       -> {?ts,DT,     TM}.
>     #timestamp(Y,M,D,H,I,S) -> {?ts,{Y,M,D},{H,I,S}}.
>
>     #time_date(DT)    -> {?ts,DT,_}.
>     #time_date(Y,M,D) -> {?ts,{Y,M,D},_}.
>
>     #time_time(TM)    -> {?ts,_,TM}.
>     #time_time(H,I,S) -> {?ts,_,{H,I,S}}.
>
>     #time_year(Y)   -> {?ts,{Y,_,_},_}.
>     #time_month(M)  -> {?ts,{_,M,_},_}.
>     #time_day(D)    -> {?ts,{_,_,D},_}.
>     #time_hour(H)   -> {?ts,_,{H,_,_}}.
>     #time_minute(I) -> {?ts,_,{_,I,_}}.
>     #time_second(S) -> {?ts,_,{_,_,S}}.
>
> This is admittedly clumsy, but it's the idea I want to push, not the
> syntax, which is at least local to the owning module.  This looks a bit
> like the preprocessor, but it only requires recognising ?<id> as a
> special form.  If we left the question mark off (and the binding to a
> ref()) we would have the same insecure approach as records, without the
> requirement of a fixed location for the tag.
>
> Thanks to preprocessor use, that insecurity is real.  The really nasty
> thing happens when module a gets its record declaration from foo.hrl,
> module a is compiled and loaded, foo.hrl is changed, and another module
> including (the revised version of) foo.hrl is compiled and loaded.  We
> have a dependency between module a and foo.hrl which is not noticed;
> partly because foo.hrl is not a module and is not, as such, loaded, and
> partly because the dependency exists even while the code in foo.hrl is
> not running.
>
> So we are left with only one thing that abstract patterns cannot do as
> well as records or better, than that is keyword and default parameters.
> At one time it was expected that records would become new types; this
> seems to have been dropped from the current Erlang Standard
> specification (0.6); I surmise because of the difficulty of establishing
> an `owner'.
>
> There is one more thing that abstract patterns can do that records as
> currently defined in Erlang cannot do, and that is enforce type
> constraints on components.  Consider triangles.  We'd like a triangle to
> be represented by a `record' containing three floats.  Using record
> notation, all we can do is write
>
>     -record(triangle{A,B,C}).
>
> and hope.  Using abstract pattern notation, we can write
>
>     #triangle(A,B,C) when float(A), float(B), float(C) -> {A,t,B,C}.
>
> and this will enforce the type constraints on the components.  Here is a
> sample use of this pattern, to see whether a value of this type is a
> plausible triangle.
>
>     plausible(#triangle(A,B,C))
>       when 0 < A, A <= B+C,
>            0 < B, B <= C+A,
>            0 < C, C <= A+B
>       -> true;
>     plausible(_) -> false.
>
> After translation and inline expansion, plausible/1 becomes
>
>     plausible({A,t,B,C})  % check form
>       when float(A), float(B), float(C), % check types
>            0 < A, A <= B+C,
>            0 < B, B <= C+A,
>            0 < C, C <= A+B
>       -> true;
>     plausible(_) -> false.
>
> This ability to propagate type information/checking from the declaration
> to the uses, if the programmer wants it, is an important safety feature
> that is completely missing from Erlang records, and is obtained without
> any new runtime machinery.  Note that an Erlang compiler can easily
> deduce from the use of A, B, C that they must be some kind of number,
> but not which kind, not without interprocedural data flow analysis.
> Even without inlining, we get the runtime type checking we choose in the
> declaration, which has to be helpful if we want to build large systems.
>
> The End.
>
> The abstract pattern notation neatly replaces both the record
> declarations and the use of the preprocessor.
>
> Records let you say
>
>     L2 = L#log{blocked_by = none, status = ok}
>
> which is arguably prettier than
>
>     L2 = L#log_blocked_by(none)#log_status(ok)
>
> but not hugely so.  One could argue that the abstract pattern
> version has the merit of not containing a manifestly false
> pattern match 'status = ok'.
>
> Records automatically provide you with a full set of record.field
> forms; abstract patterns for them have to be separately declared.
> On the other hand, records *force* you to accept a full set of
> record.field forms even if you want some of them blocked, and it
> is not at all clear how import/export of #record.field forms would
> be controlled.
>
> Records permit patterns like
>
>     #log{status == S, users = N} = L
>
> which is arguably prettier than
>
>     #log_status(S) = #log_users(N) = L
>
> but again, the latter never contains manifestly absurd matches.
>
> Dynamic Patterns.
>
> The parallel between abstract patterns and functions can be stretched a
> little further.  In effect, we can regard a value of type
>     (T1,...,Tn) => T0
> [where => is the pattern arrow in distinction to the function arrow ->] as
> a record
>     ( b : (T1,...,Tn) -> T0    % building
>       u : (T0,T1,...,Tn) -> T0    % updating
>       d : T0 -> (T1,...,Tn)    % decomposing
>     )
> and
>     X = #p(E1,...,En)       ==> X = p.b(E1,...,En)
>     X = E0#p(E1,...,En)   ==> X = p.u(E0,E1,...,En)
>     #p(P1,...,Pn) = E       ==> {P1,...,Pn} = p.d(E)
> Since a triple of functions can be passed as a value, why not allow a
> pattern to be passed as a value?  Let's put this in Erlang terms.
>
> Define
>     #E0(E1,...,En)
> where E0 is not an atom or a module:atom pair to be
>     #apply(E0, [E1,...,En])
> Define
>     #apply(Mod,E0,EL)
> to be
>     #apply({Mod,E0}, EL)
> Now we have to define the use of #apply/2.
>
> X = #apply(MF,L)
>     where MF = {M,F}, L = [T1,...,Tn}, and M, F are atoms
>     reduces to X = #M:F(T1,...,Tn), i.e. X = M:'#F'(T1,...,Tn).
>     where MF is the result of (fun #F/N) executed statically within
>     module M, L = [T1,...,Tn], and N = n,
>     reduces to X = #M:F(T1,...,Tn)
>
> X = E0#apply(MF,L)
>     where MF = {M,F}, L = {T1,...,Tn} and M, F are atoms
>     reduces to X = E0#M:F(T1,...,Tn) i.e. X = M:'!F'(E0,T1,...,Tn)
>     where MF is the result of (fun #F/N) executed statically within
>     module M, L = [T1,...,Tn], and N = n,
>     reduces to X = Eo#M:F(T1,...,Tn).
>
> #apply(MF, [P1,...,Pn]) = E
>     where MF is statically a GuardExpression whose value is dynamically
>     {M,F} for M and F atoms, and [P1,...,Pn] is statically a list
> of Patterns,
>     reduces to #M:F(P1,...,Pn) = E, i.e. {P1,..,Pn} = M:'n=F'(E).
>     where MF is statically a GuardExpression whose value is dynamically
>     the result of (fun #F/N) executed statically within a module M,
>     [P1,...,Pn] is statically a list of Patterns, and N = n,
>     reduces to #M:F(P1,...,Pn) = E.
>
> Now we can do things like this:
>
>     proj(P, []) -> [];
>     proj(P, [#P(X)|Xs[) -> [X|proj(P,Xs)].
>
>     #unit_list(X) -> [X].
>     #unit_tuple(X) -> {X}.
>
>     ... proj(fun #unit_list/1, L), ...
>     ... proj(fun #unit_tuple/1, LT), ...
>
> By bending explicit fun syntax a little bit, we can even get anonymous
> patterns:
>     fun #( Pattern... ) when Guard -> Pattern
>       ; #( Pattern... ) when Guard -> Pattern ...
>     end
> e.g.
>     fun #(X) -> [X|_] end
> is an anonymous `hd' pattern.  (The syntax here is based on two rules:
> to get pattern syntax from function syntax, put # in front of the
> [Module:]Name, and to get explicit fun syntax, put `fun' `end' around
> the clauses and delete the function Name.)  It's not that I think
> anonymous patterns are wildly useful; the point is that if they weren't
> at least thinkable the parallels between patterns and functions would be
> sploit, to the detriment of our ability to reason correctly about them.
>
> Try doing that with records!
>
> As a matter of fact, dynamic patterns do something for us that isn't
> easy to get any other way.  Look at prof/2 again.  Is there any
> nonsyntactic difference between
>
>     proj(P, []) -> [];
>     proj(P, [#P(X)|Xs]) -> [X|proj(P, Xs)].
>
> and
>
>     map(F, []) -> [];
>     map(F, [X|Xs]) -> [F(X)|map(F, Xs)].
>
> or is it just a piece of `cuteness' that lets you put the function call
> on the `wrong' side?
>
> Yes, there is a difference.  The function F in map/2 may do anything at
> all; it may have side effects, it may allocate arbitrarily large amounts
> of storage, it may run for huge amounts of time.  The pattern P in
> proj/2 must be an abstraction of a pattern-and-guard:  it may not send
> or receive messages, have any other side effects, allocate huge amounts
> of storage, or take unbounded amounts of time.  It is forced to be well
> behaved in ways that an arbitary function parameter is not.  Indeed, we
> are sure that the value X it returns in proj/2 is part of the term it
> matches, so it really is a projection function.  If we want to reason
> about the effects of a code fragment, proj/2 is significantly easier to
> reason about than map/2.  This means that anonymous patterns do have a
> use:
>
>     proj(fun #(X) -> {X,_,_} end, L)
>
> is trivially well-behaved, whereas
>
>     map(fun ({X,_,_}) -> X end, L)
>
> requires more analysis to see its harmlessness.  (Of course, checking
> higher order functions is never entirely trivial, and it is something of
> a pity that they were added to Erlang so hastily.  Topic for another
> working paper.)
>
> When you've invented a language construct, a good sign that you are onto
> something worthwhile is that applications come faster than you have time
> to write them down.  When I wrote the stuff about dynamic abstract
> patterns, I didn't have an application in mind.  But I suddently realised
> that there was a proposal I thought of last year which had been
> languishing
> because there was no safe way to do it.
>
> Erlang processes are vulnerable to a denial-of-service attack where a
> malicious or erroneous process repeatedly sends "junk mail".  The Erlang
> documentation recommends that most or all 'receive' commands should
> include a wildcard case to discard junk mail, but that has the
> disadvantage
> that it won't discard junk mail until the process actually executes a
> receive.  Junk mail can pile up while a process is suspended.  If nothing
> else, it causes the recipient of the junk mail to pay (in processing and
> garbage collection time) to inspect it.
>
> Last year I came up with the idea of a 'message filter'.
>
>     set_process_filter(fun (Msg) -> ... true | false end)
>
> would mean that whenever a (local) process send a message to this one,
> the message filter function would run IN THE SENDER'S THREAD to see if
> the message was ok.  If it returned 'false', the message would be
> discarded, otherwise it would be retained.
>
> The problem with that is that it makes the sender vulnerable to Trojan
> horses; the message filter function could do absolutely anything while
> masquerading as the sender.  What I needed was a way of declaring and
> enforcing that the message filter function would execute in bounded time
> with no side effects.
>
> Does that sound familiar?  It's basically an abstract pattern.
>
> So here's the new proposal.
>
>     set_process_filter(fun #() when Guard -> Pattern end)
>
>         If the argument is a pattern expression of arity 0, then
>         when a message is sent to this process, the pattern will
>         be executed in the sender's thread and if it matches the
>         message will be sent and if it fails the message will be
>         discarded.
>
>     set_process_filter(fun #(Result) when Guard -> Pattern end)
>
>         If the argument is a pattern expression of arity 1, then
>         when a message is sent to this process, the pattern will
>         be executed in the sender's thread, and if it matches,
>         the Result will be sent, and if it fails, the message
>         will be discarded.
>
> Of course these forms are only illustrative; there could be any number
> of clauses and fun #F/0 or fun #F/1 could be used instead.  The use of
> set_process_filter/1 would be to allow simple protocol conversions.
>
> Whether this feature is as useful as I think is debatable; Tobbe
> argues that the best idea is to conceal processes inside their creating
> modules, and I think he's probably right about that.  The point is that
> dynamic functions whose cost is certainly bounded are a useful thing to
> have, and they fall out quite naturally from trying to take abstract
> patterns seriously.
>
>




More information about the erlang-questions mailing list