User-defined guard (was: RE: Enhanced type guard syntax)
Fredrik Linder
fredrik.linder@REDACTED
Tue Sep 23 14:52:58 CEST 2003
Yes, this is exactly what I wish for. :-)
Erlangers, is there any possibility that this will be added?
Cheers
/Fredrik
> -----Original Message-----
> From: Sven-Olof Nyström [mailto:svenolof@REDACTED]
> Sent: den 23 september 2003 11:18
> To: Sven-Olof Nyström
> Cc: Sean Hinde; Richard A. O'Keefe; erlang-questions@REDACTED;
> fredrik.linder@REDACTED
> Subject: Re: User-defined guard (was: RE: Enhanced type guard syntax)
>
>
> > > I can't seem to find this proposal online. Link anyone?
> >
> > I have a copy. I could post it on the list, if Richard does not have
> > any objections.
>
> See below.
>
> Sven-Olof
>
>
>
>
>
> Pattern Abstraction
>
> Richard O'Keefe
>
>
> There is one serious defect which Prolog has never overcome, although
> successor languages such as Mercury have addressed it. It is a defect
> which languages such as SML, Miranda, Haskell, and Clean do not have,
> although it is shared by Scheme and Lisp generally.
>
> The defect is that Prolog has no support for abstract data types. In
> fact, Prolog pretty well goes out of its way to discourage abstract data
> types. Everything in Prolog is a term, and a term is either a variable
> or a functor applied to arguments which are themselves terms. Not only
> are there primitive operations (var/1, nonvar/1, functor/3, arg/3) which
> can be used to compose and decompose any data structure whatever, so
> that abstraction cannot be enforced, Prolog's normal mode of dealing
> with data is pattern matching, which means that abstraction is
> penalised.
>
> The classic example is list processing. One can quite easily set up an
> abstract data type list(T) with operations
>
> null([]).
>
> pair([H|T], H, T).
>
> and then write
>
> append(X, Y, Y) :- null(X).
> append(X, Y, Z) :- pair(X, H,U), pair(Z, H,V), append(U, Y, V).
>
> instead of
>
> append([], Y, Y).
> append([H|X1], Y, [H|Z1]) :- append(X1, Y, Z1).
>
> but then one has not merely the overhead of calling extra predicates,
> but the rather disastrous loss of first-argument indexing. All Prolog
> really needs to solve this problem is a form of predicate definition
> that is guaranteed to be expanded inline, but this is not one of the
> matters that the ISO committee chose to address (not the least of the
> reasons why I have harsh words to say about that committee and that
> standard). Another thing that would restore indexing would be to use
> negative clauses such as
>
> <- null(L), pair(L, _, _).
>
> to declare that null/1 and pair/3 are mutually exclusive. SB-Prolog
> has for a long time used such knowledge about built-in predicates
> (notably arithmetic comparison and term comparison) to allow a programmer
> to write e.g.
>
> max(X, Y, X) :- X > Y.
> max(X, Y, Y) :- X =< Y.
>
> and get mutual exclusion of clauses without cuts. However, as Erlang
> has the typical functional ``commit to first successful match'' rule,
> it doesn't need negative clauses to get exclusion.
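>
> For comparison, the corresponding max/2 in ordinary Erlang needs
> neither cuts nor negative clauses, because the first clause whose
> head and guard succeed is committed to (plain standard Erlang,
> shown only to make the contrast concrete):
>
> max(X, Y) when X > Y -> X;
> max(_, Y) -> Y.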
>
> Erlang has Prolog-like problems with abstract data types. First, all
> Erlang data structures are terms, built from variables, constants,
> pairs, and tuples, and there are primitive features of the language for
> composing and decomposing terms, so it is quite impossible to enforce
> encapsulation, at least, not without imposing a type system. Second,
> Erlang too normally operates on data structures by means of patterns,
> which literally make the structure of data visible. Unlike Prolog,
> Erlang does not rely on first-argument (or any other) indexing, but it
> does not (and should not) allow general user-defined functions in
> guards, so something like
>
> append(X, Y) when empty(X) -> Y;
> append(X, Y) when {H,T} = pair(X) -> make_pair(H, append(T,Y)).
>
> would not be allowed. There are good reasons for this, and I do not
> want to overturn this feature of Erlang. But we do need something to
> put in its place if we are to support even weak data abstraction, and we
> need data abstraction very much if we are going to have 10MSLOC programs
> that we can hope to trust. (I note that a large chunk of New Zealand
> was without emergency telephone service for several days in the week
> just ending, due to a software problem. Probably written in C. (:-))
>
> Note that the disallowed append/2 code I showed above has a rather
> unpleasant property, which the inlined-simple-predicates approach in
> Prolog is thankfully free of: composing a pair and decomposing a pair
> use different forms. This increases the burden on the programmer.
> There is no symmetry at all that can be exploited to make the
> correspondence easier to remember.
>
> There is another major property which a good solution to the problem
> should have: it should be fully compatible with an entire absence of
> any preprocessor. At the end of this note I shall explain what is badly
> wrong with records in Erlang.
>
> The pattern abstraction proposal is quite simple. There are three ways
> an abstract pattern can be used:
> - to compose a term
> - to decompose a term
> - to update a term (that is, compose a new term similar to an old one with
> some differences).
> A pattern definition is basically a restricted kind of function
> definition, which is rewritten to define a composition function, a
> decomposition function, and an update function. Each of the three ways
> of using an abstract pattern is defined by rewriting to a call to one of
> the three derived functions.
>
> Abstract patterns begin with a sharp, like records. Unlike records,
> they use round parentheses instead of curly braces. Also unlike
> records, they are executable code, not a kind of macro, so they may come
> from modules other than the lexically enclosing one. A pattern
> definition is a restricted function definition whose heads are decorated
> with a sharp and whose bodies are patterns (possibly involving other
> abstract patterns).
>
> definition +:= patclause {';' patclause}... '.'
>
> patclause ::= '#' head ['when' guard] '->' pattern
>
> pattern +:= '#' [module ':'] name '(' [patterns] ')'
>
> primary +:= '#' [module ':'] name '(' [expressions] ')'
> | primary '#' [module ':'] name '(' [expressions] ')'
>
>
> Here's append/2 using abstract patterns.
>
> #empty() -> [].
> #pair(H,T) -> [H|T].
> #head(H) -> [H|_].
> #tail(T) -> [_|T].
>
> append1(#empty(), Y) -> Y;
> append1(#pair(H,T), Y) -> #pair(H,append1(T,Y)).
>
> append2(#empty(), Y) -> Y;
> append2(#head(H)=#tail(T), Y) -> #pair(H,append2(T,Y)).
>
> What does this mean? A pattern definition basically defines three
> functions: a function to use for composition, a function to use for
> decomposition, and a function to use for update. For the sake of
> exposition, let's name the composition function for #x/n '#x'/n, the
> decomposition function 'n=x'/1, and the update function '!x'/n+1.
>
> '#empty'() -> []. % composition function
> '0=empty'([]) -> {}. % decomposition function
> '!empty'([]) -> []. % update function
>
> '#pair'(H,T) -> [H|T].
> '2=pair'([H|T]) -> {H,T}.
> '!pair'([_|_], H,T) -> [H|T].
>
> '#head'(H) -> error(). % useless composition function
> '1=head'([H|_]) -> H.
> '!head'([_|T], H) -> [H|T].
>
> '#tail'(T) -> error(). % useless composition function
> '1=tail'([_|T]) -> T.
> '!tail'([H|_], T) -> [H|T].
>
> append1(X, Y) when {} = '0=empty'(X) -> Y;
> append1(X, Y) when {H,T} = '2=pair'(X) ->
> '#pair'(H, append1(T,Y)).
>
> append2(X, Y) when {} = '0=empty'(X) -> Y;
> append2(X, Y) when H = '1=head'(X), T = '1=tail'(X) ->
> '#pair'(H, append2(T,Y)).
>
> After inline expansion, these two definitions would be identical to each
> other and to the usual definition.
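>
> For reference, `the usual definition' here is just the ordinary
> Erlang append, which is what both versions reduce to once the small
> derived functions are expanded inline:
>
> append([], Y) -> Y;
> append([H|T], Y) -> [H|append(T, Y)].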
>
> Suppose we are given a pattern definition
>
> #p(P11,...,P1n) when G1 -> Q1;
> ...
> #p(Pk1,...,Pkn) when Gk -> Qk.
>
> The composition function is obtained very simply by turning #p into '#p'.
>
> The decomposition function basically switches the left and right sides:
>
> 'n=p'(Q1) when G1 -> {P11,...,P1n};
> ...
> 'n=p'(Qk) when Gk -> {Pk1,...,Pkn}.
>
> However, for the sake of field selection notation, and for efficiency,
> the translation is simplified when n=1:
>
> '1=p'(Q1) when G1 -> P11;
> ...
> '1=p'(Qk) when Gk -> Pk1.
>
> The update function has to copy the bits that aren't mentioned and
> replace the bits that are. Define Q'i to be Qi with anonymous variables
> replaced by new variables, Q"i and G"i to be Q'i and Gi with the
> original named variables consistently replaced by new variables. Then
> we get
>
> '!p'(Q"1,P11,...,P1n) when G"1 -> Q'1;
> ...
> '!p'(Q"k,Pk1,...,Pkn) when G"k -> Q'k.
>
> All of these translations have to make sense. In particular:
> 1. every variable used in a guard must be defined in both the
> argument patterns and the corresponding replacement pattern.
> 2a. If the composition function is to be usable, Qi must be fully
> determined by Pi1...Pin and Gi, that is, have no variable,
> anonymous or otherwise, that is not assigned a value by them.
> 2b. On the other hand, if an update function is to be useful, there
> should be some anonymous variables in Qi, or else why not
> just use a composition function?
> 3. Pij must be fully determined by Qi and Gi, that is, have no variable,
> anonymous or otherwise, that is not assigned a value by them.
>
> This actually turns out to be too strict. There is no point in having
> an abstract pattern that can only be used for composition; we already
> have perfectly good functions for that. But there is some sense in
> having an abstract pattern that can only be used for decomposition.
> Accordingly, if condition 2a is violated by the ith clause, that clause
> of the composition function becomes
>
> '#p'(Pi1,...,Pin) when Gi -> error(...)
>
> When an abstract pattern is invoked as part of a pattern,
>
> #[m:]p(P1,...,Pn) = E
>
> it is as if
>
> {P1,...,Pn} = [m:]'n=p'(E)
>
> were invoked instead, if n~=1, or as if
>
> P1 = [m:]'1=p'(E)
>
> were invoked, if n=1. Since abstract patterns are state independent and
> have no side effects, it doesn't matter whether the translation
> preserves left to right order or not; suffice it that pattern match
> translation can preserve left to right order if we want. If #p/1 is an
> abstract pattern, it may also be invoked as a field reference:
>
> E0#p
>
> is as if
>
> '1=p'(E0)
>
> had appeared instead. This applies to guard expressions as well as
> general expressions. Note that E#p is different from #p(E).
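>
> To make that distinction concrete with the list patterns above
> (Xs here is assumed to be bound to a nonempty list): Xs#head
> abbreviates '1=head'(Xs) and extracts the first element, whereas
> #head(V) as an expression would call the composition function
> '#head'(V), which for this pattern is defined to be an error.
>
> First = Xs#head, % i.e. First = '1=head'(Xs)
> Rest = Xs#tail % i.e. Rest = '1=tail'(Xs)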
>
> When an abstract pattern is invoked as part of an expression,
>
> #[m:]p(E1,...,En)
>
> it is as if
>
> [m:]'#p'(E1,...,En)
>
> were invoked instead. This clearly preserves the order and embedding of
> expressions, so left to right execution order is not at all disturbed.
>
> When an abstract pattern is invoked for update,
>
> V = E0#[m:]p(E1,...,En)
>
> it is as if
>
> V = [m:]'!p'(E0, E1,...,En)
>
> were invoked instead. The only way this could perturb left to right
> ordering is if the module prefix involved an expression other than a
> variable or atom, and in that case the translation
>
> (ET = E0, V = m:'!p'(ET, E1, ..., En))
>
> can be used (assuming that Erlang does or will allow arbitrary primaries
> as module prefixes) where ET is a new variable. The preservation of
> left to right order is why the `base datum' argument of an update
> function is the first rather than the last.
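>
> A small example of the update form, using #head/1 from above and
> again assuming Xs is bound to a nonempty list: replacing the head
> while keeping the tail is written
>
> Ys = Xs#head(0), % i.e. Ys = '!head'(Xs, 0)
>
> so if Xs = [1,2,3] then Ys = [0,2,3].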
>
> There is one further condition. At the moment, if P is any pattern, and
> P = P makes sense, (mainly, P contains no unbound variables), we expect
> it to be true. But consider
>
> #p(1) -> [];
> #p(2) -> [].
>
> Y = #p(2), #p(2) = Y
>
> will fail, because it turns into
>
> '#p'(1) -> []; '#p'(2) -> [].
> '1=p'([]) -> 1; '1=p'([]) -> 2.
> '!p'([],1) -> []; '!p'([],2) -> [].
>
> Y = '#p'(2), 2 = '1=p'(Y)
>
> The conditions we need are easy to state
> 4. There does not exist any term T such that
> Qi = T, Gi ==> true
> Qj = T, Gj ==> true
> i ~= j.
> 5. There do not exist any terms T1,...,Tn such that
> Pi1 = T1, ..., Pin = Tn, Gi ==> true
> Pj1 = T1, ..., Pjn = Tn, Gj ==> true
> i ~= j
> but are less easy to check. The example clearly violates condition 4.
> The difficulty of checking these conditions (a decision procedure for
> arithmetic appears to be required) would at first sight appear to be a
> problem, but in fact there are good reasons not to impose them.
>
> Now let's look at an example.
>
> ----------------------------------------------------------------
> Queue package example.
>
> We'll implement queues twice.
>
> -module(queue).
> -export([#empty/0,#nonempty/0,push/2,pop/1]).
>
> #empty() -> [].
>
> #nonempty() -> [_|_]. % violates condition 2
>
> push(Q, X) -> append(Q, [X]).
>
> pop([X|Q]) -> {X,Q}.
>
> %end.
>
>
> -module(queue).
> -export([#empty/0,#nonempty/0,push/2,pop/1]).
>
> #empty() -> {[],[]}.
>
> #nonempty() -> {[_|_],_}; % violates condition 2
> #nonempty() -> {[],[_|_]}. % violates condition 2
>
> push({L,R}, X) -> {L,[X|R]}.
>
> pop({[X|L],R}) -> {X,{L,R}};
> pop({[],R}) -> [X|L] = reverse(R), {X,{L,[]}}.
>
> %end.
>
>
> -module(bfs).
> -export([bfs/1]).
> -import(problem, [solved/1,children/1]).
> -import(queue, [#empty/0,#nonempty/0,push/2,pop/1]).
>
> bfs(Start) ->
> loop(push(#empty(), Start)).
>
> loop(#empty()) -> undefined;
> loop(Q = #nonempty()) ->
> {X,P} = pop(Q),
> case solved(X) of
> true -> X;
> false -> loop(pushall(P, children(X)))
> end.
>
> pushall(Q, []) -> Q;
> pushall(Q, [H|T]) -> pushall(push(Q,H), T).
>
> %end.
>
>
> Here's the translation of the bfs module:
>
> -module(bfs).
> -export([bfs/1]).
>
> bfs(Start) ->
> loop(queue:push(queue:'#empty'(), Start)).
>
> loop(Q) when {} = queue:'0=empty'(Q) ->
> undefined;
> loop(Q) when {} = queue:'0=nonempty'(Q) ->
> {X,P} = queue:pop(Q),
> case problem:solved(X) of
> true -> X;
> false -> loop(pushall(P, problem:children(X)))
> end.
>
> pushall(Q, []) -> Q;
> pushall(Q, [H|T]) -> pushall(queue:push(Q,H), T).
>
> %end.
>
> I have argued elsewhere that there should be two directives
>
> -use_early(module, [used functors]).
> -use_late(module, [used functors]).
>
> which indicate that specified functors from a specified module are used,
> the former indicating early binding, neither of them indicating that a
> name is imported into the local scope. One reason for that was to
> capture the good aspect of the preprocessor, which is precisely early
> binding. But note that if we used the preprocessor to hide the
> implementation of queues from the programmer while revealing it to the
> compiler, we would have to use early binding, while with abstract
> patterns, that's an independent choice. The bfs module can use #empty()
> and #nonempty() as patterns without requiring early binding.
>
> Note that replacing one version of the queue module by another while
> bfs/1 was running would not be a good idea. This seems to be quite
> typical, and I do not understand how Erlang avoids serious trouble. The
> problem is that once the bfs/1 function has received a data structure
> from the queue module, it depends on the queue module still accepting
> the same data structure, so the process running bfs/1 depends on the
> queue module even when it is not calling a function from that module.
> There really needs to be some sort of locking facility so that bfs/1
> could be written
>
> bfs(Start) ->
> with_module_lock(queue,
> loop(push(#empty(), Start))).
>
> However, if we are aware that there are long-lived processes that have
> data structures of the old form that need to be supported for a while,
> we might choose to write a queue module that supports both forms, while
> preferring the new one, and it is for this reason that abstract patterns
> may have more than one clause. (Although the second version of the
> queue module needs this anyway.) Here goes:
>
> -module(queue).
> -export([#empty/0,#nonempty/0,push/2,pop/1]).
>
> #empty() -> {[],[]};
> #empty() -> []. % violates condition 5
>
> #nonempty() -> {[_|_],_};
> #nonempty() -> {[],[_|_]};
> #nonempty() -> [_|_]. % violates condition 5
>
> push({L,R}, X) -> {L,[X|R]};
> push(Q, X) -> append(Q, [X]).
>
> pop({[X|L],R}) -> {X,{L,R}};
> pop({[],R}) -> [X|L] = reverse(R), {X,{L,[]}};
> pop([X|Q]) -> {X,Q}.
>
> %end.
>
> Try doing that with macros!
>
> Of course, macros provide early binding and inlining in one easy step,
> which may (but need not) be good for efficiency. If an abstract pattern
> is defined in the module that uses it, or if -use_early is defined, then
> early binding occurs, and inlining is a possibility. The option of
> inlining may be better than the certainty of inlining. Why?
>
> The first, and perhaps most obvious, reason is that abstract patterns
> are named code forms with well defined invocation and contents, just
> like functions. The fact that they can be used without requiring
> inlining is good news for debuggers and interpreters, not to mention
> test coverage analysers and profilers.
>
> The second is that inlining doesn't always help efficiency. There's no
> point in executing 10% fewer instructions if, thanks to cache effects,
> it takes 20% longer to do so. Back in the late 80s at Quintus, a
> certain competitor claimed that their native-code Prolog system
> generated code that ran twice as fast as ours. So it did, on naive
> reverse. On a two-page program (which didn't fit into the 68020's
> on-chip instruction cache), it did no better than our threaded-code
> system. On a real program, it was twice as slow. I have heard that the
> same phenomenon occurred with BEAM -vs- JAM. Inlining can have similar
> effects, though less severe.
>
> One extension to Erlang would make it easier to inline abstract
> patterns, and that is disjunctive guards. Currently, we have
>
> guard ::= ... '(' guard ')'
>
> replace that by
>
> guard ::= ... '(' guard {'|' guard}... ')'
>
> with the obvious semantics. This would be useful anyway.
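>
> A hedged sketch of why this matters for inlining, written in the
> same notation the translations above already use (matches in guards
> plus the proposed '|', neither of which is legal Erlang today): the
> two-clause #nonempty/0 of the second queue module can only be folded
> into the second clause of loop/1 if one guard can express either
> shape, e.g.
>
> loop(Q) when ({[_|_],_} = Q | {[],[_|_]} = Q) ->
> {X,P} = pop(Q),
> case solved(X) of
> true -> X;
> false -> loop(pushall(P, children(X)))
> end.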
>
> ----------------------------------------------------------------
> What's wrong with records?
>
> Records tangle up several things:
> 1 the possibility of naming a pattern
> 2 field and update notation.
> 3 keyword and default arguments
> 4 a very specific data structure
> 5 a global name space
> 6 the possibility of unchecked conflicting definitions in that namespace
> 7 a requirement to use the preprocessor to get weak abstraction
> 8 syntax that includes what LOOK like pattern matches but aren't.
>
> Some of these are good (1, 2, 3), some of them are less good (4), some
> of them are bad (5 and 8) and some of them are very bad (6 and 7).
> Pattern abstraction provides (1 and 2) without any of the others.
>
> What's wrong with (4)? First, it means that records actually give us no
> abstraction at all; in the Erlang Standard specification, to be a record
> of type T is precisely to be a tuple of a particular arity whose first
> element is the atom T. Second, if we want to give a name to some other
> structure, we are sunk, as records buy us nothing. Here is a realistic
> example. Suppose we want to manipulate timestamps. For compatibility
> with current Erlang, we might want these timestamps to contain embedded
> date and time values. So
>
> #timestamp() -> {_, _}.
> #timestamp(DT,TM) -> {DT, TM}.
> #timestamp(Y,M,D,H,I,S) -> {{Y,M,D},{H,I,S}}.
>
> #time_date(DT) -> {DT,_}.
> #time_date(Y,M,D) -> {{Y,M,D},_}.
>
> #time_time(TM) -> {_,TM}.
> #time_time(H,I,S) -> {_,{H,I,S}}.
>
> #time_year(Y) -> {{Y,_,_},_}.
> #time_month(M) -> {{_,M,_},_}.
> #time_day(D) -> {{_,_,D},_}.
> #time_hour(H) -> {_,{H,_,_}}.
> #time_minute(I) -> {_,{_,I,_}}.
> #time_second(S) -> {_,{_,_,S}}.
>
> Eventually, we purge our code of explicit references to the patterns,
> and can switch to a more space-efficient representation:
>
> #timestamp() -> {_,_,_,_,_,_}.
> #timestamp({Y,M,D},{H,I,S}) -> {Y,M,D,H,I,S}.
> #timestamp(Y,M,D,H,I,S) -> {Y,M,D,H,I,S}.
>
> #time_date({Y,M,D}) -> {Y,M,D,_,_,_}.
> #time_date(Y, M, D) -> {Y,M,D,_,_,_}.
>
> #time_time({H,I,S}) -> {_,_,_,H,I,S}.
> #time_time(H, I, S) -> {_,_,_,H,I,S}.
>
> #time_year(Y) -> {Y,_,_,_,_,_}.
> #time_month(M) -> {_,M,_,_,_,_}.
> #time_day(D) -> {_,_,D,_,_,_}.
> #time_hour(H) -> {_,_,_,H,_,_}.
> #time_minute(I) -> {_,_,_,_,I,_}.
> #time_second(S) -> {_,_,_,_,_,S}.
>
> What do we have here? A change of representation. What is it that
> records do not allow you to vary? The representation (except within
> very narrow limits).
>
> Just to resume a topic that may not have been clear before, here are the
> functions we get for the first of these definition sets.
>
> '#timestamp'() -> error().
> '0=timestamp'({_,_}) -> {}.
> '!timestamp'({X1,X2}) -> {X1,X2}.
>
> '#timestamp'(DT, TM) -> {DT, TM}.
> '2=timestamp'({DT,TM}) -> {DT, TM}.
> '!timestamp'({X1,X2}, DT, TM) -> {DT,TM}.
>
> '#timestamp'(Y,M,D,H,I,S) -> {{Y,M,D},{H,I,S}}.
> '6=timestamp'({{Y,M,D},{H,I,S}}) -> {Y,M,D,H,I,S}.
> '!timestamp'({{X1,X2,X3},{X4,X5,X6}},Y,M,D,H,I,S) ->
> {{Y,M,D},{H,I,S}}.
>
> '#time_date'(DT) -> error().
> '1=time_date'({DT,_}) -> DT.
> '!time_date'({X2,X1}, DT) -> {DT,X1}.
>
> '#time_date'(Y,M,D) -> error(). % useless composition function
> '3=time_date'({{Y,M,D},_}) -> {Y,M,D}.
> '!time_date'({X2,X1}, Y, M, D) -> {{Y,M,D},X1}.
>
> '#time_time'(TM) -> error().
> '1=time_time'({_,TM}) -> TM.
> '!time_time'({X1,X2}, TM) -> {X1,TM}.
>
> '#time_time'(H,I,S) -> error().
> '3=time_time'({_,{H,I,S}}) -> {H,I,S}.
> '!time_time'({X1,X2}, H, I, S) -> {X1,{H,I,S}}.
>
> '#time_year'(Y) -> error().
> '1=time_year'({{Y,_,_},_}) -> Y.
> '!time_year'({{X4,X1,X2},X3}, Y) -> {{Y,X1,X2},X3}.
>
> '#time_month'(M) -> error().
> '1=time_month'({{_,M,_},_}) -> M.
> '!time_month'({{X1,X4,X2},X3}, M) -> {{X1,M,X2},X3}.
>
> '#time_day'(D) -> error().
> '1=time_day'({{_,_,D},_}) -> D.
> '!time_day'({{X1,X2,X4},X3}, D) -> {{X1,X2,D},X3}.
>
> '#time_hour'(H) -> error().
> '1=time_hour'({_,{H,_,_}}) -> H.
> '!time_hour'({X1,{X4,X2,X3}}, H) -> {X1,{H,X2,X3}}.
>
> '#time_minute'(I) -> error().
> '1=time_minute'({_,{_,I,_}}) -> I.
> '!time_minute'({X1,{X2,X4,X3}}, I) -> {X1,{X2,I,X3}}.
>
> '#time_second'(S) -> error().
> '1=time_second'({_,{_,_,S}}) -> S.
> '!time_second'({X1,{X2,X3,X4}}, S) -> {X1,{X2,X3,S}}.
>
> So if, for example, we wanted to refer to noon today, or to find what
> year it is, we could write
>
> Noon_Today = timestamp:now()#time_time(12,0,0),
> This_Year = timestamp:now()#time_year
>
> and these would work whether a timestamp was stored as one tuple or
> three. The -record approach cannot be used with the three tuple
> version. (It can't be used with the one tuple version either, because
> there must be an atom serving as a tag, and it must be the first element
> of the tuple, and it must be the same as the record name.)
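>
> Under the translation rules above, and assuming these patterns live
> in a module called timestamp (my placement, purely for illustration),
> the two expressions reduce to ordinary calls to the derived
> functions, which is why they survive the change of representation:
>
> Noon_Today = timestamp:'!time_time'(timestamp:now(), 12, 0, 0),
> This_Year = timestamp:'1=time_year'(timestamp:now())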
>
> What about keyword and default arguments (3)? Are they not a benefit of
> using records? Yes, they are. But they should be available for all
> functions and abstract patterns, not just records. Perhaps it is unfair
> to say ``ask an Ada programmer'', because Ada doesn't use data the way
> Erlang does. But Ada programs do have lots of procedure and function
> calls, just as Erlang does, and Erlang functions need more arguments
> than Ada functions, for the same reasons as Prolog, so need keyword and
> default arguments more than Ada does. I intend that abstract patterns
> should use whatever method is used to provide that feature for general
> functions.
>
> Do records really imply a global namespace? Yes, because the record
> name appears inside the record. Consider:
>
> -module(a).
> -record(r, {X,Y,Z}).
> ...
> f(..R..) when record(R, r) ->
>
> -module(b).
> -record(r, {X,Y}).
> g(R) when record(R, r) -> f(R).
> ... g(#r{1,2}) ...
>
> It is not clear exactly what the record/2 guard test should do. One
> manual says "This test checks that R is a tuple of arity N+1, where N is
> the number of fields in the record, and the first element in the tuple
> is the" atom naming the record. This means that g/1 WILL believe that
> record(R, r) is true, but f/1 will NOT. That does mean that f/1 will not
> be tricked into working with something that doesn't match its idea of
> what an `r' record looks like, but programmers will find it very
> confusing that record(R,r) can succeed in one module and fail in another.
> It is also a nuisance that we cannot tell whether something "is a record
> of type r" without having a -record declaration in scope. That doesn't
> apply to any other guard test.
>
> The real problem comes when a and b agree on the arity of r but disagree
> on the argument order or interpretation. Consider again,
>
> -module(a).
> -record(r, {X,Y,Z}).
> ...
> f(...#r(X,Y,Z)...) -> ...
>
> -module(b).
> -record(r, {Z,Y,X}).
> ...
> f(...#r(Z,Y,X)...)
>
> Either a global namespace for records is enforced (so that the record/2
> guard test works consistently), or serious confusion may result. The
> snag here is that a record name ought to be `owned' by some module, but
> in fact it is `owned' by a preprocessor file.
>
> How do pattern abstractions address this problem? In part, it must be
> admitted, they don't. They are weak abstractions; there is no defence
> against forgery, and little defence against accident other than adding a
> tag, just like records. The advantage comes from the fact that an
> abstract pattern is clearly owned by a specific module; there is one
> declaration, not multiple copies of a declaration. Abstract pattern
> names are module qualified, just like other function names.
>
> There is something that could be done that would make accident unlikely
> and forgery slightly harder. Records differ from tuples in that they
> are tagged. The tags cannot be module-specific, because the same
> declaration may be copied into many modules, with no clue as to which is
> the owner. All we do is take the idea of tagging, and bend the existing
> module-specific literal syntax (like ?VERSION and ?MODULE) a little bit.
> The directive
>
> -unique(name).
>
> would mean that at load time a new ref() would be created and bound to
> name, in such a way that the name can be recovered from the ref() for
> debugging purposes, and at compile time each occurrence of ?name would
> refer to that ref(). (This is like the classic Lisp hack of using a
> unique cons cell or function closure as a tag.) Then we could write
>
> -unique(ts).
>
> #timestamp() -> {?ts,_, _}.
> #timestamp(DT,TM) -> {?ts,DT, TM}.
> #timestamp(Y,M,D,H,I,S) -> {?ts,{Y,M,D},{H,I,S}}.
>
> #time_date(DT) -> {?ts,DT,_}.
> #time_date(Y,M,D) -> {?ts,{Y,M,D},_}.
>
> #time_time(TM) -> {?ts,_,TM}.
> #time_time(H,I,S) -> {?ts,_,{H,I,S}}.
>
> #time_year(Y) -> {?ts,{Y,_,_},_}.
> #time_month(M) -> {?ts,{_,M,_},_}.
> #time_day(D) -> {?ts,{_,_,D},_}.
> #time_hour(H) -> {?ts,_,{H,_,_}}.
> #time_minute(I) -> {?ts,_,{_,I,_}}.
> #time_second(S) -> {?ts,_,{_,_,S}}.
>
> This is admittedly clumsy, but it's the idea I want to push, not the
> syntax, which is at least local to the owning module. This looks a bit
> like the preprocessor, but it only requires recognising ?<id> as a
> special form. If we left the question mark off (and the binding to a
> ref()) we would have the same insecure approach as records, without the
> requirement of a fixed location for the tag.
>
> Thanks to preprocessor use, that insecurity is real. The really nasty
> thing happens when module a gets its record declaration from foo.hrl,
> module a is compiled and loaded, foo.hrl is changed, and another module
> including (the revised version of) foo.hrl is compiled and loaded. We
> have a dependency between module a and foo.hrl which is not noticed;
> partly because foo.hrl is not a module and is not, as such, loaded, and
> partly because the dependency exists even while the code in foo.hrl is
> not running.
>
> So we are left with only one thing that abstract patterns cannot do as
> well as records or better, and that is keyword and default parameters.
> At one time it was expected that records would become new types; this
> seems to have been dropped from the current Erlang Standard
> specification (0.6); I surmise because of the difficulty of establishing
> an `owner'.
>
> There is one more thing that abstract patterns can do that records as
> currently defined in Erlang cannot do, and that is enforce type
> constraints on components. Consider triangles. We'd like a triangle to
> be represented by a `record' containing three floats. Using record
> notation, all we can do is write
>
> -record(triangle, {A,B,C}).
>
> and hope. Using abstract pattern notation, we can write
>
> #triangle(A,B,C) when float(A), float(B), float(C) -> {A,t,B,C}.
>
> and this will enforce the type constraints on the components. Here is a
> sample use of this pattern, to see whether a value of this type is a
> plausible triangle.
>
> plausible(#triangle(A,B,C))
> when 0 < A, A =< B+C,
> 0 < B, B =< C+A,
> 0 < C, C =< A+B
> -> true;
> plausible(_) -> false.
>
> After translation and inline expansion, plausible/1 becomes
>
> plausible({A,t,B,C}) % check form
> when float(A), float(B), float(C), % check types
> 0 < A, A =< B+C,
> 0 < B, B =< C+A,
> 0 < C, C =< A+B
> -> true;
> plausible(_) -> false.
>
> This ability to propagate type information/checking from the declaration
> to the uses, if the programmer wants it, is an important safety feature
> that is completely missing from Erlang records, and is obtained without
> any new runtime machinery. Note that an Erlang compiler can easily
> deduce from the use of A, B, C that they must be some kind of number,
> but not which kind, not without interprocedural data flow analysis.
> Even without inlining, we get the runtime type checking we choose in the
> declaration, which has to be helpful if we want to build large systems.
>
> The End.
>
> The abstract pattern notation neatly replaces both the record
> declarations and the use of the preprocessor.
>
> Records let you say
>
> L2 = L#log{blocked_by = none, status = ok}
>
> which is arguably prettier than
>
> L2 = L#log_blocked_by(none)#log_status(ok)
>
> but not hugely so. One could argue that the abstract pattern
> version has the merit of not containing a manifestly false
> pattern match 'status = ok'.
>
> Records automatically provide you with a full set of record.field
> forms; abstract patterns for them have to be separately declared.
> On the other hand, records *force* you to accept a full set of
> record.field forms even if you want some of them blocked, and it
> is not at all clear how import/export of #record.field forms would
> be controlled.
>
> Records permit patterns like
>
> #log{status = S, users = N} = L
>
> which is arguably prettier than
>
> #log_status(S) = #log_users(N) = L
>
> but again, the latter never contains manifestly absurd matches.
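>
> For concreteness, here is a hedged sketch of the #log_... patterns
> these comparisons presuppose, assuming a representation
> {log, Status, BlockedBy, Users}; the field layout is mine, chosen
> only to make the examples well defined:
>
> #log_status(S) -> {log, S, _, _}.
> #log_blocked_by(B) -> {log, _, B, _}.
> #log_users(N) -> {log, _, _, N}.
>
> With these, L#log_blocked_by(none)#log_status(ok) is two uses of the
> derived update functions, and #log_status(S) = #log_users(N) = L
> decomposes L twice without ever writing a spurious 'status = ok'.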
>
> Dynamic Patterns.
>
> The parallel between abstract patterns and functions can be stretched a
> little further. In effect, we can regard a value of type
> (T1,...,Tn) => T0
> [where => is the pattern arrow in distinction to the function arrow ->] as
> a record
> ( b : (T1,...,Tn) -> T0 % building
> u : (T0,T1,...,Tn) -> T0 % updating
> d : T0 -> (T1,...,Tn) % decomposing
> )
> and
> X = #p(E1,...,En) ==> X = p.b(E1,...,En)
> X = E0#p(E1,...,En) ==> X = p.u(E0,E1,...,En)
> #p(P1,...,Pn) = E ==> {P1,...,Pn} = p.d(E)
> Since a triple of functions can be passed as a value, why not allow a
> pattern to be passed as a value? Let's put this in Erlang terms.
>
> Define
> #E0(E1,...,En)
> where E0 is not an atom or a module:atom pair to be
> #apply(E0, [E1,...,En])
> Define
> #apply(Mod,E0,EL)
> to be
> #apply({Mod,E0}, EL)
> Now we have to define the use of #apply/2.
>
> X = #apply(MF,L)
> where MF = {M,F}, L = [T1,...,Tn], and M, F are atoms
> reduces to X = #M:F(T1,...,Tn), i.e. X = M:'#F'(T1,...,Tn).
> where MF is the result of (fun #F/N) executed statically within
> module M, L = [T1,...,Tn], and N = n,
> reduces to X = #M:F(T1,...,Tn)
>
> X = E0#apply(MF,L)
> where MF = {M,F}, L = [T1,...,Tn] and M, F are atoms
> reduces to X = E0#M:F(T1,...,Tn) i.e. X = M:'!F'(E0,T1,...,Tn)
> where MF is the result of (fun #F/N) executed statically within
> module M, L = [T1,...,Tn], and N = n,
> reduces to X = E0#M:F(T1,...,Tn).
>
> #apply(MF, [P1,...,Pn]) = E
> where MF is statically a GuardExpression whose value is dynamically
> {M,F} for M and F atoms, and [P1,...,Pn] is statically a list
> of Patterns,
> reduces to #M:F(P1,...,Pn) = E, i.e. {P1,...,Pn} = M:'n=F'(E).
> where MF is statically a GuardExpression whose value is dynamically
> the result of (fun #F/N) executed statically within a module M,
> [P1,...,Pn] is statically a list of Patterns, and N = n,
> reduces to #M:F(P1,...,Pn) = E.
>
> Now we can do things like this:
>
> proj(P, []) -> [];
> proj(P, [#P(X)|Xs]) -> [X|proj(P,Xs)].
>
> #unit_list(X) -> [X].
> #unit_tuple(X) -> {X}.
>
> ... proj(fun #unit_list/1, L), ...
> ... proj(fun #unit_tuple/1, LT), ...
>
> By bending explicit fun syntax a little bit, we can even get anonymous
> patterns:
> fun #( Pattern... ) when Guard -> Pattern
> ; #( Pattern... ) when Guard -> Pattern ...
> end
> e.g.
> fun #(X) -> [X|_] end
> is an anonymous `hd' pattern. (The syntax here is based on two rules:
> to get pattern syntax from function syntax, put # in front of the
> [Module:]Name, and to get explicit fun syntax, put `fun' `end' around
> the clauses and delete the function Name.) It's not that I think
> anonymous patterns are wildly useful; the point is that if they weren't
> at least thinkable the parallels between patterns and functions would be
> spoilt, to the detriment of our ability to reason correctly about them.
>
> Try doing that with records!
>
> As a matter of fact, dynamic patterns do something for us that isn't
> easy to get any other way. Look at proj/2 again. Is there any
> nonsyntactic difference between
>
> proj(P, []) -> [];
> proj(P, [#P(X)|Xs]) -> [X|proj(P, Xs)].
>
> and
>
> map(F, []) -> [];
> map(F, [X|Xs]) -> [F(X)|map(F, Xs)].
>
> or is it just a piece of `cuteness' that lets you put the function call
> on the `wrong' side?
>
> Yes, there is a difference. The function F in map/2 may do anything at
> all; it may have side effects, it may allocate arbitrarily large amounts
> of storage, it may run for huge amounts of time. The pattern P in
> proj/2 must be an abstraction of a pattern-and-guard: it may not send
> or receive messages, have any other side effects, allocate huge amounts
> of storage, or take unbounded amounts of time. It is forced to be well
> behaved in ways that an arbitrary function parameter is not. Indeed, we
> are sure that the value X it returns in proj/2 is part of the term it
> matches, so it really is a projection function. If we want to reason
> about the effects of a code fragment, proj/2 is significantly easier to
> reason about than map/2. This means that anonymous patterns do have a
> use:
>
> proj(fun #(X) -> {X,_,_} end, L)
>
> is trivially well-behaved, whereas
>
> map(fun ({X,_,_}) -> X end, L)
>
> requires more analysis to see its harmlessness. (Of course, checking
> higher order functions is never entirely trivial, and it is something of
> a pity that they were added to Erlang so hastily. Topic for another
> working paper.)
>
> When you've invented a language construct, a good sign that you are onto
> something worthwhile is that applications come faster than you have time
> to write them down. When I wrote the stuff about dynamic abstract
> patterns, I didn't have an application in mind. But I suddenly realised
> that there was a proposal I thought of last year which had been
> languishing because there was no safe way to do it.
>
> Erlang processes are vulnerable to a denial-of-service attack where a
> malicious or erroneous process repeatedly sends "junk mail". The Erlang
> documentation recommends that most or all 'receive' commands should
> include a wildcard case to discard junk mail, but that has the
> disadvantage
> that it won't discard junk mail until the process actually executes a
> receive. Junk mail can pile up while a process is suspended. If nothing
> else, it causes the recipient of the junk mail to pay (in processing and
> garbage collection time) to inspect it.
>
> Last year I came up with the idea of a 'message filter'.
>
> set_process_filter(fun (Msg) -> ... true | false end)
>
> would mean that whenever a (local) process sends a message to this one,
> the message filter function would run IN THE SENDER'S THREAD to see if
> the message was ok. If it returned 'false', the message would be
> discarded, otherwise it would be retained.
>
> The problem with that is that it makes the sender vulnerable to Trojan
> horses; the message filter function could do absolutely anything while
> masquerading as the sender. What I needed was a way of declaring and
> enforcing that the message filter function would execute in bounded time
> with no side effects.
>
> Does that sound familiar? It's basically an abstract pattern.
>
> So here's the new proposal.
>
> set_process_filter(fun #() when Guard -> Pattern end)
>
> If the argument is a pattern expression of arity 0, then
> when a message is sent to this process, the pattern will
> be executed in the sender's thread and if it matches the
> message will be sent and if it fails the message will be
> discarded.
>
> set_process_filter(fun #(Result) when Guard -> Pattern end)
>
> If the argument is a pattern expression of arity 1, then
> when a message is sent to this process, the pattern will
> be executed in the sender's thread, and if it matches,
> the Result will be sent, and if it fails, the message
> will be discarded.
>
> Of course these forms are only illustrative; there could be any number
> of clauses and fun #F/0 or fun #F/1 could be used instead. The use of
> set_process_filter/1 would be to allow simple protocol conversions.
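>
> As an illustration in the proposed notation (the message shapes here
> are my own invention, purely for the example): a server that only
> wants tagged requests can discard everything else at the sender, and
> the arity-1 form can strip the tag on the way in.
>
> % Keep only {req, From, Request} messages; drop everything else.
> set_process_filter(fun #() -> {req, _, _} end)
>
> % Protocol conversion: accept {data, Payload}, deliver just Payload.
> set_process_filter(fun #(Payload) -> {data, Payload} end)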
>
> Whether this feature is as useful as I think is debatable; Tobbe
> argues that the best idea is to conceal processes inside their creating
> modules, and I think he's probably right about that. The point is that
> dynamic functions whose cost is certainly bounded are a useful thing to
> have, and they fall out quite naturally from trying to take abstract
> patterns seriously.
>
>