A -discontiguous
directive is to be added so that specified
functions may be presented as more than one group of clauses,
possibly separated by other directives and function clause
groups.
A new directive
-discontiguous( Name_And_Arity_List ).
is added. Each function named in such a list must have at least one clause group in that module, and may have more. It remains an error for any function not named in such a list to have more than one clause group.
A function named in a -discontiguous
directive need not
have more than one clause group. If it does, it is if
the clause groups were moved together without reordering
and the full stop of each group but the last changed to
a semicolon. The compiler should make no comment about
the existence of multiple clause groups or their fusion
into single clause groups.
The parser stage would do the regrouping and would not
include any representation of the -discontiguous
directive
in its output, so that downstream tools would never know
that -discontiguous
had been there.
There are three problems which a single mechanism can solve.
The first is that Erlang has conditional compilation, but there is no really satisfactory to use it to select some but not all of the clauses of a function.
The -discontiguous
directive allows you to write
-discontiguous([f/3]).
f(a, X, Y) -> .... .
-if(Cond).
f(b, X, Y) -> .... .
-endif.
f(c, X, T) -> .... .
The second may be called “topic-oriented programming”. It relates to human structuring of code around the data values computed on rather than the code they compute. I have found this in dealing with a virtual machine: I’ve wanted to place the code that assembles an instruction, the code that peephole optimises it, the code that encodes it into memory, and the code that interprets it into one place (involving different function), rather than organising it by function, thus scattering related information the length and breadth of the module.
It may be clearest to start with an example. The code
in erl_syntax.erl
reads:
-type syntaxTree() :: #tree{} | #wrapper{} | tuple().
%% All `erl_parse' tree nodes are represented by tuples
%% whose second field is the position information (usually
%% an integer), *with the exceptions of*
%% `{error, ...}' (type `error_marker') and
%% `{warning, ...}' (type `warning_marker'),
%% which only contain the associated line number *of the
%% error descriptor*; this is all handled transparently
%% by `get_pos' and `set_pos'.
get_pos(#tree{attr = Attr}) ->
Attr#attr.pos;
get_pos(#wrapper{attr = Attr}) ->
Attr#attr.pos;
get_pos({error, {Pos, _, _}}) ->
Pos;
get_pos({warning, {Pos, _, _}}) ->
Pos;
get_pos(Node) ->
%% Here, we assume that we have an `erl_parse' node
%% with position information in element 2.
element(2, Node).
set_pos(Node, Pos) ->
case Node of
#tree{attr = Attr} ->
Node#tree{attr = Attr#attr{pos = Pos}};
#wrapper{attr = Attr} ->
Node#wrapper{attr = Attr#attr{pos = Pos}};
_ ->
%% We then assume we have an `erl_parse' node,
%% and create a wrapper around it to make
%% things more uniform.
set_pos(wrap(Node), Pos)
end.
The type here is a little vague. The additional tuples appear
to be {error,{Pos,_,_}}, {warning,{Pos,_,_}}
, and the
{Tag,Pos...}
tuples returned by erl_parse
. The thing here is
that there are five different cases. For some purposes,
it would be better to write
-discontiguous([get_pos/1,set_pos/2]).
get_pos(#tree{attr = Attr}) -> Attr#attr.pos.
set_pos(#tree{attr = Attr} = Node, Pos) ->
Node#tree{attr = Attr#attr{pos = Pos}}.
get_pos(#wrapper{attr = Attr}) -> Attr#attr.pos.
set_pos(#wrapper{attr = Attr} = Node, Pos) ->
Node#wrapper{attr = Attr#attr{pos = Pos}}.
get_pos({error, {Pos,_,_}}) -> Pos.
% What should set_pos/2 do in this case?
get_pos({warning, {Pos,_,_}}) -> Pos.
% What should set_pos/2 do in this case?
get_pos(Node) -> element(2, Node). % assume erl_parse node
set_pos(Node, Pos) -> % assume erl_parse node
set_pos(wrap(Node), Pos). % wrap it for uniformity
This brings out the parallel between the two functions, and the way the parallel fails, more clearly than any other possible layout. It nags at you to either finish the parallel with the obvious
set_pos({error, {_,X,Y}}, Pos) ->
{error, {Pos,X,Y}}.
and
set_pos({warning, {_,X,Y}), Pos) ->
{warning, {Pos,X,Y}}.
clauses or to at least change the comments to
% set_pos/2 falls through to the last case.
comments.
We have the same pattern, without the failure of parallelism, in two more functions from that file:
get_com(#tree{attr = Attr}) -> Attr#attr.com;
get_com(#wrapper{attr = Attr}) -> Attr#attr.com;
get_com(_) -> none.
set_com(Node, Com) ->
case Node of
#tree{attr = Attr} ->
Node#tree{attr = Attr#attr{com = Com}};
#wrapper{attr = Attr} ->
Node#wrapper{attr = Attr#attr{com = Com}};
_ ->
set_com(wrap(Node), Com)
end.
These could be
-discontiguous([get_com/1,set_com/1]).
get_com(#tree{attr = Attr}) -> Attr#attr.com.
set_com(#tree{attr = Attr} = Node, Com) ->
Node#tree{attr = Attr#attr{com = Com}}.
get_com(#wrapper{attr = Attr}) -> Attr#attr.com.
set_com(#wrapper{attr = Attr} = Node, Com) ->
Node#wrapper{attr = Attr#attr{com = Com}}.
get_com(_) -> none. % error, warning, erl_parse.
set_com(Node, Com) ->
set_com(wrap(Node), Com).
Well, once again the parallel is not quite perfect.
The documentation for wrap/1
says that it assumes
its argument is a class erl_parse
tuple, which here
means that it appears that it should NOT be an error
or warning.
The point of interest here is that just looking at the existing functions didn’t ring any alarms; it was not until I said “these seem to be about the same data structure; I wonder if interleaving can make the connection clearer and make it easier to ensure that getters and setters are properly related?” that my attention was properly drawn to the differences.
It’s particularly interesting that the very first Erlang/OTP source file I looked at provided examples. The third is like the second, but relates to code written by a computer, not a human. For example, if generating a functional representation of some sort of state machine, it can be convenient to organise the output around the states, but the present scheme requires it to be organised around the functions that deal with the states.
Prolog systems have supported a :- discontiguous
declaration
for 20+ years. The approach is a proven one. It is a simple
generalisation of the language which can be hidden from all
“downstream” tools. Only tools that try to deal with Erlang
syntax without fully parsing it could notice the difference,
and they should largely ignore it.
All existing Erlang code remains acceptable with unchanged
semantics. Existing language-processing tools are unaffected
if they rely on erl_parse
.
None in this draft.
None.
This document has been placed in the public domain.