[erlang-questions] Core Erlang: Associating Function Names to Function Bodies

Tue Mar 12 20:55:51 CET 2019

The problem seems to mainly be that you would like to be able to use
cerl_trees:fold() to do the traversing for you. But in general, a fold
doesn't pass the operator any info about the context of each element in
relation to its neighbouring elements, in the traversal order.

The representation of Core Erlang syntax trees was specifically designed so
that only constructs that have a meaningful semantics on their own will be
nodes in the tree. So, you get nodes for literals, calls, tuples, funs,
letrecs, etc. (including modules). Many of these have a simple sequence of
child nodes, as for a cons cell or a tuple, and in a post-order traversal,
you know that you have just seen all the child nodes when you see the tuple
node itself. But for some other constructs, like a 'fun (X1,...Xn) -> Body'
or a 'case Expr of <Clauses> end' or 'module Name [ <Exports> ... ]
attributes [  <Attributes>... ] F1/A1 = fun ... F2/A2 = fun ... end', there
is nothing that tells a fold how to separate the subgroups of child nodes,
especially when they all can have varying length.

To fix this, a syntax tree representation can choose to use specific
grouping nodes, like {module, [{name, [...]}, {exports, [...]},
{attributes, [...]}, {definitions, [...]} ]}, so that a fold would see
"aha, I just got done with a 'definitions' group". I guess that's what you
wanted, Chris. It does simplify some things in traversals. The drawback,
and the reason we decided to avoid all such nodes, is that it forces all
code that works on those syntax trees to understand these grouping nodes,
and treat them separately from meaning-carrying nodes. In our experience,
that is a big pain. I guess what we should have done but never got around
to is write some really good tree walking functions that can tell you the
context, choose a traversal order, and such things. As is, the
cerl:subtrees() function is your best friend. The cerl_trees:fold() was
just intended for simple things like checking for occurrences, counting
tree sizes and whatnot.

See the example below for how you can label a Core Erlang tree and traverse
it to compute mappings between labels and function names.
It's written as a core transform, so you can try it on an arbitrary module
foo by compiling it with c(foo,[{core_transform,cerl_funcmap}]).

        /Richard

%% =====================================================================
%% @doc Core transform example for computing and printing a label mapping
%% @author Richard Carlsson <carlsson.richard@REDACTED>

-module(cerl_funcmap).

-export([core_transform/2]).

core_transform(T, _Opts) ->
    {T1, _MaxL} = cerl_trees:label(T),
    %%io:format("~n~s.~n", [cerl_prettypr:format(T1)]),
    {Ms, _Ls} = visit(T1),
    io:format("Functions to labels:~n~p.~n", [maps:from_list(Ms)]),
    Rev = lists:usort(
            lists:flatmap(fun ({V, Ls}) -> [{L, V} || L <- Ls] end,
                          Ms)),
    io:format("Labels to functions:~n~p.~n", [maps:from_list(Rev)]),
    T. % return original T

visit(T) ->
    case cerl:type(T) of
        letrec ->
            {visit_defs(cerl:letrec_defs(T), []),
             get_label(T)};
        module ->
            {visit_defs(cerl:module_defs(T), []),
             get_label(T)};
        _ ->
            case cerl:subtrees(T) of
                [] ->
                    {[], get_label(T)};
                Gs ->
                    {Ms, Ls} = visit_list(lists:flatten(Gs)),
                    {Ms, get_label(T) ++ Ls}
            end
    end.

visit_list(Ts) ->
    lists:foldl(fun (T, {Ms0, Ls0}) ->
                        {Ms, Ls} = visit(T),
                        {Ms ++ Ms0, Ls ++ Ls0}
                end,
                {[], []},
                Ts).

visit_defs([{V, T} | Ds], Ms0) ->
    {Ms, Ls} = visit(T),
    Ls1 = ordsets:union(get_label(V), ordsets:from_list(Ls)),
    visit_defs(Ds, [{cerl:var_name(V), Ls1}] ++ Ms ++ Ms0);
visit_defs([], Ms) ->
    Ms.

get_label(T) ->
    case cerl:get_ann(T) of
[{label, L} | _] -> [L];
_ -> []
    end.

%% ============================================================

Den mån 11 mars 2019 kl 15:49 skrev Christopher Meiklejohn <
christopher.meiklejohn@REDACTED>:

> On Sun, Mar 10, 2019 at 1:02 PM Christopher Meiklejohn
> <christopher.meiklejohn@REDACTED> wrote:
> >
> > As part of a static analysis I'm writing, I'm trying to associate
> > function names with their function bodies so I can provide information
> > about the calling functions to users as part of the completed
> > analysis.  However, I'm running into difficulty performing this
> > mapping.
> >
> > The problem I'm running into is the following:
> >
> > 1.) In the Core Erlang AST, named functions are defined by associating
> > a c_var node in the form of {Function, Arity} to a subtree containing
> > the c_fun function body.
> > 2.) Also, in the Core Erlang AST, calls to functions are also
> > represented as c_var nodes in the form of {Function, Arity}.
> >
> > The cerl_trees:fold, mapfold all perform postorder traversals.  With a
> > postorder traversal, there may be an arbitrary distance in the fold
> > between when a function is defined and where it is given a name.  With
> > reverse postorder, my understanding is that these nodes would be
> > directly next to each other, which would make the mapping easier.
> >
> > My initial approach was to attempt to, when a c_var node was found in
> > the proper form, store a candidate name for the next function that's
> > found and perform the association that way.  The problem that arises
> > from this approach is that there can be an arbitrary number of c_var
> > nodes for function invocations between the definition of the function
> > and it's actual body.  The same can also happen for any functions that
> > occur in between as well.
> >
> > Therefore, my analysis incorrectly associates function bodies to
> > function names for any function that calls another function in the
> > body -- incorrectly thinking that the nearest c_var node containing a
> > function name is it's associated name.
> >
> > I assume this must have been done for a previous analysis performed on
> > Erlang, but I haven't found anything.  Would a reverse postorder
> > traversal make this easier?  Is there a better way to identify the
> > mapping between c_var nodes and c_funs that are being assigned to
> > them?  Has this been done before?
>
> Just for those who are interested, I did figure out one solution to
> the problem which I have tried out on several models now that appears
> to be working so far.
>
> Here's the rough outline of my currently-working solution:
>
> Since there is no node directly connecting the c_var of a variable
> assignment along with the body of the function, I had to perform a
> traversal where I try to match function bodies that I find along with
> the variables they are being assigned to.  To do this, I did the
> following:
>
> 1.) I first use cerl_trees:fold to perform a postorder traversal of
> the tree, building up a list of nodes to explore.
> 2.) I then lists:foldr over this, allowing me to perform effectively a
> reverse postorder traversal of the graph, where, I will encounter
> function bodies (c_fun) prior to encountering function variables.
> (c_var)
> 3.) I perform an analysis that allows me to match up names to bodies
> using the following heuristic:
>
> a.) Whenever I encounter a function body, I can determine from the
> node whether or not the function body is the body of an anonymous
> function or not.  Anonymous functions are annotated with a randomly
> assigned label which includes the enclosing function and the line
> where the anonymous function was generated.  This allows me to rule
> out the top-level functions from functions that are created from
> within a top-level function.  When I discover one of these functions,
> I keep track of this in a candidate list of potential function body
> matches for the function variable I am assigning to.
> b.) Whenever I encounter a c_var in the form of {Function, Arity}, I
> know that I'm dealing with an application or an assignment.
> (Curiously, why is there not a node that directly associates the c_var
> to the c_fun -- this would have made things *significantly* easier,
> but does not exist in the Core Erlang when performing a traversal via
> cerl_trees.)  If the function contains an annotation containing a line
> number of invocation, then I know it's the *application* of this
> function, and not the *abstraction* of this function.  (Again, why
> does c_var treat LHS assignments the same as RHS use of the function
> -- if this was more specific, it would have again, made things
> *significantly* easier.)  If I'm looking for a candidate match between
> body and function variable, then I perform a match based on this.
> c.) Otherwise, either the c_var is the use of a function or the c_fun
> is the creation of an anonymous function, and it's subsequently
> ignored by the matching procedure.
>
> On the 'analysis' branch of this, I have a prototype of this
> implementation that I've tested on several modules that appears to be
> working.
>
> If this could have been done a different, easier way, I'd appreciate
> knowing about it.
>
> Thanks,
> Christopher
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20190312/c843dfaa/attachment.htm>