[erlang-questions] Core Erlang: Associating Function Names to Function Bodies

Mon Mar 11 15:49:27 CET 2019

On Sun, Mar 10, 2019 at 1:02 PM Christopher Meiklejohn
<christopher.meiklejohn@REDACTED> wrote:
>
> As part of a static analysis I'm writing, I'm trying to associate
> function names with their function bodies so I can provide information
> about the calling functions to users as part of the completed
> analysis.  However, I'm running into difficulty performing this
> mapping.
>
> The problem I'm running into is the following:
>
> 1.) In the Core Erlang AST, named functions are defined by associating
> a c_var node in the form of {Function, Arity} to a subtree containing
> the c_fun function body.
> 2.) Also, in the Core Erlang AST, calls to functions are also
> represented as c_var nodes in the form of {Function, Arity}.
>
> The cerl_trees:fold, mapfold all perform postorder traversals.  With a
> postorder traversal, there may be an arbitrary distance in the fold
> between when a function is defined and where it is given a name.  With
> reverse postorder, my understanding is that these nodes would be
> directly next to each other, which would make the mapping easier.
>
> My initial approach was to attempt to, when a c_var node was found in
> the proper form, store a candidate name for the next function that's
> found and perform the association that way.  The problem that arises
> from this approach is that there can be an arbitrary number of c_var
> nodes for function invocations between the definition of the function
> and it's actual body.  The same can also happen for any functions that
> occur in between as well.
>
> Therefore, my analysis incorrectly associates function bodies to
> function names for any function that calls another function in the
> body -- incorrectly thinking that the nearest c_var node containing a
> function name is it's associated name.
>
> I assume this must have been done for a previous analysis performed on
> Erlang, but I haven't found anything.  Would a reverse postorder
> traversal make this easier?  Is there a better way to identify the
> mapping between c_var nodes and c_funs that are being assigned to
> them?  Has this been done before?

Just for those who are interested, I did figure out one solution to
the problem which I have tried out on several models now that appears
to be working so far.

Here's the rough outline of my currently-working solution:

Since there is no node directly connecting the c_var of a variable
assignment along with the body of the function, I had to perform a
traversal where I try to match function bodies that I find along with
the variables they are being assigned to.  To do this, I did the
following:

1.) I first use cerl_trees:fold to perform a postorder traversal of
the tree, building up a list of nodes to explore.
2.) I then lists:foldr over this, allowing me to perform effectively a
reverse postorder traversal of the graph, where, I will encounter
function bodies (c_fun) prior to encountering function variables.
(c_var)
3.) I perform an analysis that allows me to match up names to bodies
using the following heuristic:

a.) Whenever I encounter a function body, I can determine from the
node whether or not the function body is the body of an anonymous
function or not.  Anonymous functions are annotated with a randomly
assigned label which includes the enclosing function and the line
where the anonymous function was generated.  This allows me to rule
out the top-level functions from functions that are created from
within a top-level function.  When I discover one of these functions,
I keep track of this in a candidate list of potential function body
matches for the function variable I am assigning to.
b.) Whenever I encounter a c_var in the form of {Function, Arity}, I
know that I'm dealing with an application or an assignment.
(Curiously, why is there not a node that directly associates the c_var
to the c_fun -- this would have made things *significantly* easier,
but does not exist in the Core Erlang when performing a traversal via
cerl_trees.)  If the function contains an annotation containing a line
number of invocation, then I know it's the *application* of this
function, and not the *abstraction* of this function.  (Again, why
does c_var treat LHS assignments the same as RHS use of the function
-- if this was more specific, it would have again, made things
*significantly* easier.)  If I'm looking for a candidate match between
body and function variable, then I perform a match based on this.
c.) Otherwise, either the c_var is the use of a function or the c_fun
is the creation of an anonymous function, and it's subsequently
ignored by the matching procedure.

On the 'analysis' branch of this, I have a prototype of this
implementation that I've tested on several modules that appears to be
working.

If this could have been done a different, easier way, I'd appreciate
knowing about it.

Thanks,
Christopher