[erlang-questions] Reading comments/annotations inside parse_transforms

Tue Jul 10 21:20:13 CEST 2012

On 07/05/2012 11:10 AM, Tim Watson wrote:
> There doesn't appear to be any way of doing this currently. There is
> support in erl_syntax for working with these and there is also an
> erl_comment_scan module in the syntax-tools application which parses
> and returns comments from a file (or whatever). What there doesn't
> appear to be, is a means to get erl_parse to return comments as
> attributes (?) in the AST that the compiler passes to a
> parse_transform.
>
> My question then, is this: what is the best (only!?) way to process
> comments at the same time as source code in a parse_transform? So far,
> given that this seems unlikely to work OOTB, I'm assuming I'll have to
> do something like:
>
> 1. once you hit the 'file' attribute, parse the comments and stash
> these away in a useful place
> 2. process the forms as usual, comparing each line number for each
> form, against the line numbers stored for the comments
> 3. when you get an interleaving of code and comments, you know where
> the comments reside in the source file in relation to the form you're
> currently working with
>
> I *can* do this of course, but it seems like an awful lot of hard
> work. Why can't comments be preserved in the AST passed to a parse
> transform, and then just dropped later on!? Is there an easier way of
> doing this that I'm currently missing?

The compiler toolchain just isn't targeted at preserving comments; they 
are treated as whitespace and are discarded already at the tokenization 
stage (erl_scan), even before the preprocessing stage (epp). After that, 
there's the parsing stage, and then the parse transforms are called. So, 
as you said, you'll need to use the -file("...") hints to locate the 
source files and read them again to extract the comments.
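A minimal sketch of the scanner behaviour described above, assuming a small throwaway module (the name scan_demo and the sample source string are made up for illustration). It tokenizes the same text twice with erl_scan: once with default options, where the comment is dropped, and once with the return_comments option, where it shows up as a comment token.

```erlang
-module(scan_demo).
-export([tokens/0]).

%% Returns {Default, WithComments}: the token lists produced by erl_scan
%% without and with the return_comments option.
tokens() ->
    %% Hypothetical sample source, one function plus a trailing comment.
    Src = "foo() -> ok. % trailing note\n",
    %% Default scan: comments are treated like whitespace and dropped.
    {ok, Default, _} = erl_scan:string(Src),
    %% Explicitly asking for comments keeps them as {comment, _, _} tokens.
    {ok, WithComments, _} = erl_scan:string(Src, 1, [return_comments]),
    {Default, WithComments}.
```

The normal compile path works on the first kind of token list, so nothing downstream (epp, erl_parse, parse transforms) ever sees the comment.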

It would certainly be possible to make a compiler that preserves 
comments for later passes, but the way it's currently done is the 
"traditional" way of writing a compiler, and changing it afterwards can 
be difficult, since all the code that expects the current behaviour has 
to be updated.

The good news is that the work of digging out comments and connecting 
them to the abstract syntax trees has already been done, in the 
syntax_tools library - EDoc uses exactly this (although not as a parse 
transform). You can call erl_comment_scan:file/1 to get the comment 
lines, and then use erl_recomment:recomment_forms/2 to attach the 
comments to the abstract syntax trees that the parse transform got. The 
result is an extended abstract syntax tree as defined by the erl_syntax 
module. You can use the erl_syntax functions to manipulate the tree, and 
when you're done, you need to call erl_syntax:revert/1 (or 
erl_syntax:revert_forms/1 for the whole list of forms) to revert the 
representation to the form that the compiler understands (this loses the 
comments again). For a detailed usage example, see edoc_extract:source/3 
and friends.
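The steps above could be sketched roughly as follows. This is an illustration under assumptions, not the way EDoc does it: the module name comment_pt and the helpers attach/2 and function_comments/1 are hypothetical, the source file is taken from the first -file attribute only (so comments in included headers are ignored), and printing the comments stands in for whatever the transform really wants to do with them. Because recomment_forms/2 returns a form list, the sketch reverts it with erl_syntax:revert_forms/1.

```erlang
-module(comment_pt).
-export([parse_transform/2, attach/2, function_comments/1]).

%% Entry point called by the compiler. The forms carry no comments, so
%% the source file named by the first -file attribute is scanned again.
parse_transform(Forms, _Options) ->
    case [F || {attribute, _, file, {F, _}} <- Forms] of
        [File | _] ->
            Comments = erl_comment_scan:file(File),
            Tree = attach(Forms, Comments),
            %% Hypothetical use of the comments: just print them.
            io:format("~p~n", [function_comments(Tree)]),
            %% Back to plain abstract forms; the comments are lost here.
            erl_syntax:revert_forms(Tree);
        [] ->
            %% No -file attribute found: leave the forms untouched.
            Forms
    end.

%% Attach scanned comments to the forms; the result is an erl_syntax
%% form list rather than plain erl_parse forms.
attach(Forms, Comments) ->
    erl_recomment:recomment_forms(Forms, Comments).

%% Collect {Name, Arity, CommentLines} for every function form, using
%% the comments that were attached in front of the function.
function_comments(Tree) ->
    Fs = erl_syntax:form_list_elements(erl_syntax:flatten_form_list(Tree)),
    [{erl_syntax:atom_value(erl_syntax:function_name(F)),
      erl_syntax:function_arity(F),
      lists:append([erl_syntax:comment_text(C)
                    || C <- erl_syntax:get_precomments(F)])}
     || F <- Fs, erl_syntax:type(F) =:= function].
```

A module would opt in with -compile({parse_transform, comment_pt}). Whether a given comment ends up attached to a function or as a standalone comment form is decided by erl_recomment from its position in the file, so a real transform should probably handle both cases.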

     /Richard