<div dir="ltr"><div class="gmail_extra">Hi Joe,</div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Sep 26, 2016 at 10:03 PM, Joe Armstrong <span dir="ltr"><<a href="mailto:erlang@gmail.com" target="_blank">erlang@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>4) The Erlang parser should be changed to exactly<br>
reproduce the source.<br>
Right now the parse tree of correct erlang has all the comments<br>
and white space removed. I'd suggest attaching the comments to the<br>
next following token (for example {atom,Line,theAtom} should become<br>
{atom, Line, theAtom, "the preceding comments and white space"}).<br>
It should be possible to *exactly* reconstruct the input from the parse<br>
tree.<br>
<br>
<aside> - in the first erlang all the different ways of writing an integer<br>
ended up as the same token. So writing 16#fc was the same as writing the<br>
integer 252 and tokenized as {integer,Line,252} - the tokenizer threw<br>
away the exact input so it was impossible to reconstruct the source<br>
from the token stream. Now it's better: 16#fc is tokenized as<br>
{integer,[{location,{Line,Col}},{text,"16#fc"}], 252} - but comments<br>
and white space are not<br>
retained in the parse tree.<br>
<br>
Note that changing the parse tree is *not* a simple hack - all tools that<br>
depend upon the parse tree have to be changed.<br>
</aside><br></div></blockquote></div><br>This is what I set out to implement with 'sourcer' (<a href="https://github.com/erlang/sourcer">https://github.com/erlang/sourcer</a>). I am actually aiming even higher: I want to keep track of macros and -ifdefs, and of the possibly different structure a module takes depending on them. I also want to be able to parse code that is in the process of being edited (which is only possible if the latest parseable state of the file is known). On the bright side, this parser need not replace erl_parse: the compiler and most tools don't need all this extra detail. It would be good not to have to implement _everything_ from scratch, but I couldn't find a way to do all this other than with a hand-written parser. Ideas, feedback, suggestions, improvements and rotten tomatoes are welcome. </div><div class="gmail_extra"><br></div><div class="gmail_extra">regards,<br></div><div class="gmail_extra">Vlad</div><div class="gmail_extra"><br></div></div>