[erlang-patches] Return end locations in erl_scan

Wed Mar 20 15:22:45 CET 2013

It doesn't, as my patch doesn't rely on the 'text' option of
erl_scan being set.

Currently, when the compiler parses a file, it does not keep
the text of every token in memory. I can understand why you
thought the end location was redundant if you assumed that the
start plus the length would suffice. But now you are suggesting
that I keep track of every token's text just to avoid having
both a start and an end.

The lexer does not need to do extra work to keep track of end
locations, otherwise how would it know the start location of
the next token? It already has in memory, at some point, the
end location of each token it scans. All this patch does is
make it available when it returns.
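
To illustrate the point, here is a minimal sketch (not the code in
erl_scan; the module and function names are hypothetical) of the
bookkeeping any scanner has to do anyway. The location it reaches
after consuming a token's characters is exactly the end location,
and it is needed to place the next token:

```erlang
-module(end_loc_sketch).
-export([advance/2]).

%% advance(Chars, {Line, Column}) -> {Line1, Column1}
%% Hypothetical helper: the location reached after consuming Chars,
%% starting at {Line, Column}. A newline moves to column 1 of the
%% next line; any other character moves one column to the right.
advance([], Loc) ->
    Loc;
advance([$\n | Rest], {Line, _Column}) ->
    advance(Rest, {Line + 1, 1});
advance([_Char | Rest], {Line, Column}) ->
    advance(Rest, {Line, Column + 1}).
```

With this sketch, advance("+", {1,1}) gives {1,2}, and a string
token containing a newline simply ends on a later line.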

Furthermore, let's put this patch back in the context of
diagnostics, and thus of location ranges:

You are right when you say that {'+',[{line,1},{column,1},{'end',{1,2}}]}
and {'+',[{line,1},{column,1},{length,1}]} are equivalent.

Just the same, {integer,[{line,1},{column,1},{'end',{1,2}}],1} and
{integer,[{line,1},{column,1},{length,1}],1} are also equivalent.

But how can I compute the location range of "1 + 2"? You are
suggesting that I compute the length of the whole expression by
subtracting the start location of "1" from that of "2" and then
adding the length of "2". That would give me the length of the
whole {op,...,'+',...,...} node. But what about nodes that span
more than one line? To cover these cases, you are suggesting I
keep track of the text of the tokens. Should I compute the text
of the whole node? What about memory usage? That would be a pain
in the ass for huge files, like erl_parse.erl.

By keeping track of the end location, I introduce nearly no
additional overhead in the scanner, and I can keep things simple
and constant in memory usage while computing the location ranges
of AST nodes in the parser.
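
As a rough sketch of what that looks like in the parser, assuming
tokens carry the attribute list shown above ({line,_}, {column,_}
and {'end',_}), the range of a node is just the start of its first
token and the end of its last one. The module and function names
below are hypothetical and are not taken from the patch:

```erlang
-module(range_sketch).
-export([start_loc/1, end_loc/1, range/2]).

%% Start location of a token, read from its attribute list.
start_loc(Attrs) ->
    {line, Line} = lists:keyfind(line, 1, Attrs),
    {column, Column} = lists:keyfind(column, 1, Attrs),
    {Line, Column}.

%% End location of a token, as recorded by the scanner.
end_loc(Attrs) ->
    {'end', End} = lists:keyfind('end', 1, Attrs),
    End.

%% Range of a node, given the attributes of its first and last
%% tokens. No text and no length arithmetic are involved, so it
%% works the same way when the node spans several lines.
range(FirstAttrs, LastAttrs) ->
    {start_loc(FirstAttrs), end_loc(LastAttrs)}.
```

For "1 + 2", with "1" at {1,1} ending at {1,2} and "2" at {1,5}
ending at {1,6}, range/2 returns {{1,1},{1,6}}.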

I don't see how this information can be computed easily (and
correctly) by either keeping the length or the text values, but
feel free to prove me wrong on this.

Also, feel free to tell me if I'm not clear enough. I'm enjoying
this conversation a lot and would love to receive more
constructive feedback.

Regards,

-- 
Anthony Ramine

On 20 March 2013, at 14:06, Vlad Dumitrescu wrote:

> Hi,
> 
> Multi-line elements could be handled with the help of a utility that, given a string and a (starting) position, computes the end position. I would hope that the implementation already has one in some form or another.
> 
> regards,
> Vlad
> 
> 
> 
> On Wed, Mar 20, 2013 at 2:00 PM, Anthony Ramine <n.oxyde@REDACTED> wrote:
> Hi Vlad,
> 
> Thanks for the quick reply.
> 
> The length of the token is defined as its length in characters. That is all
> fine for most tokens that are on a single line, but things go to hell when
> you take into account multiline strings, atoms and chars.
> 
> --
> Anthony Ramine
> 
> On 20 March 2013, at 13:49, Vlad Dumitrescu wrote:
> 
> > Hi Anthony,
> >
> > Don't the tokens have a start position and a length? Why do you need an explicit end position?
> >
> > regards,
> > Vlad
> >
> >
> >
> > On Wed, Mar 20, 2013 at 1:44 PM, Anthony Ramine <n.oxyde@REDACTED> wrote:
> > Hi,
> >
> > Replying on list because I think it's important.
> >
> > As I told someone whose name I don't remember, this patch is only a
> > necessary step toward my final goal: Clang-like diagnostics for
> > Erlang compilation [1]. Is that something the OTP team wouldn't like
> > to see?
> >
> > How is the end location in tokens redundant? I need the end location
> > of each token to be able to compute the location range of each node
> > in the AST; see my work-in-progress commit for more information [2].
> >
> > That being said, I am interested in having your feedback about the
> > implementation.
> >
> > Regards,
> >
> > [1] http://clang.llvm.org/diagnostics.html
> > [2] https://github.com/nox/otp/commit/2c8038c#diff-1
> >
> > PS: Sorry Hans for replying twice, I got the Cc header wrong.
> >
> > --
> > Anthony Ramine
> >
> > On 20 March 2013, at 13:23, Hans Bolinder wrote:
> >
> > > Hi Anthony,
> > >
> > > Sorry for not replying sooner.
> > >
> > > We'll most likely reject your patch. I asked Vlad Dumitrescu about it,
> > > and he agrees with me that the functionality (the end location of
> > > tokens) is redundant.
> > >
> > > Apart from that: when it comes to the implementation there are a few
> > > things I don't approve of, but I need to take a closer look before
> > > saying anything more. You've put in a good effort here, and I intend
> > > to elaborate a little more on the implementation when I find the time
> > > to investigate in more detail.
> > >
> > > Best regards,
> > >
> > > Hans Bolinder, Erlang/OTP team, Ericsson
> >