[erlang-patches] Return end locations in erl_scan

Thu Mar 21 10:33:58 CET 2013

Maybe, but still, if we want diagnostics, we need to properly track
macro expansions.

Anyway, I rebased against latest maint.

-- 
Anthony Ramine

On 20 March 2013, at 16:09, Vlad Dumitrescu wrote:

> That's another can of worms... I think we should join the movement for liberation from the preprocessor :-)
> 
> /Vlad
> 
> 
> 
> On Wed, Mar 20, 2013 at 3:26 PM, Anthony Ramine <n.oxyde@REDACTED> wrote:
> I forgot an important point: epp sitting between erl_scan and
> erl_parse complicates things, as the text value of a token may
> not correspond to what is actually written in the source file.
> 
> Regards,
> 
> --
> Anthony Ramine
> 
> On 20 March 2013, at 15:22, Anthony Ramine wrote:
> 
> > It doesn't, as my patch doesn't rely on the 'text' option of
> > erl_scan being set.
> >
> > Currently, when the compiler parses a file, it does not keep
> > in memory the text values of every token. I can understand
> > why you thought it's redundant if you were thinking the
> > start + the length would suffice. But now you are suggesting
> > I keep track of every text value to avoid having both a start
> > and an end.
> >
> > The lexer does not need to do extra work to keep track of end
> > locations, otherwise how would it know the start location of
> > the next token? It already has in memory, at some point, the
> > end location of each token it scans. All this patch does is
> > make it available when it returns.
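
For context, the stock scanner already reports where it stopped, though only for the input as a whole: erl_scan:string/2 returns {ok, Tokens, EndLocation}. A minimal sketch follows; the module and function names are illustrative and are not part of OTP or of the patch.

```erlang
-module(scan_end_sketch).
-export([scan/1]).

%% Scan Source starting at line 1, column 1.
%% erl_scan:string/2 returns {ok, Tokens, EndLocation}, where EndLocation
%% is the location just past the last character scanned. The tokens
%% themselves only carry their start location.
scan(Source) ->
    {ok, Tokens, EndLocation} = erl_scan:string(Source, {1, 1}),
    {Tokens, EndLocation}.
```

Calling scan_end_sketch:scan("1 + 2") yields the three tokens plus the end location of the input. The exact shape of the per-token annotations depends on the OTP release.
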
> >
> > Furthermore, let's put this patch back in the context of
> > diagnostics and thus parse ranges:
> >
> > You are right when you say that {'+',[{line,1},{column,1},{'end',{1,2}}]}
> > and {'+',[{line,1},{column,1},{length,1}]} are equivalent.
> >
> > Just the same, {integer,[{line,1},{column,1},{'end',{1,2}}],1} and
> > {integer,[{line,1},{column,1},{length,1}],1} are also equivalent.
> >
> > But how can I compute the location range of "1 + 2"? You are
> > suggesting that I try to compute the length of this thing, by
> > subtracting the start locations of "2" and "1" and then adding
> > the length of "2". That would give me the whole length of the
> > {op,...,'+',...,...} node. But what about nodes that span more
> > than one line? To cover these cases, you are suggesting I keep
> > track of the text of the tokens. Should I compute the text of
> > the whole node? What about memory usage? That would be a pain
> > in the ass for huge files, like erl_parse.erl.
> >
> > By keeping track of the end location, I introduce nearly no
> > additional overhead in the scanner, and I can keep things simple
> > and constant in memory usage while computing the location ranges
> > of AST nodes in the parser.
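
The range computation described here can be sketched as follows, assuming tokens carry the attribute list shown earlier in this message ([{line,L},{column,C},{'end',{EndLine,EndColumn}}]). The module and function names are hypothetical and are not taken from the patch.

```erlang
-module(range_sketch).
-export([token_range/1, node_range/2]).

%% Attrs is the assumed attribute list, e.g.
%% [{line,1},{column,1},{'end',{1,2}}].
%% Returns {Start, End} as {Line, Column} pairs.
token_range(Attrs) ->
    Start = {proplists:get_value(line, Attrs),
             proplists:get_value(column, Attrs)},
    End = proplists:get_value('end', Attrs),
    {Start, End}.

%% Range of a node that spans from its first child to its last child:
%% take the start of the first and the end of the last. No token text
%% or length is needed, and line breaks inside the node do not matter.
node_range(FirstAttrs, LastAttrs) ->
    {Start, _} = token_range(FirstAttrs),
    {_, End} = token_range(LastAttrs),
    {Start, End}.
```

For "1 + 2" starting at {1,1}, the operands would carry [{line,1},{column,1},{'end',{1,2}}] and [{line,1},{column,5},{'end',{1,6}}], so node_range/2 gives {{1,1},{1,6}}. The work per node is constant regardless of how many lines the node covers.
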
> >
> > I don't see how this information can be computed easily (and
> > correctly) by either keeping the length or the text values, but
> > feel free to prove me wrong on this.
> >
> > Also, feel free to tell me if I'm not clear enough, I'm enjoying
> > this conversation a lot and would love to receive constructive
> > feedback again.
> >
> > Regards,
> >
> > --
> > Anthony Ramine
> >
> > On 20 March 2013, at 14:06, Vlad Dumitrescu wrote:
> >
> >> Hi,
> >>
> >> The multiline elements could be handled with the help of a utility that, given a string and a (starting) position, can compute the end position. I would hope that the implementation already has it in one form or another.
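
A utility of the kind suggested here might look like the sketch below. It is not an existing OTP function; the names are made up for illustration. It assumes the token text is available as a list of characters, that every character occupies one column, and that "\n" is the only line break.

```erlang
-module(end_loc_sketch).
-export([end_location/2]).

%% Given the token text and its start location {Line, Column}, walk the
%% characters and return the location just past the last one.
end_location(Text, Start) ->
    lists:foldl(fun advance/2, Start, Text).

%% A newline moves to column 1 of the next line; any other character
%% advances the column by one.
advance($\n, {Line, _Column}) -> {Line + 1, 1};
advance(_Char, {Line, Column}) -> {Line, Column + 1}.
```

For example, end_loc_sketch:end_location("ab\ncd", {1,1}) walks five characters and ends on line 2. Note that this approach only works when the text of each token has been kept.
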
> >>
> >> regards,
> >> Vlad
> >>
> >>
> >>
> >> On Wed, Mar 20, 2013 at 2:00 PM, Anthony Ramine <n.oxyde@REDACTED> wrote:
> >> Hi Vlad,
> >>
> >> Thanks for the quick reply.
> >>
> >> The length of the token is defined as its length in characters. That is all
> >> fine for most tokens that are on a single line, but things go to hell when
> >> you take into account multiline strings, atoms and chars.
> >>
> >> --
> >> Anthony Ramine
> >>
> >> On 20 March 2013, at 13:49, Vlad Dumitrescu wrote:
> >>
> >>> Hi Anthony,
> >>>
> >>> Don't the tokens have a start position and a length? Why do you need an explicit end position?
> >>>
> >>> regards,
> >>> Vlad
> >>>
> >>>
> >>>
> >>> On Wed, Mar 20, 2013 at 1:44 PM, Anthony Ramine <n.oxyde@REDACTED> wrote:
> >>> Hi,
> >>>
> >>> Replying on list because I think it's important.
> >>>
> >>> As I said to someone whose name I don't remember, this patch is only a
> >>> necessary step toward my final goal: Clang-like diagnostics for
> >>> Erlang compilation [1]. Is that something the OTP team wouldn't like
> >>> to see?
> >>>
> >>> How is the end location in tokens redundant? I need the end location
> >>> of each token to be able to compute the location range of each node
> >>> in the AST; see my work-in-progress commit for more information [2].
> >>>
> >>> That being said, I am interested in having your feedback about the
> >>> implementation.
> >>>
> >>> Regards,
> >>>
> >>> [1] http://clang.llvm.org/diagnostics.html
> >>> [2] https://github.com/nox/otp/commit/2c8038c#diff-1
> >>>
> >>> PS: Sorry Hans for replying twice, I got the Cc header wrong.
> >>>
> >>> --
> >>> Anthony Ramine
> >>>
> >>> On 20 March 2013, at 13:23, Hans Bolinder wrote:
> >>>
> >>>> Hi Anthony,
> >>>>
> >>>> Sorry for not replying sooner.
> >>>>
> >>>> We'll most likely reject your patch. I asked Vlad Dumitrescu about it,
> >>>> and he agrees with me that the functionality (the end location of
> >>>> tokens) is redundant.
> >>>>
> >>>> Apart from that: when it comes to the implementation there are a few
> >>>> things I don't approve of, but I need to take a closer look before
> >>>> saying anything more. You've put in a good effort here, and I intend
> >>>> to elaborate a little more on the implementation when I find the time
> >>>> to investigate in more detail.
> >>>>
> >>>> Best regards,
> >>>>
> >>>> Hans Bolinder, Erlang/OTP team, Ericsson
> >>>
> >>> _______________________________________________
> >>> erlang-patches mailing list
> >>> erlang-patches@REDACTED
> >>> http://erlang.org/mailman/listinfo/erlang-patches