[erlang-patches] Return end locations in erl_scan

Wed Mar 20 16:08:41 CET 2013

Hi,

I see your points. I am a little biased by the fact that we are using our
own extended scanner in erlide, where we have all this extra information.
For a tool that handles source, this is necessary because there are several
places where the token's value and its representation aren't the same (for
example, based integers).

If you use the return_white_space option, then the end location of a token
is the one just before the start of the next one. This should be easier to
handle than using the token length.

On the other hand, you are right that the scanner already has this
information available and it's just a matter of making it available, so I
wouldn't mind to don't need to do the calculations myself. And even if I
have to keep track of the token text anyway, I agree that not everybody
should be forced to.

I haven't checked your patch in detail, maybe I should :-)

regards,
Vlad

On Wed, Mar 20, 2013 at 3:22 PM, Anthony Ramine <n.oxyde@REDACTED> wrote:

> It doesn't, as my patch doesn't rely on the 'text' option of
> erl_scan being set.
>
> Currently, when the compiler parses a file, it does not keep
> in memory the text values of every token. I can understand
> why you thought it's redundant if you were thinking the
> start + the length would suffice. But now you are suggesting
> I keep track of every text value to avoid having both a start
> and an end.
>
> The lexer does not need to do extra work to keep track of end
> locations, otherwise how would it know the start location of
> the next token? It already has in memory, at some point, the
> end location of each token it scans. All this patch does is
> make it available when it returns.
>
> Furthermore, let's put back this patch in the context of
> diagnostics and thus parse ranges:
>
> You are right when you say that {'+',[{line,1},{column,1},{'end',{1,2}}]}
> and {'+',[{line,1},{column,1},{length,1}]} are equivalent.
>
> Just the same, {integer,[{line,1},{column,1},{'end',{1,2}}],1} and
> {integer,[{line,1},{column,1},{length,1}],1} are also equivalent.
>
> But how can I compute the locations range of "1 + 2"? You are
> suggesting that I try to compute the length of this thing, by
> substracting the start locations of "2" and "1" and then adding
> the length of "2". That would give me the whole length of the
> {op,...,'+',...,...} node. But what about nodes that span more
> than one line? To cover these cases, you are suggesting I keep
> track of the text of the tokens. Should I compute the text of
> the whole node? What about memory usage? That would be a pain
> in the ass for huge files, like erl_parse.erl.
>
> By keeping track of the end location, I introduce nearly no
> additional overhead in the scanner, and I can keep things simple
> and constant in memory usage while computing the location ranges
> of AST nodes in the parser.
>
> I don't see how this information can be computed easily (and
> correctly) by either keeping the length or the text values, but
> feel free to prove me wrong on this.
>
> Also, feel free to tell me if I'm not clear enough, I'm enjoying
> this conversation a lot and would love to receive constructive
> feedback again.
>
> Regards,
>
> --
> Anthony Ramine
>
> Le 20 mars 2013 à 14:06, Vlad Dumitrescu a écrit :
>
> > Hi,
> >
> > The multiline elements could be handled with the help of an utility that
> given a string and a (starting) position can compute the end position. I
> would hope that the implementation already has it in one form or another.
> >
> > regards,
> > Vlad
> >
> >
> >
> > On Wed, Mar 20, 2013 at 2:00 PM, Anthony Ramine <n.oxyde@REDACTED>
> wrote:
> > Hi Vlad,
> >
> > Thanks for the quick reply.
> >
> > The length of the token is defined as its length in characters. That is
> all
> > fine for most tokens that are on a single line, but things go to hell
> when
> > you take into account multiline strings, atoms and chars.
> >
> > --
> > Anthony Ramine
> >
> > Le 20 mars 2013 à 13:49, Vlad Dumitrescu a écrit :
> >
> > > Hi Anthony,
> > >
> > > Don't the tokens have a start position and a length? Why do you need
> an explicit end position?
> > >
> > > regards,
> > > Vlad
> > >
> > >
> > >
> > > On Wed, Mar 20, 2013 at 1:44 PM, Anthony Ramine <n.oxyde@REDACTED>
> wrote:
> > > Hi,
> > >
> > > Replying on list because I think it's important.
> > >
> > > As I said to someone I don't remember the name, this patch is only a
> > > necessary step to what my final goal is: Clang-like diagnostics for
> > > Erlang compilation [1]. Is that something the OTP team wouldn't like
> > > to see?
> > >
> > > How is the end location in tokens redundant? I need the end locations
> > > of each tokens to be able to compute the location ranges of each node
> > > in the AST, see my work-in-progress commit for more informations [2].
> > >
> > > That being said, I am interested in having your feedback about the
> > > implementation.
> > >
> > > Regards,
> > >
> > > [1] http://clang.llvm.org/diagnostics.html
> > > [2] https://github.com/nox/otp/commit/2c8038c#diff-1
> > >
> > > PS: Sorry Hans for replying twice, I failed the Cc header.
> > >
> > > --
> > > Anthony Ramine
> > >
> > > Le 20 mars 2013 à 13:23, Hans Bolinder a écrit :
> > >
> > > > Hi Anthony,
> > > >
> > > > Sorry for not replying sooner.
> > > >
> > > > We'll most likely reject you patch. I asked Vlad Dumitrescu about it,
> > > > and he agrees with me that the functionality (the end location of
> > > > tokens) is redundant.
> > > >
> > > > Apart from that: when it comes to the implementation there are a few
> > > > things I don't approve of, but I need to take a closer look before
> > > > saying anything more. You've put in a good effort here, and I intend
> > > > to elaborate a little more on the implementation when I find the time
> > > > to investigate in more detail.
> > > >
> > > > Best regards,
> > > >
> > > > Hans Bolinder, Erlang/OTP team, Ericsson
> > >
> > > _______________________________________________
> > > erlang-patches mailing list
> > > erlang-patches@REDACTED
> > > http://erlang.org/mailman/listinfo/erlang-patches
> > >
> >
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20130320/a51da605/attachment.htm>