[erlang-patches] Return end locations in erl_scan

Wed Mar 20 15:26:52 CET 2013

I forgot an important point: epp sitting between erl_scan and
erl_parse complicate things, as the text value of a token may
not correspond to what is actually written in the source file.

Regards,

-- 
Anthony Ramine

Le 20 mars 2013 à 15:22, Anthony Ramine a écrit :

> It doesn't, as my patch doesn't rely on the 'text' option of
> erl_scan being set.
> 
> Currently, when the compiler parses a file, it does not keep
> in memory the text values of every token. I can understand
> why you thought it's redundant if you were thinking the
> start + the length would suffice. But now you are suggesting
> I keep track of every text value to avoid having both a start
> and an end.
> 
> The lexer does not need to do extra work to keep track of end
> locations, otherwise how would it know the start location of
> the next token? It already has in memory, at some point, the
> end location of each token it scans. All this patch does is
> make it available when it returns.
> 
> Furthermore, let's put back this patch in the context of
> diagnostics and thus parse ranges:
> 
> You are right when you say that {'+',[{line,1},{column,1},{'end',{1,2}}]}
> and {'+',[{line,1},{column,1},{length,1}]} are equivalent.
> 
> Just the same, {integer,[{line,1},{column,1},{'end',{1,2}}],1} and
> {integer,[{line,1},{column,1},{length,1}],1} are also equivalent.
> 
> But how can I compute the locations range of "1 + 2"? You are
> suggesting that I try to compute the length of this thing, by
> substracting the start locations of "2" and "1" and then adding
> the length of "2". That would give me the whole length of the
> {op,...,'+',...,...} node. But what about nodes that span more
> than one line? To cover these cases, you are suggesting I keep
> track of the text of the tokens. Should I compute the text of
> the whole node? What about memory usage? That would be a pain
> in the ass for huge files, like erl_parse.erl.
> 
> By keeping track of the end location, I introduce nearly no
> additional overhead in the scanner, and I can keep things simple
> and constant in memory usage while computing the location ranges
> of AST nodes in the parser.
> 
> I don't see how this information can be computed easily (and
> correctly) by either keeping the length or the text values, but
> feel free to prove me wrong on this.
> 
> Also, feel free to tell me if I'm not clear enough, I'm enjoying
> this conversation a lot and would love to receive constructive
> feedback again.
> 
> Regards,
> 
> -- 
> Anthony Ramine
> 
> Le 20 mars 2013 à 14:06, Vlad Dumitrescu a écrit :
> 
>> Hi,
>> 
>> The multiline elements could be handled with the help of an utility that given a string and a (starting) position can compute the end position. I would hope that the implementation already has it in one form or another. 
>> 
>> regards,
>> Vlad
>> 
>> 
>> 
>> On Wed, Mar 20, 2013 at 2:00 PM, Anthony Ramine <n.oxyde@REDACTED> wrote:
>> Hi Vlad,
>> 
>> Thanks for the quick reply.
>> 
>> The length of the token is defined as its length in characters. That is all
>> fine for most tokens that are on a single line, but things go to hell when
>> you take into account multiline strings, atoms and chars.
>> 
>> --
>> Anthony Ramine
>> 
>> Le 20 mars 2013 à 13:49, Vlad Dumitrescu a écrit :
>> 
>>> Hi Anthony,
>>> 
>>> Don't the tokens have a start position and a length? Why do you need an explicit end position?
>>> 
>>> regards,
>>> Vlad
>>> 
>>> 
>>> 
>>> On Wed, Mar 20, 2013 at 1:44 PM, Anthony Ramine <n.oxyde@REDACTED> wrote:
>>> Hi,
>>> 
>>> Replying on list because I think it's important.
>>> 
>>> As I said to someone I don't remember the name, this patch is only a
>>> necessary step to what my final goal is: Clang-like diagnostics for
>>> Erlang compilation [1]. Is that something the OTP team wouldn't like
>>> to see?
>>> 
>>> How is the end location in tokens redundant? I need the end locations
>>> of each tokens to be able to compute the location ranges of each node
>>> in the AST, see my work-in-progress commit for more informations [2].
>>> 
>>> That being said, I am interested in having your feedback about the
>>> implementation.
>>> 
>>> Regards,
>>> 
>>> [1] http://clang.llvm.org/diagnostics.html
>>> [2] https://github.com/nox/otp/commit/2c8038c#diff-1
>>> 
>>> PS: Sorry Hans for replying twice, I failed the Cc header.
>>> 
>>> --
>>> Anthony Ramine
>>> 
>>> Le 20 mars 2013 à 13:23, Hans Bolinder a écrit :
>>> 
>>>> Hi Anthony,
>>>> 
>>>> Sorry for not replying sooner.
>>>> 
>>>> We'll most likely reject you patch. I asked Vlad Dumitrescu about it,
>>>> and he agrees with me that the functionality (the end location of
>>>> tokens) is redundant.
>>>> 
>>>> Apart from that: when it comes to the implementation there are a few
>>>> things I don't approve of, but I need to take a closer look before
>>>> saying anything more. You've put in a good effort here, and I intend
>>>> to elaborate a little more on the implementation when I find the time
>>>> to investigate in more detail.
>>>> 
>>>> Best regards,
>>>> 
>>>> Hans Bolinder, Erlang/OTP team, Ericsson
>>> 
>>> _______________________________________________
>>> erlang-patches mailing list
>>> erlang-patches@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-patches
>>> 
>> 
>> 
>