<div dir="ltr">That's another can of worms... I think we should join the movement for liberation from the preprocessor :-)<div><br></div><div style>/Vlad</div><div style><br></div></div><div class="gmail_extra"><br><br>


<div class="gmail_quote">On Wed, Mar 20, 2013 at 3:26 PM, Anthony Ramine <span dir="ltr"><<a href="mailto:n.oxyde@gmail.com" target="_blank">n.oxyde@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


I forgot an important point: epp sitting between erl_scan and<br>

erl_parse complicate things, as the text value of a token may<br>

not correspond to what is actually written in the source file.<br>

<br>

Regards,<br>

<span class="HOEnZb"><font color="#888888"><br>

--<br>

Anthony Ramine<br>

<br>

Le 20 mars 2013 à 15:22, Anthony Ramine a écrit :<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

> It doesn't, as my patch doesn't rely on the 'text' option of<br>

> erl_scan being set.<br>

><br>

> Currently, when the compiler parses a file, it does not keep<br>

> in memory the text values of every token. I can understand<br>

> why you thought it's redundant if you were thinking the<br>

> start + the length would suffice. But now you are suggesting<br>

> I keep track of every text value to avoid having both a start<br>

> and an end.<br>

><br>

> The lexer does not need to do extra work to keep track of end<br>

> locations, otherwise how would it know the start location of<br>

> the next token? It already has in memory, at some point, the<br>

> end location of each token it scans. All this patch does is<br>

> make it available when it returns.<br>

><br>

> Furthermore, let's put back this patch in the context of<br>

> diagnostics and thus parse ranges:<br>

><br>

> You are right when you say that {'+',[{line,1},{column,1},{'end',{1,2}}]}<br>

> and {'+',[{line,1},{column,1},{length,1}]} are equivalent.<br>

><br>

> Just the same, {integer,[{line,1},{column,1},{'end',{1,2}}],1} and<br>

> {integer,[{line,1},{column,1},{length,1}],1} are also equivalent.<br>

><br>

> But how can I compute the locations range of "1 + 2"? You are<br>

> suggesting that I try to compute the length of this thing, by<br>

> substracting the start locations of "2" and "1" and then adding<br>

> the length of "2". That would give me the whole length of the<br>

> {op,...,'+',...,...} node. But what about nodes that span more<br>

> than one line? To cover these cases, you are suggesting I keep<br>

> track of the text of the tokens. Should I compute the text of<br>

> the whole node? What about memory usage? That would be a pain<br>

> in the ass for huge files, like erl_parse.erl.<br>

><br>

> By keeping track of the end location, I introduce nearly no<br>

> additional overhead in the scanner, and I can keep things simple<br>

> and constant in memory usage while computing the location ranges<br>

> of AST nodes in the parser.<br>

><br>

> I don't see how this information can be computed easily (and<br>

> correctly) by either keeping the length or the text values, but<br>

> feel free to prove me wrong on this.<br>

><br>

> Also, feel free to tell me if I'm not clear enough, I'm enjoying<br>

> this conversation a lot and would love to receive constructive<br>

> feedback again.<br>

><br>

> Regards,<br>

><br>

> --<br>

> Anthony Ramine<br>

><br>

> Le 20 mars 2013 à 14:06, Vlad Dumitrescu a écrit :<br>

><br>

>> Hi,<br>

>><br>

>> The multiline elements could be handled with the help of an utility that given a string and a (starting) position can compute the end position. I would hope that the implementation already has it in one form or another.<br>


>><br>

>> regards,<br>

>> Vlad<br>

>><br>

>><br>

>><br>

>> On Wed, Mar 20, 2013 at 2:00 PM, Anthony Ramine <<a href="mailto:n.oxyde@gmail.com">n.oxyde@gmail.com</a>> wrote:<br>

>> Hi Vlad,<br>

>><br>

>> Thanks for the quick reply.<br>

>><br>

>> The length of the token is defined as its length in characters. That is all<br>

>> fine for most tokens that are on a single line, but things go to hell when<br>

>> you take into account multiline strings, atoms and chars.<br>

>><br>

>> --<br>

>> Anthony Ramine<br>

>><br>

>> Le 20 mars 2013 à 13:49, Vlad Dumitrescu a écrit :<br>

>><br>

>>> Hi Anthony,<br>

>>><br>

>>> Don't the tokens have a start position and a length? Why do you need an explicit end position?<br>

>>><br>

>>> regards,<br>

>>> Vlad<br>

>>><br>

>>><br>

>>><br>

>>> On Wed, Mar 20, 2013 at 1:44 PM, Anthony Ramine <<a href="mailto:n.oxyde@gmail.com">n.oxyde@gmail.com</a>> wrote:<br>

>>> Hi,<br>

>>><br>

>>> Replying on list because I think it's important.<br>

>>><br>

>>> As I said to someone I don't remember the name, this patch is only a<br>

>>> necessary step to what my final goal is: Clang-like diagnostics for<br>

>>> Erlang compilation [1]. Is that something the OTP team wouldn't like<br>

>>> to see?<br>

>>><br>

>>> How is the end location in tokens redundant? I need the end locations<br>

>>> of each tokens to be able to compute the location ranges of each node<br>

>>> in the AST, see my work-in-progress commit for more informations [2].<br>

>>><br>

>>> That being said, I am interested in having your feedback about the<br>

>>> implementation.<br>

>>><br>

>>> Regards,<br>

>>><br>

>>> [1] <a href="http://clang.llvm.org/diagnostics.html" target="_blank">http://clang.llvm.org/diagnostics.html</a><br>

>>> [2] <a href="https://github.com/nox/otp/commit/2c8038c#diff-1" target="_blank">https://github.com/nox/otp/commit/2c8038c#diff-1</a><br>

>>><br>

>>> PS: Sorry Hans for replying twice, I failed the Cc header.<br>

>>><br>

>>> --<br>

>>> Anthony Ramine<br>

>>><br>

>>> Le 20 mars 2013 à 13:23, Hans Bolinder a écrit :<br>

>>><br>

>>>> Hi Anthony,<br>

>>>><br>

>>>> Sorry for not replying sooner.<br>

>>>><br>

>>>> We'll most likely reject you patch. I asked Vlad Dumitrescu about it,<br>

>>>> and he agrees with me that the functionality (the end location of<br>

>>>> tokens) is redundant.<br>

>>>><br>

>>>> Apart from that: when it comes to the implementation there are a few<br>

>>>> things I don't approve of, but I need to take a closer look before<br>

>>>> saying anything more. You've put in a good effort here, and I intend<br>

>>>> to elaborate a little more on the implementation when I find the time<br>

>>>> to investigate in more detail.<br>

>>>><br>

>>>> Best regards,<br>

>>>><br>

>>>> Hans Bolinder, Erlang/OTP team, Ericsson<br>

>>><br>

>>> _______________________________________________<br>

>>> erlang-patches mailing list<br>

>>> <a href="mailto:erlang-patches@erlang.org">erlang-patches@erlang.org</a><br>

>>> <a href="http://erlang.org/mailman/listinfo/erlang-patches" target="_blank">http://erlang.org/mailman/listinfo/erlang-patches</a><br>

>>><br>

>><br>

>><br>

><br>

<br>

</div></div></blockquote></div><br></div>