[erlang-questions] incremental parsing

Mon Jun 27 08:49:26 CEST 2011

Hi Joel,

On Mon, Jun 27, 2011 at 01:51, Joel Reymont <joelr1@REDACTED> wrote:
> Assuming an editor backend written in Erlang,
> any suggestions on how to implement incremental parsing of Erlang code?

We're doing somewhat incremental parsing in erlide. The idea is that
you keep track of the token stream created by the lexer and when there
is a change in the text, you know that only some tokens can be
affected: in the worst case all tokens following the place where the
change was, but if you detect a token that is identical to what it was
before then you don't need to continue. In a similar way, the syntax
tree needs to keep track of the tokens it is created from and when
there are changes only these parts need to get parsed again.

This works OK when there are no syntax errors (which happen all the time
while typing, until one is done). We try to be smart about that: for
example, we keep track of all the forms in the module and don't let the
retokenization run past a form boundary. If the user starts typing a
string literal, we don't want the rest of the file to become marked as
a string, so we guess that the string will be ended in the same form
that it was started in.
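
To make the "stop at the first identical token" idea concrete, here is a
minimal sketch in Python. It is not erlide's code: the token pattern is a
toy, and the names lex and relex are made up for the illustration. It
assumes a lexer whose next token depends only on the text from the current
position onwards, and it keeps whitespace as tokens so that the tokens tile
the whole text. A real lexer with more lookahead would have to back up
further before rescanning. The form-boundary guess for unfinished strings
is not modelled here.

```python
import re

# Toy token pattern (an assumption, far simpler than real Erlang lexing).
# An unterminated string swallows everything up to the end of the text.
TOKEN_RE = re.compile(
    r'\s+|%[^\n]*|"(?:[^"\\]|\\.)*"?|[A-Za-z_][A-Za-z0-9_@]*|\d+|->|.', re.S)


def lex(text, pos=0):
    """Yield (offset, lexeme) pairs for text, starting at pos."""
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        yield (pos, m.group())
        pos = m.end()


def relex(old_tokens, old_text, start, end, replacement):
    """Replace old_text[start:end] and rescan only as far as needed.

    Returns (new_text, new_tokens, number_of_tokens_scanned_again).
    """
    new_text = old_text[:start] + replacement + old_text[end:]
    delta = len(replacement) - (end - start)

    # First old token that touches the edit; everything before it is kept.
    first = 0
    while (first < len(old_tokens)
           and old_tokens[first][0] + len(old_tokens[first][1]) < start):
        first += 1
    restart = old_tokens[first][0] if first < len(old_tokens) else len(old_text)

    old_index = {off: i for i, (off, _) in enumerate(old_tokens)}
    new_tokens = list(old_tokens[:first])
    edit_end = start + len(replacement)   # end of the edit in new_text
    scanned = 0
    for off, lexeme in lex(new_text, restart):
        i = old_index.get(off - delta)
        if off >= edit_end and i is not None and old_tokens[i][1] == lexeme:
            # Same token at the same (shifted) place as before: the rest of
            # the old stream is still valid, only its offsets move.
            new_tokens.extend((o + delta, l) for o, l in old_tokens[i:])
            break
        new_tokens.append((off, lexeme))
        scanned += 1
    return new_text, new_tokens, scanned


text = "foo(X) -> X + 1.\nbar(Y) -> Y * 2.\n"
tokens = list(lex(text))
at = text.index("1")
new_text, new_tokens, scanned = relex(tokens, text, at, at + 1, "42")
print(scanned, "of", len(new_tokens), "tokens were scanned again")
```

Inserting a lone double quote instead shows the worst case described
above: the scan never finds a matching old token and runs to the end of
the text.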

I said we do "somewhat incremental" parsing because we are using
coarser granularity than possible (i.e. at the form level). We found
it's fast enough and the additional work isn't worth it. The parser
would need deeper changes and I don't like having to maintain a
parallel lexer and parser. The code is at
https://github.com/erlide/erlide/tree/master/org.erlide.kernel.ide/src/,
in erlide_scan.erl and the parser/ directory; hopefully it's
understandable. If not and you want to know more, please ask.
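
For the form-level granularity, a rough Python sketch of the idea could
look like the following. Again, this is not taken from erlide:
split_forms, FormCache and toy_parse are hypothetical names, the form
splitter is a toy (a form ends at a "." followed by whitespace or end of
text, outside comments and terminated strings), and toy_parse stands in
for a real parser. Only forms whose text changed are parsed again.

```python
import re

# Toy pattern: a comment, a terminated string, or any single character.
TOKEN_RE = re.compile(r'%[^\n]*|"(?:[^"\\]|\\.)*"|.', re.S)


def split_forms(text):
    """Split module text into forms, each ending at '.' plus whitespace/EOF."""
    forms, begin, pos = [], 0, 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        pos = m.end()
        if m.group() == "." and (pos == len(text) or text[pos].isspace()):
            forms.append(text[begin:pos].strip())
            begin = pos
    tail = text[begin:].strip()
    if tail:
        forms.append(tail)          # unfinished form at the end of the buffer
    return forms


class FormCache:
    """Reparse a module form by form, reusing results for unchanged forms."""

    def __init__(self, parse_form):
        self.parse_form = parse_form
        self.cache = {}             # form text -> parse result
        self.parse_calls = 0

    def parse_module(self, text):
        results, fresh = [], {}
        for form in split_forms(text):
            if form in fresh:
                tree = fresh[form]
            elif form in self.cache:
                tree = self.cache[form]
            else:
                tree = self.parse_form(form)
                self.parse_calls += 1
            fresh[form] = tree
            results.append(tree)
        self.cache = fresh          # forget forms that no longer exist
        return results


def toy_parse(form):
    """Stand-in for a real parser: returns the text before the first '('."""
    return form.split("(", 1)[0].strip()


module_v1 = "-module(m).\nfoo(X) -> X + 1.\nbar(Y) -> Y * 2.\n"
module_v2 = module_v1.replace("X + 1", "X + 42")
cache = FormCache(toy_parse)
cache.parse_module(module_v1)       # parses all three forms
cache.parse_module(module_v2)       # parses only the edited foo form again
print(cache.parse_calls, "form parses in total")
```

Keying the cache on the form text keeps the sketch short; tracking the
tokens each form was built from, as described above, avoids comparing
text at all.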

regards,
Vlad