recursive-descent parsing (was: Re: SciTE (was: Re: not a question, just a fun thing with syntax_tools))

Fri May 23 19:20:03 CEST 2003

On Fri, 23 May 2003, Peter-Henry Mander wrote:

>The lexer colours the source file, but the folding
>algorithm is a bit primitive. It folds "case", "fun", "if"
>and "receive" statements, but I would also like it to fold
>function clauses too, which will require a better lexer and
>syntax analyser. I wish to use
>http://spirit.sourceforge.net which is recursive descent,
>easy to understand, and thanks to C++ templates I can
>practically copy the Erlang EBNF spec straight into C++
>code. Nice.

I started making a recursive-descent version of yecc a while
back. The idea was that the generated code would be pretty
enough that you'd want to read it.

It was never finished. If anyone thinks the idea is
interesting, you can have my code and run with it.
I started with the erlang grammar for my tests. This was a
mistake, I think, since there is a lot of code needed
besides the generated code to e.g. compile a .erl file.

Anyway...

The following yecc clause:

case_expr -> 'case' expr 'of' cr_clauses 'end' :
   {'case', element(2, '$1'), '$2', '$4'}.

currently expands to this in the generated code:

case_expr(Ts, Match, Skip, S) ->
   match_rule(['case', 'expr', 'of', 'cr_clauses', 'end'], Ts,
              fun(Result, Ts1, S1) ->
                    X = {'case',element(2,'?v1'),'?v2','?v4'},
                    Match(X, Ts1, S1)
              end, Skip, S).

Where Match is a dynamically created continuation.

A clause with multiple branches:

receive_expr -> 'receive' 'after' expr clause_body 'end' :
   {'receive', element(2, '$1'), [], '$3', '$4'}.
receive_expr -> 'receive' cr_clauses 'end' :
   {'receive', element(2, '$1'), '$2'}.
receive_expr -> 'receive' cr_clauses 'after' expr
clause_body 'end' :
   {'receive', element(2, '$1'), '$2', '$4', '$5'}.

Translates to this:

receive_expr(Ts, Match, Skip, S) ->
   match_rules(
       [
        {['receive', 'cr_clauses', 'after', 'expr',
          'clause_body', 'end'],
         fun(Result, Ts1, S1) ->
               X = {'receive',element(2,'?v1'),
                    '?v2','?v4','?v5'},
               Match(X, Ts1, S1)
         end},
        {['receive', 'after', 'expr', 'clause_body',
          'end'],
         fun(Result, Ts1, S1) ->
               X = {'receive',element(2,'?v1'),
                    [],'?v3','?v4'},
               Match(X, Ts1, S1)
         end},
        {['receive', 'cr_clauses', 'end'],
         fun(Result, Ts1, S1) ->
               X = {'receive',element(2,'?v1'),'?v2'},
               Match(X, Ts1, S1)
         end}
       ], Ts, Skip, S).

The idea was to maintain the structure of the grammar as far
as possible, so that it would not be much harder to write
the parser code by hand than to write the actual yecc
grammar (the code generated from esyntax.yrl has 2.6 times
more lines of code than esyntax.yrl does; compared to the
normal yecc output, which expands it >17 times, it's a fair
improvement in that regard.) Also, the parser still operates
on one token at a time.

Another idea was that grammars like the SQL grammar and
megaco_text_parser are really pushing the limits of what's
feasible with the current yecc (my opinion.)

Enough rambling. I may be barking up the wrong tree, but if
anyone wants to try and finish it, you can have the code.

/Uffe
-- 
Ulf Wiger, Senior Specialist,
   / / /   Architecture & Design of Carrier-Class Software
  / / /    Strategic Product & System Management
 / / /     Ericsson AB, Connectivity and Control Nodes