[erlang-questions] Parsing C with leex and yecc

Richard O'Keefe ok@REDACTED
Wed Jul 21 00:20:10 CEST 2010


On Jul 20, 2010, at 9:28 PM, Joe Armstrong wrote:

> I'm trying to parse ANSI C with leex and yecc and have run into
> two problems.
>
> 1) /* ... */ comments. Leex is (as I understand things) greedy
>   thus I can't just write a regexp to match comments, since it
>   will consume not only the current comment, but all comments until
>   the last comment in the file.

Actually you CAN write a regular expression which matches
C comments.  The trivial /[/][*].*[*][/]/ is not going to work,
but it isn't particularly difficult to write a regular expression
that WILL work.  In fact, some books about Lex and Yacc give it
to you.  Oh heck.  I'm not going to leave it as an exercise for
the reader after all.  Think of a C comment as
	"/*"
	zero or more blocks of (not star)* (star)+ (not star or slash)
	one block of (not star)* (star)+ /

	/\/\*([^*]*\*+[^/*])*[^*]*\*+\//
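
A quick way to see the difference between the two patterns is to try
both on one line of C. What follows is only a sketch in Python's re
module, not leex; the names NAIVE, COMMENT and src are made up for the
illustration, and the second pattern is the one above with the outer
/.../ delimiters dropped.

```python
import re

# The trivial pattern from the text: '.*' is greedy, so it runs on to
# the last "*/" it can find on the line.
NAIVE = re.compile(r"[/][*].*[*][/]")

# The block-structured pattern from the text:
#   "/*", then zero or more (not star)* (star)+ (not star or slash),
#   then one (not star)* (star)+ "/".
COMMENT = re.compile(r"/\*([^*]*\*+[^/*])*[^*]*\*+/")

src = "int a; /* one */ int b; /* two **/ int c;"

naive_matches = NAIVE.findall(src)
comments = [m.group(0) for m in COMMENT.finditer(src)]

print(naive_matches)  # one match that swallows "int b;" as well
print(comments)       # two separate comments
```

The second pattern stops at the first "*/" because the only way it can
consume a "/" is immediately after a run of stars.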

Lex books recommend NOT doing this, not because there's any great
difficulty in constructing a regular expression, but because
recognising a comment that way means *storing* the comment as if it
were a token.  If you want to keep the comments, that's a great
thing to do.  If you don't, you have to allocate a huge token
buffer you wouldn't otherwise need.

There's another approach in (f)lex, which is to use states.
Does Leex support those?
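
For what it's worth, the idea behind the states approach can be
sketched without any lexer generator at all. The following is plain
Python, not (f)lex or leex syntax: the function name strip_comments and
the state names are made up for the illustration, and it deliberately
ignores string literals, character constants and // comments, which a
real C lexer has to handle. The point is that comment text is thrown
away one character at a time, so nothing the size of a comment is ever
buffered.

```python
def strip_comments(chars):
    """Yield the characters of `chars` with /* ... */ comments replaced by a space."""
    state = "INITIAL"   # the other state is "COMMENT"
    prev = ""           # one character of lookbehind
    for ch in chars:
        if state == "INITIAL":
            if prev == "/" and ch == "*":
                # Saw "/*": switch state and forget both characters.
                state, prev = "COMMENT", ""
            else:
                if prev:
                    yield prev
                prev = ch
        else:
            if prev == "*" and ch == "/":
                # Saw "*/": back to normal scanning.
                state, prev = "INITIAL", ""
                yield " "   # a comment counts as white space in C
            else:
                prev = ch   # discard comment text as we go
    if state == "INITIAL" and prev:
        yield prev
    # An unterminated comment is silently dropped in this sketch.


print("".join(strip_comments("x = 1; /* a * b **/ y = 2 / 3;")))
```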

However, there's a third approach that might be worth considering.
Run the C files through the preprocessor first, and let *it*
strip out the comments.
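
As a rough sketch of that last suggestion, assuming a compiler driver
called "cc" is on the PATH and accepts -E (preprocess only) and -P
(omit line markers), as GCC and Clang do. The helper names
preprocess_command and preprocess are made up for the illustration.

```python
import subprocess


def preprocess_command(path, cc="cc"):
    """Build the command line; 'cc' is an assumed compiler name."""
    return [cc, "-E", "-P", path]


def preprocess(path, cc="cc"):
    """Return the preprocessed text of a C file, comments already removed."""
    result = subprocess.run(
        preprocess_command(path, cc),
        check=True,            # raise if the preprocessor reports an error
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Bear in mind that the preprocessor also expands #include and macros,
so the parser then sees the expanded source rather than the file as
written.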



