[erlang-questions] Parsing C with leex and yecc

Richard O'Keefe
Fri Jul 23 01:44:55 CEST 2010

On Jul 23, 2010, at 6:00 AM, Tony Finch wrote:
>> 	/\/\*([^*]*\*+[^/*])*[^*]*\*+\//
> This is sadly incomplete ("sadly" because it's a good example of C's
> syntactic unpleasantness) unless you have already performed translation
> phases 1 and 2 on the source. In phase one you translate trigraphs (in
> particular ??/ -> \) and in phase two you delete backslash-newline
> sequences (which might not be visible until after trigraph substitution).
> Comments are recognized in phase three.

Exactly so.  Which is why I suggested using a C preprocessor to do
the heavy lifting.  My "favourite" C comment is

	/??/	=> /\   =>	/**/
	*??/	   *\
	*??/	   *\
	/	   /
		trigraphs	backslash-newline removal

gcc actually gets this completely wrong unless you pass -trigraphs
on the command line.
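
To make the two steps concrete, here is a rough sketch in Python (not
part of the original exchange) that applies phase 1 and phase 2 to the
four-line example above and only then tries the quoted comment pattern.
The function names replace_trigraphs and splice_lines are made up for
this illustration, and the pattern is the quoted one rewritten in
Python's regex syntax; treat it as an approximation of what a real
preprocessor does, not a replacement for one.

```python
import re

# The nine trigraph sequences replaced in translation phase 1.
TRIGRAPHS = {
    "??=": "#", "??(": "[", "??/": "\\", "??)": "]", "??'": "^",
    "??<": "{", "??!": "|", "??>": "}", "??-": "~",
}

# The comment pattern from the quoted message, in Python regex syntax.
COMMENT = re.compile(r"/\*([^*]*\*+[^/*])*[^*]*\*+/")


def replace_trigraphs(text):
    """Phase 1 (sketch): replace each trigraph, scanning left to right."""
    return re.sub(r"\?\?[=(/)'<!>-]", lambda m: TRIGRAPHS[m.group(0)], text)


def splice_lines(text):
    """Phase 2 (sketch): delete each backslash immediately followed by a newline."""
    return text.replace("\\\n", "")


# The "favourite" comment, one physical line per row of the diagram.
source = "/??/\n*??/\n*??/\n/\n"

# On the raw text the pattern finds nothing: there is no "/*" yet.
print(COMMENT.search(source))            # None

# After phases 1 and 2 the four lines collapse into an empty comment.
logical = splice_lines(replace_trigraphs(source))
print(repr(logical))                     # '/**/\n'
print(COMMENT.fullmatch(logical.strip()) is not None)   # True
```

The order matters: splicing before trigraph replacement would leave
the ??/ sequences untouched, so no backslash-newline pairs would exist
to delete.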

>   2. Each instance of a backslash character (\) immediately followed by
>      a new-line character is deleted, splicing physical source lines to
>      form logical source lines. Only the last backslash on any physical
>      source line shall be eligible for being part of such a splice. A
>      source file that is not empty shall end in a new-line character,
>      which shall not be immediately preceded by a backslash character
>      before any such splicing takes place.

It's amazing how many Windows C programs violate this rule,
the file ending with say
and it's even more amazing that some compilers fail to diagnose this.
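
The rule quoted above is easy to check mechanically. The following is
a small sketch in Python, assuming the file is read as raw bytes and
that a CR LF pair should count as a new-line (an assumption made here
for Windows-style files; the standard text itself only speaks of
"new-line"). The function name check_file_ending and its result
strings are invented for the illustration.

```python
def check_file_ending(data: bytes) -> str:
    """Classify the end of a source file against the quoted phase 2 rule."""
    if not data:
        # An empty source file is exempt from the rule.
        return "ok: empty file"
    # Assumption: treat CR LF as a single new-line.
    normalized = data.replace(b"\r\n", b"\n")
    if not normalized.endswith(b"\n"):
        return "violation: no final new-line"
    if normalized.endswith(b"\\\n"):
        return "violation: final new-line preceded by backslash"
    return "ok"


# Hypothetical inputs, chosen only to exercise each branch.
print(check_file_ending(b"int main(void) { return 0; }\n"))
print(check_file_ending(b"int main(void) { return 0; }"))
print(check_file_ending(b"#define X 1 \\\r\n"))
```

A diagnostic along these lines is roughly what one would hope a
compiler emits before any splicing takes place.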

To adapt something Padlipsky wrote,
	"There's a rumour that they're moving to 17 phases in the
	next standard, because 17 is a sacred number in Bali."
