[erlang-questions] Parsing C with leex and yecc
Richard O'Keefe
ok@REDACTED
Fri Jul 23 01:44:55 CEST 2010
On Jul 23, 2010, at 6:00 AM, Tony Finch wrote:
>> /\/\*([^*]*\*+[^/*])*[^*]*\*+\//
>
> This is sadly incomplete ("sadly" because it's a good example of C's
> syntactic unpleasantness) unless you have already performed translation
> phases 1 and 2 on the source. In phase one you translate trigraphs (in
> particular ??/ -> \) and in phase two you delete backslash-newline
> sequences (which might not be visible until after trigraph substitution).
> Comments are recognized in phase three.
Exactly so. Which is why I suggested using a C preprocessor to do
the heavy lifting. My "favourite" C comment is
    /??/    =>    /\    =>    /**/
    *??/          *\
    *??/          *\
    /             /
        trigraphs     backslash-newline removal
gcc actually gets this completely wrong unless you pass -trigraphs
on the command line.
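As a rough illustration of why the regular expression quoted at the top only works after phases 1 and 2, here is a minimal Python sketch (not from the original thread). The names TRIGRAPHS, COMMENT, phase1 and phase2 are invented for the example, and it assumes plain \n line endings; it is in no way a substitute for a real C preprocessor.

```python
import re

# The nine trigraphs defined by the C standard (translation phase 1).
TRIGRAPHS = {
    "??=": "#", "??(": "[", "??/": "\\", "??)": "]", "??'": "^",
    "??<": "{", "??!": "|", "??>": "}", "??-": "~",
}

# The comment regex quoted above, minus its outer /.../ delimiters.
COMMENT = re.compile(r"/\*([^*]*\*+[^/*])*[^*]*\*+/")


def phase1(text):
    """Replace trigraph sequences, scanning left to right."""
    return re.sub(r"\?\?[=(/)'<!>-]", lambda m: TRIGRAPHS[m.group(0)], text)


def phase2(text):
    """Delete each backslash that is immediately followed by a new-line."""
    return text.replace("\\\n", "")


# The "favourite" comment from the message, one physical line per row.
source = "/??/\n*??/\n*??/\n/\n"

for label, text in [
    ("raw", source),
    ("after phase 1", phase1(source)),
    ("after phases 1+2", phase2(phase1(source))),
]:
    # Show the text and whether the comment regex finds anything in it.
    print(label, repr(text), bool(COMMENT.search(text)))
```

The regular expression finds nothing in the raw text or in the output of phase 1 alone; only after both phases does the text collapse to /**/ and match.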
>
> 2. Each instance of a backslash character (\) immediately followed by
> a new-line character is deleted, splicing physical source lines to
> form logical source lines. Only the last backslash on any physical
> source line shall be eligible for being part of such a splice. A
> source file that is not empty shall end in a new-line character,
> which shall not be immediately preceded by a backslash character
> before any such splicing takes place.
It's amazing how many Windows C programs violate this rule,
the file ending with say
}<EOF>
and it's even more amazing that some compilers fail to diagnose this.
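The end-of-file rule quoted above is easy to check mechanically. A minimal Python sketch, assuming the text has already been through phase 1 (trigraph replacement) and uses plain \n line endings; check_file_ending is a hypothetical helper name, not part of any compiler:

```python
def check_file_ending(text):
    """Return a diagnostic string, or None if the file ending is acceptable."""
    if text == "":
        # The rule applies only to source files that are not empty.
        return None
    if not text.endswith("\n"):
        return "non-empty source file does not end in a new-line"
    if text.endswith("\\\n"):
        # Checked before any splicing, as the quoted rule requires.
        return "final new-line is immediately preceded by a backslash"
    return None


print(check_file_ending("}"))    # the "}<EOF>" case from the message
print(check_file_ending("}\n"))  # a well-formed ending
```

The first call reports the missing new-line; the second returns None.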
To adapt something Padlipsky wrote,
"There's a rumour that they're moving to 17 phases in the
next standard, because 17 is a sacred number in Bali."