[erlang-questions] Parsing C with leex and yecc

Tue Jul 20 16:01:29 CEST 2010

If you use the comment regexp Sverker sent for a leex comment token
(which you then ignore at will) together with a C string regexp for a
string token then you should not have any problems. The comment regexp
will not match inside the string regexp and vice versa. You will only
get problems if you try to use it to remove comments in a pre-parse pass.
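
As a rough illustration of the point above, a leex definition along these lines might look as follows. This is only a sketch: the file name c_comments.xrl, the token names string and char, and the catch-all rule are assumptions made for the example, and the string regexp is a simplified C string pattern rather than a complete one. The comment regexp is the one Sverker sent.

```erlang
%% c_comments.xrl (hypothetical file name) - a minimal leex sketch.
%% Because leex picks the longest match at the current position, a '"'
%% starts the string rule and the whole literal is consumed as one token,
%% so the comment rule never gets a chance to match inside it.

Definitions.

Rules.

%% C comment: matched as one token and then dropped.
/\*([^*]|(\*+[^*/]))*\*+/ : skip_token.

%% Simplified C string literal: escaped characters or anything but " and \.
"(\\.|[^"\\])*" : {token, {string, TokenLine, TokenChars}}.

%% Whitespace is dropped.
[\s\t\r\n]+ : skip_token.

%% Catch-all so the sketch accepts other input (assumption for the example).
. : {token, {char, TokenLine, TokenChars}}.

Erlang code.
```

Assuming the file is saved as c_comments.xrl, it can be tried from the shell with leex:file(c_comments), c(c_comments), and then c_comments:string("char *p = \"hi /* not a comment */\"; /* real */").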

The typedef problem is difficult. Anyway "real" C didn't have typedefs. :-)

Robert

On 20 July 2010 15:00, Joe Armstrong <erlang@REDACTED> wrote:
> On Tue, Jul 20, 2010 at 12:32 PM, Sverker Eriksson
> <sverker@REDACTED> wrote:
>> Joe Armstrong wrote:
>>>
>>> I'm trying to parse ANSI C with leex and yecc and have run into
>>> two problems.
>>>
>>> 1) /* ... */ comments. Leex is (as I understand things) greedy,
>>>   thus I can't just write a regexp to match comments, since it
>>>   will consume not only the current comment, but all comments until
>>>   the last comment in the file.
>>>
>>>   To solve this I have just written a simple pre-processor to remove
>>>   comments from the original source.
>>>
>>>
>>
>> re:run("/***first comment***/ /* next comment */",
>> "/\\*([^*]|(\\*+([^*/])))*\\*+/").
>>
>> http://ostermiller.org/findcomment.html
>
>  Ummm ... this will incorrectly match a comment inside a literal
> string, for example:
>
>      char *p = "hi /* not a comment */ how are you?";
>
>  Which is not what I want ..
>
>  The easiest seems to be a bit of pure Erlang:
>
> %% remove_comments(Str) -> Str'
> %%    remove C-style comments from a string
> %%    note1: We retain any embedded NLs in the comment
> %%           this is so that line number calculations in the tokenizer
> %%           will still be correct
> %%    note2: We copy literal strings. Since a literal string might
> %%           contain a comment we have to parse the string
> %%    note3: Comments must be replaced by at least one space
> %%           Since otherwise "123/* comment */456" would be
> %%           transformed into 123456 (a single integer) instead
> %%           of two integers 123 and 456. This is why we add a
> %%           space in the first clause of skip_comment/2.
>
> remove_comments(Str) -> remove_comments(Str, []).
>
> remove_comments("/*" ++ T, L) -> skip_comment(T, L);
> remove_comments([$"|T], L)    -> copy_string_literal(T, [$"|L]);
> remove_comments([H|T], L)     -> remove_comments(T, [H|L]);
> remove_comments([], L)        -> lists:reverse(L).
>
> skip_comment("*/" ++ T, L)   -> remove_comments(T, [$\s|L]);
> skip_comment("\n" ++ T, L)   -> skip_comment(T, [$\n|L]);
> skip_comment([_|T], L)       -> skip_comment(T, L);
> skip_comment([], L)          -> remove_comments([], [$\s|L]).
>
> %% An escaped character (e.g. \" or \\) is copied as a pair so that it
> %% cannot end the literal; an unescaped " ends it.
> copy_string_literal([$\\,C|T], L)  -> copy_string_literal(T, [C,$\\|L]);
> copy_string_literal([$"|T], L)     -> remove_comments(T, [$"|L]);
> copy_string_literal([H|T], L)      -> copy_string_literal(T, [H|L]);
> copy_string_literal([], L)         -> remove_comments([], L).
>
> /Joe
>
>>
>> /Sverker
>>
>>
>
> ________________________________________________________________
> erlang-questions (at) erlang.org mailing list.
> See http://www.erlang.org/faq.html
> To unsubscribe; mailto:erlang-questions-unsubscribe@REDACTED
>
>