[erlang-questions] Parsing C with leex and yecc
Joe Armstrong
erlang@REDACTED
Tue Jul 20 15:00:09 CEST 2010
On Tue, Jul 20, 2010 at 12:32 PM, Sverker Eriksson
<sverker@REDACTED> wrote:
> Joe Armstrong wrote:
>>
>> I'm trying to parse ANSI C with leex and yecc and have run into
>> two problems.
>>
>> 1) /* ... */ comments. Leex is (as I understand things) greedy
>> thus I can't just write a regexp to match comments, since it
>> will consume not only the current comment, but all comments until
>> the last comment in the file.
>>
>> To solve this I have just written a simple pre-processor to remove
>> comments
>> from the original source.
>>
>>
>
> re:run("/***first comment***/ /* next comment */",
> "/\\*([^*]|(\\*+([^*/])))*\\*+/").
>
> http://ostermiller.org/findcomment.html
Ummm ... this will also match comment-like text inside a string
literal, which is not a comment at all. For example:
char *p = "hi /* not a comment */ how are you?";
Which is not what I want ..
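To make the problem concrete, here is a minimal sketch that runs the regexp quoted above over that line of C. The module and function names (comment_regex_demo, demo/0) are made up for illustration; only re:run/2 and lists:sublist/3 are standard library calls.

```erlang
-module(comment_regex_demo).
-export([demo/0]).

%% Runs the comment regexp from the quoted message over a line of C
%% whose only "comment" sits inside a string literal, and returns the
%% text that the regexp matched.
demo() ->
    Pattern = "/\\*([^*]|(\\*+([^*/])))*\\*+/",
    Src = "char *p = \"hi /* not a comment */ how are you?\";",
    %% re:run/2 returns zero-based {Start, Length} pairs; the first
    %% pair is the whole match.
    {match, [{Start, Len} | _]} = re:run(Src, Pattern),
    lists:sublist(Src, Start + 1, Len).
```

Calling comment_regex_demo:demo() should hand back the text from inside the string literal, so any preprocessor that deletes every match of this regexp would corrupt the literal.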
Easiest seems to be a bit of pure Erlang:
%% remove_comments(Str) -> Str'
%% remove C-style comments from a string
%% note1: We retain any embedded NLs in the comment
%% this is so that line number calculations in the tokenizer
%% will still be correct
%% note2: We copy literal strings. Since a literal string might
%% contain a comment we have to parse the string
%% note3: Comments must be replaced by at least one space
%% Since otherwise "123/* comment */456" would be
%% transformed into 123456 (a single integer) instead
%% of two integers 123 and 456. This is why we add a
%% space when skip_comment/2 reaches the end of the comment.
remove_comments(Str) -> remove_comments(Str, []).
remove_comments("/*" ++ T, L) -> skip_comment(T, L);
remove_comments([$"|T], L) -> copy_string_literal(T, [$"|L]);
remove_comments([H|T], L) -> remove_comments(T, [H|L]);
remove_comments([], L) -> lists:reverse(L).
%% end of comment: emit the separating space and resume normal scanning
skip_comment("*/" ++ T, L) -> remove_comments(T, [$\s|L]);
skip_comment("\n" ++ T, L) -> skip_comment(T, [$\n|L]);
skip_comment([_|T], L) -> skip_comment(T, L);
%% unterminated comment: still emit a space, then finish
skip_comment([], L) -> remove_comments([], [$\s|L]).
%% copy any backslash escape (\" or \\) as a pair so that an escaped
%% quote is not mistaken for the end of the literal
copy_string_literal([$\\,C|T], L) -> copy_string_literal(T, [C,$\\|L]);
%% closing quote: go back to normal scanning
copy_string_literal([$"|T], L) -> remove_comments(T, [$"|L]);
copy_string_literal([H|T], L) -> copy_string_literal(T, [H|L]);
%% unterminated literal: finish with what we have
copy_string_literal([], L) -> remove_comments([], L).
/Joe
>
> /Sverker
>
>