[erlang-questions] semantic tagger

Thu Apr 26 09:23:32 CEST 2007

... actually, what I ended up doing in CCviewer was to collect the
whitespace and comments between each pair of real tokens into a list.
Thus, the token stream became

[Tok1, Whitespace1, Tok2, Whitespace2 | ...]
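For illustration, here is a minimal sketch of building that interleaved shape from a flat token list. It is not the CCviewer code. It assumes a hypothetical module `ws_stream`, tokens that are tuples with the category as their first element, and whitespace/comment tokens in the categories `white_space` and `comment`, as erl_scan uses. Any whitespace before the first real token is returned separately, because the shape above starts with a token.

```erlang
%% Hypothetical module: groups a flat token list into the
%% [Tok1, WC1, Tok2, WC2 | ...] shape, where each WCn is the
%% (possibly empty) list of whitespace/comment tokens after Tokn.
-module(ws_stream).
-export([group/1]).

%% Returns {Leading, Stream}. Leading holds any whitespace/comment
%% tokens that precede the first real token.
group(Tokens) ->
    {Leading, Rest} = lists:splitwith(fun is_wc/1, Tokens),
    {Leading, group_rest(Rest)}.

group_rest([]) ->
    [];
group_rest([Tok | Rest]) ->
    %% Everything up to the next real token belongs to this slot.
    {WC, Rest1} = lists:splitwith(fun is_wc/1, Rest),
    [Tok, WC | group_rest(Rest1)].

%% Assumption: the category is element 1, as in erl_scan tokens such as
%% {white_space, Anno, Text} and {comment, Anno, Text}.
is_wc(Tok) ->
    Cat = element(1, Tok),
    Cat =:= white_space orelse Cat =:= comment.
```

In later OTP releases, a flat list that includes whitespace and comment tokens can be produced with `erl_scan:string(Src, {1,1}, [return])` and passed to `group/1`.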

That wasn't too bad, but to make it a bit more interesting, I wanted
not only to preserve formatting, but also to do a decent job on code
that might not compile (I would not do that again, though...).

When I first wrote the html:izer, I was experimenting with doing the
brunt of the work in function head patterns. This led to various
problems as well, but it was a good learning experience.

Here's an example of what it could look like. The purpose was to convert
the code to HTML with hypertext links on function calls, function heads
(which would list the callers of the function), and record and macro
references.

(If the pretty-printer got confused, it would throw an exception, and
plain, un-annotated text would be displayed instead.)
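A minimal sketch of that fallback, assuming a hypothetical `Annotate` fun that stands in for the pretty-printer and a hypothetical module name `safe_render`: any exception raised while annotating makes `render/2` return the source with only the HTML special characters escaped.

```erlang
-module(safe_render).
-export([render/2, escape/1]).

%% Annotate is any fun(Source) -> string() that may throw when it
%% gets confused (hypothetical stand-in for the pretty-printer).
render(Source, Annotate) ->
    try Annotate(Source)
    catch
        _:_ -> escape(Source)   % fall back to plain, escaped text
    end.

%% Minimal HTML escaping for the plain-text fallback.
escape(Source) ->
    lists:flatmap(fun($<) -> "&lt;";
                     ($>) -> "&gt;";
                     ($&) -> "&amp;";
                     (C)  -> [C]
                  end, Source).
```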

expr1([_T1={symbol, L1, C1, Ce1, '#'},  WC1,
       _T2={symbol, L2, C2, Ce2, '?'},  WC2,
       _T3={Tag,    L3, C3, Ce3,  W},   WC3,
       _T4={symbol, L4, C4, Ce4, '.'}|Ts]?Xs, Cur, L, Term,
      XRefs, FF, FA, S) when ?w_or_a(Tag) ->
    %% Hellish combination of macro expansion and record selector syntax.
    %% We'd like to hypertext link both, but can't do that. Since we don't
    %% expand the macro, we'll create a hypertext reference to the macro.
    %% We also have to consume the dot in order to keep the parser from
    %% derailing.
    {Ref, Link} =
        case ets:lookup(S#state.tab, {define, W}) of
            [] ->
                %% hmmm...
                Ref1 = {mfa, S#state.modulename, W, ?macro_arity_int, L3},
                FL1 = funlink(W, ?macro_arity_int, W),
                {Ref1, FL1};
            [{_, F, _, IncMod, 0}] ->
                Ref2 = {mfa, IncMod, W, ?macro_arity_int, L3},
                FL2 = macrolink(F, W, S),
                {Ref2, FL2}
        end,
    Out = [space(L, Cur, L1, C1, S),
           "#",                 wc(WC1,L1,Ce1,L2,C2, S),
           "?",                 wc(WC2,L2,Ce2,L3,C3, S),
           Link,                wc(WC3,L3,Ce3,L4,C4, S),
           "."],
    S1 = out(Out, S),
    expr(Ts, Ce4, L4, Term, [Ref|XRefs], FF, FA, S1 ?LL);
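As a much smaller, self-contained illustration of the head-pattern style (not taken from CCviewer), this hypothetical `call_tagger` walks an interleaved `[Tok, WC, Tok, WC | ...]` stream. Tokens are assumed to be simple `{Category, Text}` pairs, and the first function head does the recognition: an atom followed by `(` is tagged as a function call.

```erlang
-module(call_tagger).
-export([tag/1]).

%% Input: [Tok, WC, Tok, WC | ...], tokens as {Category, Text} and
%% each WC a list of {Category, Text} whitespace/comment pairs.
%% Output: a flat list of {Tag, Text} pairs.
tag([{atom, Name}, WC1, {'(', Paren}, WC2 | Ts]) ->
    %% The pattern in the head spots "atom (" and retags the atom.
    [{function_call, Name} | WC1] ++ [{'(', Paren} | WC2] ++ tag(Ts);
tag([Tok, WC | Ts]) ->
    %% Anything else keeps its scanner category.
    [Tok | WC] ++ tag(Ts);
tag([]) ->
    [].
```

Each additional construct needs its own, longer head, which is how clauses like `expr1` above grow.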

BR,
Ulf W

________________________________

	From: erlang-questions-bounces@REDACTED [mailto:erlang-questions-bounces@REDACTED] On Behalf Of Ulf Wiger
	Sent: 25 April 2007 18:35
	To: Joe Armstrong
	Cc: Erlang
	Subject: Re: [erlang-questions] semantic tagger

	I did it to some extent in CCviewer, but I wouldn't recommend reusing the code...

	For starters, I think the token scanner needs to preserve column information,
	and the standard tokenizer should have an option to do this.
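Later OTP releases than the one discussed here support this: `erl_scan:string/2` accepts a `{Line, Column}` start location, and `erl_scan:column/1` reads the column back out of each token. A small sketch; the module name `col_demo` is hypothetical.

```erlang
-module(col_demo).
-export([columns/1]).

%% Scans Src starting at line 1, column 1, so every token's
%% annotation carries a {Line, Column} location.
columns(Src) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Src, {1, 1}),
    [{erl_scan:category(T), erl_scan:line(T), erl_scan:column(T)}
     || T <- Tokens].
```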

	BR,
	Ulf W

	2007/4/25, Joe Armstrong <erlang@REDACTED>: 

		Hello, has anybody got a "semantic tagger" that can tag Erlang source
		code files?

		I want to convert a .erl file into a sequence of tokens

		[{Tag, String}]

		where Tag is a semantic tag (like comment, variable, atom, functionCall, etc.)
		that tags the following string.

		Constraint: if I concatenate all the strings in the token list, I should get
		the original file content. I want to preserve all input formatting.

		Has anybody done this?

		/Joe
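One possible sketch of the tagger described above, assuming an OTP release in which `erl_scan:string/3` supports the `return` option (keep whitespace and comment tokens) and the `text` option (keep each token's source text). The module name `sem_tag` and the tag set are illustrative. Beyond the scanner's own categories, the only rule is that an atom whose next non-whitespace, non-comment token is `(` is tagged `function_call`. The concatenation constraint is checked with a match, so `tag/1` crashes if the strings do not reproduce the input.

```erlang
-module(sem_tag).
-export([tag/1]).

%% Returns [{Tag, String}] for the Erlang source text Src.
tag(Src) ->
    {ok, Tokens, _} = erl_scan:string(Src, {1, 1}, [return, text]),
    Tagged = tag_tokens(Tokens),
    %% The constraint: concatenating the strings must give back Src.
    Src = lists:append([S || {_, S} <- Tagged]),
    Tagged.

tag_tokens([]) ->
    [];
tag_tokens([T | Ts]) ->
    [{semantic_tag(erl_scan:category(T), Ts), erl_scan:text(T)}
     | tag_tokens(Ts)].

%% Illustrative tag set; unknown categories are passed through.
semantic_tag(white_space, _) -> whitespace;
semantic_tag(comment, _)     -> comment;
semantic_tag(var, _)         -> variable;
semantic_tag(atom, Rest) ->
    case next_real(Rest) of
        '(' -> function_call;
        _   -> atom
    end;
semantic_tag(Cat, _) -> Cat.

%% Category of the next token that is not whitespace or a comment.
next_real([T | Ts]) ->
    case erl_scan:category(T) of
        C when C =:= white_space; C =:= comment -> next_real(Ts);
        C -> C
    end;
next_real([]) ->
    none.
```

This rule also tags the name in a function head as `function_call`; telling definitions from calls would need more context than the next token.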
		_______________________________________________
		erlang-questions mailing list
		erlang-questions@REDACTED
		http://www.erlang.org/mailman/listinfo/erlang-questions
