<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2800.1589" name=GENERATOR></HEAD>
<BODY>
<DIV><FONT face=Arial color=#0000ff size=2></FONT> </DIV>
<DIV><SPAN class=590080607-26042007><FONT face=Arial color=#0000ff size=2>...
actually, what I ended up doing in CCviewer was to collect
whitespace</FONT></SPAN></DIV>
<DIV><SPAN class=590080607-26042007><FONT face=Arial color=#0000ff size=2>and
comments in a list between each real token. Thus, the token
stream</FONT></SPAN></DIV>
<DIV><SPAN class=590080607-26042007><FONT face=Arial color=#0000ff size=2>became
</FONT></SPAN></DIV>
<DIV><SPAN class=590080607-26042007><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=590080607-26042007><FONT face=Arial color=#0000ff size=2>[Tok1,
Whitespace1, Tok2, Whitespace2 | ...]</FONT></SPAN></DIV>
<DIV><SPAN class=590080607-26042007><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=590080607-26042007><FONT face=Arial color=#0000ff size=2>That
wasn't too bad, but to make it a bit more interesting, I also
wanted</FONT></SPAN></DIV>
<DIV><SPAN class=590080607-26042007><FONT face=Arial color=#0000ff size=2>not
only to preserve formatting, but also to do a decent job on code
that</FONT></SPAN></DIV>
<DIV><SPAN class=590080607-26042007><FONT face=Arial color=#0000ff size=2>might
not compile (I would not do that again, though...)</FONT></SPAN></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=590080607-26042007></SPAN></FONT></FONT></FONT> </DIV>
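<DIV><SPAN class=590080607-26042007><FONT face=Arial color=#0000ff size=2>A
minimal sketch of the interleaved token stream described above, not the
CCviewer code. It assumes tokens of the form {Tag, String} and treats the
white_space and comment tags as filler. The module name interleave and the
functions group/1 and to_string/1 are made up for illustration. Filler that
precedes the first real token is kept separately, so that concatenating
everything gives back the original text.</FONT></SPAN></DIV>
<PRE>
```erlang
-module(interleave).
-export([group/1, to_string/1, demo/0]).

%% Hypothetical token format: {Tag, String}.
%% Whitespace and comments are "filler" between real tokens.
is_filler({white_space, _}) -> true;
is_filler({comment, _}) -> true;
is_filler(_) -> false.

%% group(Tokens) -> {Leading, [Tok1, Filler1, Tok2, Filler2, ...]}
%% Leading is the filler before the first real token; each FillerN is the
%% (possibly empty) list of filler tokens that follow TokN.
group(Tokens) ->
    {Leading, Rest} = lists:splitwith(fun is_filler/1, Tokens),
    {Leading, pairs(Rest)}.

pairs([]) -> [];
pairs([Tok | Rest]) ->
    {Filler, Rest1} = lists:splitwith(fun is_filler/1, Rest),
    [Tok, Filler | pairs(Rest1)].

%% Concatenate every string, filler included, back into the source text.
to_string({Leading, Stream}) ->
    lists:flatten([text(Leading), lists:map(fun text/1, Stream)]).

text({_Tag, Str}) -> Str;
text(Filler) when is_list(Filler) -> lists:map(fun text/1, Filler).

demo() ->
    Tokens = [{comment, "%% add one"}, {white_space, "\n"},
              {atom, "inc"}, {'(', "("}, {var, "X"}, {')', ")"},
              {white_space, " "}, {'->', "->"}, {white_space, " "},
              {var, "X"}, {'+', "+"}, {integer, "1"}, {dot, "."}],
    Grouped = group(Tokens),
    Original = lists:flatmap(fun({_Tag, S}) -> S end, Tokens),
    %% The match fails unless the grouped stream round-trips.
    Original = to_string(Grouped),
    Grouped.
```
</PRE>
<DIV><SPAN class=590080607-26042007><FONT face=Arial color=#0000ff size=2>With
this layout a function head can match a fixed number of real tokens and still
carry along whatever whitespace and comments sat between
them.</FONT></SPAN></DIV>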
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=590080607-26042007>When I first wrote the html:izer, I was into
experimenting with doing the</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=590080607-26042007>brunt of the work in function head patterns. This led
to various problems</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=590080607-26042007>as well, but was a good learning
experience.</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=590080607-26042007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=590080607-26042007>Here's an example of what it could look like. The
purpose was to convert</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=590080607-26042007>source code to HTML with hypertext links on function
calls, function heads (which would</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=590080607-26042007>list the callers of the function), and record and macro
references.</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=590080607-26042007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=590080607-26042007>(If the pretty-printer got confused, it would throw an
exception, and plain,</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=590080607-26042007>un-annotated text would be displayed
instead.)</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=590080607-26042007></SPAN></FONT></FONT></FONT> </DIV>
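<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=590080607-26042007>expr1([_T1={symbol, L1, C1, Ce1, '#'}, WC1,<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_T2={symbol, L2, C2, Ce2, '?'}, WC2,<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_T3={Tag, L3, C3, Ce3, W}, WC3,<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_T4={symbol, L4, C4, Ce4, '.'}|Ts]?Xs, Cur, L, Term,<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;XRefs, FF, FA, S) when ?w_or_a(Tag) -><BR>
&nbsp;&nbsp;&nbsp;&nbsp;%% Hellish combination of macro expansion and record selector syntax<BR>
&nbsp;&nbsp;&nbsp;&nbsp;%% We'd like to hypertext link both, but can't do that. Since we don't<BR>
&nbsp;&nbsp;&nbsp;&nbsp;%% expand the macro, we'll create a hypertext reference to the macro.<BR>
&nbsp;&nbsp;&nbsp;&nbsp;%% We also have to consume the dot in order to keep the parser from<BR>
&nbsp;&nbsp;&nbsp;&nbsp;%% derailing.<BR>
&nbsp;&nbsp;&nbsp;&nbsp;{Ref, Link} =<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;case ets:lookup(S#state.tab, {define, W}) of<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[] -><BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;%% hmmm...<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Ref1 = {mfa, S#state.modulename, W, ?macro_arity_int, L3},<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;FL1 = funlink(W, ?macro_arity_int, W),<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{Ref1, FL1};<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[{_, F, _, IncMod, 0}] -><BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Ref2 = {mfa, IncMod, W, ?macro_arity_int, L3},<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;FL2 = macrolink(F, W, S),<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{Ref2, FL2}<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end,<BR>
&nbsp;&nbsp;&nbsp;&nbsp;Out = [space(L, Cur, L1, C1, S),<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"#", wc(WC1,L1,Ce1,L2,C2, S),<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"?", wc(WC2,L2,Ce2,L3,C3, S),<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Link, wc(WC3,L3,Ce3,L4,C4, S),<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"."],<BR>
&nbsp;&nbsp;&nbsp;&nbsp;S1 = out(Out, S),<BR>
&nbsp;&nbsp;&nbsp;&nbsp;expr(Ts, Ce4, L4, Term, [Ref|XRefs], FF, FA, S1 ?LL);<BR></SPAN></FONT></FONT></FONT></DIV>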
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=590080607-26042007>BR,</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=590080607-26042007>Ulf W</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=590080607-26042007> </SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2></FONT><BR></DIV>
<BLOCKQUOTE
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> erlang-questions-bounces@erlang.org
[mailto:erlang-questions-bounces@erlang.org] <B>On Behalf Of </B>Ulf
Wiger<BR><B>Sent:</B> den 25 april 2007 18:35<BR><B>To:</B> Joe
Armstrong<BR><B>Cc:</B> Erlang<BR><B>Subject:</B> Re: [erlang-questions]
semantic tagger<BR></FONT><BR></DIV>
<DIV></DIV><BR>I did it to some extent in CCviewer, but I wouldn't recommend
reusing the code...<BR><BR>I think that for starters, the token scanner needs
to preserve column information.<BR>I think the standard tokenizer should have
an option to do this. <BR><BR>BR,<BR>Ulf W<BR><BR>
<DIV><SPAN class=gmail_quote>2007/4/25, Joe Armstrong &lt;<A
href="mailto:erlang@gmail.com">erlang@gmail.com</A>&gt;:</SPAN>
<BLOCKQUOTE class=gmail_quote
style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid">Hello,
has anybody got a "semantic tagger" that can tag Erlang source<BR>code
files?<BR><BR>I want to convert a .erl file into a sequence of
tokens<BR><BR>[{Tag, String}]<BR><BR>where Tag is a semantic tag (like
comment, variable, atom, functionCall, etc.) <BR>that tags the following
string.<BR><BR>Constraint: If I concatenate all the strings in token list I
should get the<BR>original file content. I want to preserve all input
formatting.<BR><BR>Has anybody done
this?<BR><BR>/Joe<BR></BLOCKQUOTE></DIV><BR></BLOCKQUOTE></BODY></HTML>