regexp to matchspec
Ulf Wiger (AL/EAB)
ulf.wiger@REDACTED
Mon Jan 23 14:46:08 CET 2006
I have written a small module that attempts
to convert a regexp to a match spec -- one
of those dumb ideas that just wouldn't leave
me alone until I had tried it. (:
Of course, the conversion leaves something
to be desired, since you can't do recursion
within a match spec, but my approach works
within reason (it will explode in your face
if you try something too complex.)
Example:
re2ms:re(Regexp, Lookahead) converts the Regexp
to a match specification, using Lookahead (integer())
to limit the depth of the guard tree.
30> Match = fun(Objs,Ms) ->
MsC = ets:match_spec_compile(Ms),
ets:match_spec_run(Objs,MsC) end.
#Fun<erl_eval.12.2225172>
31> f(Re), Re = re2ms:re("\(abc\)+a*b*", 16).
[{['$1'|'$2'],
[{'andalso',{'andalso',{'andalso',{'=/=','$1',[]},{'==','$1',97}},
{'andalso',{'andalso',
{'=/=','$2',[]},
{'==',{hd,'$2'},98}},
{'andalso',
{'andalso',
{'=/=',{tl,'$2'},[]},
{'==',{hd,{tl,'$2'}},99}},
true}}},
{'orelse',{'andalso',{'andalso',
{'andalso',
... % really big and ugly match spec
32> Match(["abc","aaa","bbb","abcabca","abab","abcd"],Re).
["abc","abcabca","abcd"]
34> f(Re), Re = re2ms:re(".*\\.erl", 16).
[{'$1',[{'orelse',{'andalso',{'andalso',
{'andalso',
...
35> Match(["a.erl","aaaaaa.erl","aaaaaaaaaaaaaaaaaaaa.erl"],Re).
["a.erl","aaaaaa.erl"]
...which illustrates the lookahead limitation.
Since debugging a huge match expression by
just passing it to ets:match_spec_run is ...
frustrating, I wrote an incomplete match spec
evaluator (re2ms:run_ms(Objs, Ms)). To play around
with extensions to the match spec grammar that
might better suit regexp-style patterns, I added
'let' and 'subterm' (my possibly confursed
interpretation of a suggestion made by John Hughes.)
28> re2ms:run_ms(
["aba","abb","abc",
"abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"],
[{"a"++'$1',
[{'==',{subterm,'$1','tl',
{'==',{'hd','$_'},98},
{'==','$_',[]}},
[]}],
['$_']}]).
["abb","abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"]
29> re2ms:run_ms(
["aba","abb","abc",
"abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"],
[{"a"++'$1',
[{'let','$2',
{subterm,'$1','tl',
{'==',{'hd','$_'},98},
{'==','$_',[]}},
{'==','$2',[]}}],
['$_']}]).
["abb","abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"]
In other words:
{'let', NewVar, Expr, In}
and
{subterm, StartingExpr, RecursiveOp, While, Until}
'$_' is used within the While and Until guards
to refer to the "current value".
value_of({subterm, Expr, RecurseOp, While, Until}, Vars) ->
Recurse =
case RecurseOp of
'tl' -> {'tl', '$_'};
{element, Pos} when is_integer(Pos), Pos > 0 ->
{element, Pos, '$_'};
_ ->
erlang:error(badarg)
end,
CurVal = value_of(Expr, Vars),
subterm(Recurse, While, Until, CurVal, Vars);
subterm(Recurse, While, Until, CurVal, Vars) ->
Vars1 = [{'$_',CurVal}|Vars],
case guard(Until, Vars1) of
false ->
case guard(While, Vars1) of
true ->
NewVal = value_of(Recurse, Vars1),
subterm(Recurse, While, Until, NewVal, Vars);
false ->
CurVal
end;
true ->
CurVal
end.
The source code is attached. It should be small enough
to make it through. I think it might be really valuable
to have a complete erlang-based match spec evaluator,
but completing it will not be a high priority of mine.
Comments? Suggestions?
/Uffe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: re2ms.erl
Type: application/octet-stream
Size: 9400 bytes
Desc: re2ms.erl
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20060123/3f4cab79/attachment.obj>
More information about the erlang-questions
mailing list