regexp to matchspec

Ulf Wiger (AL/EAB) ulf.wiger@REDACTED
Mon Jan 23 14:46:08 CET 2006


I have written a small module that attempts 
to convert a regexp to a match spec -- one 
of those dumb ideas that just wouldn't leave 
me alone until I had tried it. (:

Of course, the conversion leaves something 
to be desired, since you can't do recursion
within a match spec, but my approach works 
within reason (it will explode in your face 
if you try something too complex.)

Example:

re2ms:re(Regexp, Lookahead) converts the Regexp
to a match specification, using Lookahead (integer())
to limit the depth of the guard tree.


30> Match = fun(Objs,Ms) -> 
      MsC = ets:match_spec_compile(Ms),
      ets:match_spec_run(Objs,MsC) end.
#Fun<erl_eval.12.2225172>
31> f(Re), Re = re2ms:re("\(abc\)+a*b*", 16).
[{['$1'|'$2'],
  [{'andalso',{'andalso',{'andalso',{'=/=','$1',[]},{'==','$1',97}},
                         {'andalso',{'andalso',
                                        {'=/=','$2',[]},
                                        {'==',{hd,'$2'},98}},
                                    {'andalso',
                                        {'andalso',
                                            {'=/=',{tl,'$2'},[]},
                                            {'==',{hd,{tl,'$2'}},99}},
                                        true}}},
              {'orelse',{'andalso',{'andalso',
                                       {'andalso',
  ... % really big and ugly match spec

32> Match(["abc","aaa","bbb","abcabca","abab","abcd"],Re).
["abc","abcabca","abcd"]
34> f(Re), Re = re2ms:re(".*\\.erl", 16).
[{'$1',[{'orelse',{'andalso',{'andalso',
                                 {'andalso',
...
35> Match(["a.erl","aaaaaa.erl","aaaaaaaaaaaaaaaaaaaa.erl"],Re).
["a.erl","aaaaaa.erl"]

...which illustrates the lookahead limitation.


Since debugging a huge match expression by
just passing it to ets:match_spec_run is ...
frustrating, I wrote an incomplete match spec
evaluator (re2ms:run_ms(Objs, Ms)). To play around
with extensions to the match spec grammar that 
might better suit regexp-style patterns, I added
'let' and 'subterm' (my possibly confursed
interpretation of a suggestion made by John Hughes.)

28> re2ms:run_ms(
 ["aba","abb","abc",
  "abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"],
 [{"a"++'$1', 
   [{'==',{subterm,'$1','tl',
           {'==',{'hd','$_'},98},
           {'==','$_',[]}},
    []}],
   ['$_']}]).
["abb","abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"]

29> re2ms:run_ms(
 ["aba","abb","abc",
  "abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"],
 [{"a"++'$1', 
   [{'let','$2',
     {subterm,'$1','tl',
      {'==',{'hd','$_'},98},
      {'==','$_',[]}},
    {'==','$2',[]}}],
   ['$_']}]).
["abb","abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"]

In other words:

{'let', NewVar, Expr, In}

and 

{subterm, StartingExpr, RecursiveOp, While, Until}

'$_' is used within the While and Until guards
to refer to the "current value".

value_of({subterm, Expr, RecurseOp, While, Until}, Vars) ->
    Recurse = 
	case RecurseOp of
	    'tl' -> {'tl', '$_'};
	    {element, Pos} when is_integer(Pos), Pos > 0 -> 
		{element, Pos, '$_'};
	    _ ->
		erlang:error(badarg)
	end,
    CurVal = value_of(Expr, Vars),
    subterm(Recurse, While, Until, CurVal, Vars);

subterm(Recurse, While, Until, CurVal, Vars) ->
    Vars1 = [{'$_',CurVal}|Vars],
    case guard(Until, Vars1) of
	false ->
	    case guard(While, Vars1) of
		true ->
		    NewVal = value_of(Recurse, Vars1),
		    subterm(Recurse, While, Until, NewVal, Vars);
		false ->
		    CurVal
	    end;
	true ->
	    CurVal
    end.

The source code is attached. It should be small enough
to make it through. I think it might be really valuable
to have a complete erlang-based match spec evaluator,
but completing it will not be a high priority of mine.

Comments? Suggestions?

/Uffe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: re2ms.erl
Type: application/octet-stream
Size: 9400 bytes
Desc: re2ms.erl
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20060123/3f4cab79/attachment.obj>


More information about the erlang-questions mailing list