regexp to matchspec

Ulf Wiger (AL/EAB) ulf.wiger@REDACTED
Mon Jan 23 14:46:08 CET 2006

I have written a small module that attempts 
to convert a regexp to a match spec -- one 
of those dumb ideas that just wouldn't leave 
me alone until I had tried it. (:

Of course, the conversion leaves something 
to be desired, since you can't do recursion
within a match spec, but my approach works 
within reason (it will explode in your face 
if you try something too complex.)


re2ms:re(Regexp, Lookahead) converts the Regexp
to a match specification, using Lookahead (integer())
to limit the depth of the guard tree.
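
To give a rough feel for why a lookahead bound is needed, here is a
minimal hand-written sketch. It is not taken from re2ms.erl; the module
name unroll_sketch and the functions a_star_b/1, guard/2 and match/2 are
made up for illustration. It builds a match spec for the fixed regexp
"a*b" only, and because a match spec cannot recurse, the loop over
leading a's is unrolled Depth times into nested hd/tl guards.

```erlang
-module(unroll_sketch).
-export([a_star_b/1, match/2]).

%% Build a match spec that accepts strings matching "a*b" with at most
%% Depth leading a's. Hypothetical helper, not part of re2ms.
a_star_b(Depth) when is_integer(Depth), Depth >= 0 ->
    [{'$1', [{is_list, '$1'}, guard('$1', Depth)], ['$_']}].

%% Expr is a match-spec expression denoting the rest of the string.
%% Match specs cannot recurse, so the loop is unrolled Depth times.
guard(Expr, 0) ->
    %% Out of lookahead: only the bare "b" is accepted here.
    {'=:=', Expr, [$b]};
guard(Expr, Depth) ->
    {'orelse',
     {'=:=', Expr, [$b]},
     {'andalso',
      {'=/=', Expr, []},
      {'andalso',
       {'=:=', {hd, Expr}, $a},
       guard({tl, Expr}, Depth - 1)}}}.

%% Same idea as the Match fun in the shell session: compile, then run.
match(Objs, Ms) ->
    ets:match_spec_run(Objs, ets:match_spec_compile(Ms)).
```

With Depth = 2, "aab" is accepted but "aaab" is not, since the guard
tree has run out of unrolled steps. The guard term also grows quickly
with Depth, which is one reason a generated match spec gets big and ugly.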

30> Match = fun(Objs,Ms) -> 
      MsC = ets:match_spec_compile(Ms),
      ets:match_spec_run(Objs,MsC) end.
31> f(Re), Re = re2ms:re("\(abc\)+a*b*", 16).
  ... % really big and ugly match spec

32> Match(["abc","aaa","bbb","abcabca","abab","abcd"],Re).
34> f(Re), Re = re2ms:re(".*\\.erl", 16).
35> Match(["a.erl","aaaaaa.erl","aaaaaaaaaaaaaaaaaaaa.erl"],Re).

...which illustrates the lookahead limitation.

Since debugging a huge match expression by
just passing it to ets:match_spec_run is ...
frustrating, I wrote an incomplete match spec
evaluator (re2ms:run_ms(Objs, Ms)). To play around
with extensions to the match spec grammar that 
might better suit regexp-style patterns, I added
'let' and 'subterm' (my possibly confused
interpretation of a suggestion made by John Hughes.)

28> re2ms:run_ms(

29> re2ms:run_ms(

In other words:

{'let', NewVar, Expr, In}


{subterm, StartingExpr, RecursiveOp, While, Until}

'$_' is used within the While and Until guards
to refer to the "current value".
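
As an illustration of the 'let' form, here is a minimal sketch of how an
evaluator could handle it. This is an assumption about the intended
semantics (bind NewVar to the value of Expr, then evaluate In with that
binding visible), not the code from re2ms.erl. The module name
let_sketch is made up, and only hd, tl, variables and literals are
supported. Vars is assumed to be a list of {Name, Value} pairs.

```erlang
-module(let_sketch).
-export([value_of/2]).

%% ASSUMED semantics of {'let', NewVar, Expr, In}:
%% evaluate Expr, bind the result to NewVar, then evaluate In.
value_of({'let', NewVar, Expr, In}, Vars) ->
    Val = value_of(Expr, Vars),
    value_of(In, [{NewVar, Val} | Vars]);
value_of({hd, E}, Vars) ->
    hd(value_of(E, Vars));
value_of({tl, E}, Vars) ->
    tl(value_of(E, Vars));
value_of(Var, Vars) when is_atom(Var) ->
    %% Bound names such as '$1' are looked up; other atoms are literals.
    case lists:keyfind(Var, 1, Vars) of
        {Var, Val} -> Val;
        false -> Var
    end;
value_of(Literal, _Vars) ->
    Literal.
```

For example, let_sketch:value_of({'let', '$2', {tl, '$1'}, {hd, '$2'}},
[{'$1', "abc"}]) binds '$2' to "bc" and then returns its head, $b.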

value_of({subterm, Expr, RecurseOp, While, Until}, Vars) ->
    Recurse = 
	case RecurseOp of
	    'tl' -> {'tl', '$_'};
	    {element, Pos} when is_integer(Pos), Pos > 0 -> 
		{element, Pos, '$_'};
	    _ ->
    CurVal = value_of(Expr, Vars),
    subterm(Recurse, While, Until, CurVal, Vars);

subterm(Recurse, While, Until, CurVal, Vars) ->
    Vars1 = [{'$_',CurVal}|Vars],
    case guard(Until, Vars1) of
	false ->
	    case guard(While, Vars1) of
		true ->
		    NewVal = value_of(Recurse, Vars1),
		    subterm(Recurse, While, Until, NewVal, Vars);
		false ->
	true ->
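
Several branch bodies are missing from the listing above. The following
self-contained sketch shows one plausible way such a loop could be
completed. The choices marked ASSUMPTION are guesses, not the behaviour
of re2ms.erl: the current value is returned once Until holds, and an
error is raised if While fails first or the recursive op is unknown. The
module name subterm_sketch and the tiny guard/2 (only '=:=' and '=/=')
are likewise made up for the example.

```erlang
-module(subterm_sketch).
-export([value_of/2]).

value_of({subterm, Expr, RecurseOp, While, Until}, Vars) ->
    Recurse =
        case RecurseOp of
            tl -> {tl, '$_'};
            {element, Pos} when is_integer(Pos), Pos > 0 ->
                {element, Pos, '$_'};
            _ ->
                %% ASSUMPTION: unknown ops are rejected.
                erlang:error({bad_recurse_op, RecurseOp})
        end,
    CurVal = value_of(Expr, Vars),
    subterm(Recurse, While, Until, CurVal, Vars);
value_of({hd, E}, Vars) -> hd(value_of(E, Vars));
value_of({tl, E}, Vars) -> tl(value_of(E, Vars));
value_of({element, Pos, E}, Vars) -> element(Pos, value_of(E, Vars));
value_of(Var, Vars) when is_atom(Var) ->
    case lists:keyfind(Var, 1, Vars) of
        {Var, Val} -> Val;
        false -> Var
    end;
value_of(Literal, _Vars) -> Literal.

%% Deliberately tiny guard evaluator, enough for the example only.
guard({'=:=', A, B}, Vars) -> value_of(A, Vars) =:= value_of(B, Vars);
guard({'=/=', A, B}, Vars) -> value_of(A, Vars) =/= value_of(B, Vars).

subterm(Recurse, While, Until, CurVal, Vars) ->
    Vars1 = [{'$_', CurVal} | Vars],
    case guard(Until, Vars1) of
        true ->
            CurVal;                          % ASSUMPTION: result is the current value
        false ->
            case guard(While, Vars1) of
                true ->
                    NewVal = value_of(Recurse, Vars1),
                    subterm(Recurse, While, Until, NewVal, Vars);
                false ->
                    erlang:error(no_subterm) % ASSUMPTION: failure is an error
            end
    end.
```

Under these assumptions, walking "abcd" with tl while the current value
is not [] and until its head is $c stops at "cd".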

The source code is attached. It should be small enough
to make it through. I think it might be really valuable
to have a complete erlang-based match spec evaluator,
but completing it will not be a high priority of mine.

Comments? Suggestions?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: re2ms.erl
Type: application/octet-stream
Size: 9400 bytes
Desc: re2ms.erl