[erlang-questions] Adoption of perl/javascript-style regexp syntax

Wed Jun 3 20:53:59 CEST 2009

On Jun 2, 2009, at 11:02 PM, Richard O'Keefe wrote:

>
> On 3 Jun 2009, at 5:09 pm, Kevin Scaldeferri wrote:
>>
>> So, you'd like it to be Perl, then?
>
> No, absolutely the complete and total reverse.
>
> Perl is a horrible example of how to do it WRONG.
>
> Perl doesn't do any of the things I have stressed as important.
> In fact, it goes about as far as it can in the opposite direction.

I shamefully admit to mostly making the comment to get a rise out of  
you, but in fact the things you were asking for that I responded to:

> There are several stages in the compilation of a regular
> expression, at least notionally:
> 	linear representation -> AST
> 	AST -> matching engine
> It's good that Reia has a clue about regular expression literals
> (as AWK did).  It would be even better if it _also_ provided an
> API for the AST, so that one could say
> 	"I want that regular expression followed by this string
> 	 followed by that regular expression."
> It would be nice to tag the matches one wants with atoms rather
> than invisible integers.  And so on.

are provided by Perl.  Regexes in Perl are in no way strings.  They are
AST-like data structures, created during the compile phase, which are
fed into the matching engine during the run phase.  (Yes, you can
coerce a string into a regex at runtime, but that's not the general
behavior of the built-in syntax.)
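
A minimal sketch of that distinction, with variable names made up for
illustration: qr// hands back a compiled regex object, and building one
from a string is a separate, explicit step.

```perl
use strict;
use warnings;

# qr// yields a compiled regex object, not a plain string.
my $re   = qr/ab+c/;
my $kind = ref $re;              # 'Regexp'

# Coercing a string into a regex at runtime is the explicit exception.
my $source   = 'ab+c';           # hypothetical pattern text
my $from_str = qr/$source/;

print "kind: $kind\n";
print "both match\n" if 'xabbbc' =~ $re && 'xabbbc' =~ $from_str;
```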

You can compose regex objects in just the way you describe:

$re1 = qr/abc/;
$re2 = qr/123/;
$re3 = qr/${re1}asdf${re2}/;
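
As a rough illustration of why this is safer than pasting strings
together: an interpolated qr// object is wrapped in its own
non-capturing group, so a quantifier after it applies to the whole
piece.  The $rep pattern and the test strings are assumptions added for
the example.

```perl
use strict;
use warnings;

my $re1 = qr/abc/;
my $re2 = qr/123/;
my $re3 = qr/${re1}asdf${re2}/;

# The + repeats all of "abc", not just the trailing "c",
# because $re1 is interpolated as a single group.
my $rep = qr/^${re1}+$/;

print "composed match\n" if 'xxabcasdf123' =~ $re3;
print "repeated match\n" if 'abcabc' =~ $rep;
print "no match\n"       unless 'abccc' =~ $rep;
```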

You can have a named capture:

/\w*\.(?<suffix>\w+)/
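
A small sketch of reading the capture back by name.  It assumes Perl
5.10 or later, where named captures and the %+ hash are available; the
filename is a made-up input, and the group is written as \w+ so that
the whole suffix is captured.

```perl
use strict;
use warnings;

my $filename = 'report.txt';     # hypothetical input
my $suffix;

# %+ maps capture names to the text they matched.
if ($filename =~ /\w*\.(?<suffix>\w+)/) {
    $suffix = $+{suffix};
}

print "suffix: $suffix\n" if defined $suffix;
```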

>>
> I just want to be able to compute them *AS* regular expression
> values, not as strings.  It's like XML:  if you want to build
> XML, strings are a horrible way to do it.

Well, fortunately regexes in Perl are only strings insofar as any
source code construct is a string.

>
>
> Take a real example from an AWK script.
>
> /^[a-zA-Z][a-zA-Z0-9.]*[         ]*<-[   ]*function/
>
> I'd like to be able to write this:
>
> opt_space() -> {star,{cset," \t"}}.
>
> letters() -> "a-zA-Z".
>
> continuers() -> "0-9." ++ letters().
>
> identifier() -> {seq,{cset,letters()},{star,{cset,continuers()}}}.
>
> operator(X) -> {seq,opt_space(),X,opt_space()}.
>
> pattern() ->
>    {seq,bol,identifier(),operator("<-"),"function"}.
>
> It's longer, but in a complete program, I'm likely to have a use
> for most of these bits, and I am _certainly_ going to find it
> easier to get this right one step at a time.
>
> Do you see any Perl here?  I don't.
>

$opt_space = qr/[ \t]*/;
$letters = qr/[a-zA-Z]/;
$continuers = qr/[0-9.]|${letters}/;
$identifier = qr/${letters}${continuers}*/;

sub operator {
   my $x = shift;
   qr/${opt_space}${x}${opt_space}/;
}

$larrow = operator("<-");
$pattern = qr/^${identifier}${larrow}function/;
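
For anyone who wants to try it, here is a self-contained sketch of the
same composition under "use strict", run against a few sample lines.
The sample lines and the @hits variable are assumptions added for the
example; the pieces are the ones above.

```perl
use strict;
use warnings;

my $opt_space  = qr/[ \t]*/;
my $letters    = qr/[a-zA-Z]/;
my $continuers = qr/[0-9.]|${letters}/;
my $identifier = qr/${letters}${continuers}*/;

# Wrap an operator in optional whitespace on both sides.
sub operator {
    my ($x) = @_;
    return qr/${opt_space}${x}${opt_space}/;
}

my $larrow  = operator("<-");
my $pattern = qr/^${identifier}${larrow}function/;

# Hypothetical input lines: only the first is a function definition
# whose name starts with a letter.
my @lines = (
    'my.func <- function(x) x + 1',
    '9lives <- function() 0',
    'y <- 3',
);
my @hits = grep { $_ =~ $pattern } @lines;

print "$_\n" for @hits;
```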

So, other than your opposition to a concise syntax, I'm not really  
seeing your point.

-kevin