[erlang-questions] Adoption of perl/javascript-style regexp syntax
Kevin Scaldeferri
kevin@REDACTED
Wed Jun 3 20:53:59 CEST 2009
On Jun 2, 2009, at 11:02 PM, Richard O'Keefe wrote:
>
> On 3 Jun 2009, at 5:09 pm, Kevin Scaldeferri wrote:
>>
>> So, you'd like it to be Perl, then?
>
> No, absolutely the complete and total reverse.
>
> Perl is a horrible example of how to do it WRONG.
>
> Perl doesn't do any of the things I have stressed as important.
> In fact, it goes about as far as it can in the opposite direction.
I shamefully admit to mostly making the comment to get a rise out of
you, but in fact the things you were asking for that I responded to:
> There are several stages in the compilation of a regular
> expression, at least notionally:
> linear representation -> AST
> AST -> matching engine
> It's good that Reia has a clue about regular expression literals
> (as AWK did). It would be even better if it _also_ provided an
> API for the AST, so that one could say
> "I want that regular expression followed by this string
> followed by that regular expression."
> It would be nice to tag the matches one wants with atoms rather
> than invisible integers. And so on.
are provided by Perl. Regexes in Perl are in no way strings. They are
AST-like data structures, created during the compile phase, which are
fed into the matching engine during the run phase. (Yes, you can
coerce a string into a regex at runtime, but that's not the general
behavior of the built-in syntax.)
You can compose regex objects in just the way you describe:
$re1 = qr/abc/;
$re2 = qr/123/;
$re3 = qr/${re1}asdf${re2}/;
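As a quick check of that composition, here is a minimal sketch. The test strings are made up for illustration:

```perl
use strict;
use warnings;

my $re1 = qr/abc/;
my $re2 = qr/123/;
# Interpolating a qr// object embeds it as a non-capturing group,
# so each piece keeps its own flags and grouping.
my $re3 = qr/${re1}asdf${re2}/;

# One string that contains abc, asdf, 123 in sequence, and one that does not.
my $matched  = ("xxabcasdf123yy" =~ $re3) ? 1 : 0;
my $rejected = ("abc123" =~ $re3) ? 1 : 0;
print "matched=$matched rejected=$rejected\n";
```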
You can have a named capture:
/\w*\.(?<suffix>\w+)/
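The captured text is then available by name through the %+ hash (Perl 5.10 and later). A small sketch, assuming a made-up file name and using \w+ so that the whole suffix is captured:

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+ need Perl 5.10 or later

my $suffix;
if ("report.txt" =~ /\w*\.(?<suffix>\w+)/) {
    $suffix = $+{suffix};    # look the capture up by name, not by number
}
print "suffix=$suffix\n";
```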
>>
> I just want to be able to compute them *AS* regular expression
> values, not as strings. It's like XML: if you want to build
> XML, strings are a horrible way to do it.
Well, fortunately regexes in Perl are only strings insofar as any
source code construct is a string.
>
>
> Take a real example from an AWK script.
>
> /^[a-zA-Z][a-zA-Z0-9.]*[ ]*<-[ ]*function/
>
> I'd like to be able to write this:
>
> opt_space() -> {star,{cset," \t"}}.
>
> letters() -> "a-zA-Z".
>
> continuers() -> "0-9." ++ letters().
>
> identifier() -> {seq,{cset,letters()},{star,{cset,continuers()}}}.
>
> operator(X) -> {seq,opt_space(),X,opt_space()}.
>
> pattern() ->
> {seq,bol,identifier(),operator("<-"),"function"}.
>
> It's longer, but in a complete program, I'm likely to have a use
> for most of these bits, and I am _certainly_ going to find it
> easier to get this right one step at a time.
>
> Do you see any Perl here? I don't.
>
$opt_space = qr/[ \t]*/;
$letters = qr/[a-zA-Z]/;
$continuers = qr/[0-9\.]|${letters}/;
$identifier = qr/${letters}${continuers}*/;
sub operator {
    my $x = shift;    # lexical, so the sub does not clobber a global $x
    qr/${opt_space}${x}${opt_space}/;
}
$larrow = operator("<-");
$pattern = qr/^${identifier}${larrow}function/;
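To check that the composed pattern accepts and rejects what the AWK regex is meant to, here is a self-contained sketch of the same pieces under strict. The sample input lines are invented for the test:

```perl
use strict;
use warnings;

my $opt_space  = qr/[ \t]*/;
my $letters    = qr/[a-zA-Z]/;
my $continuers = qr/[0-9.]|${letters}/;
# The trailing * applies to the whole interpolated group, not just its last atom.
my $identifier = qr/${letters}${continuers}*/;

sub operator {
    my $x = shift;
    return qr/${opt_space}${x}${opt_space}/;
}

my $larrow  = operator("<-");
my $pattern = qr/^${identifier}${larrow}function/;

# Hypothetical inputs: two assignments of a function, one invalid identifier.
my @lines   = ("my.func <- function(x)", "1abc <- function", "f<-function");
my @results = map { /$pattern/ ? 1 : 0 } @lines;
print "@results\n";
```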
So, other than your opposition to a concise syntax, I'm not really
seeing your point.
-kevin