[erlang-questions] String Pattern Matching

Richard A. O'Keefe ok@REDACTED
Mon Sep 8 08:16:01 CEST 2008


On 8 Sep 2008, at 2:02 am, Luke Galea wrote:
> For instance:
> 	([m|l])ouse$ -> $1ice
> or
> 	([^aeiouy]|qu)y$ -> $1ies
>
> So, given that erlang has insufficient regular expression support to
> tackle this (and given that I want to do this the "erlang way"), how
> does one go about doing this?

Key insight:  PROGRAMS ARE DATA.

This means that you can compute a program (or a module);
you don't have to write it by hand.  Given rules of the
form
	<pattern>$ -> <replacement>
you can write a fairly straightforward program (I'd probably
do it in AWK, to be honest) to turn that into Erlang like

	replace("esuom"++X) -> rev(X, "mice");
	replace("esuol"++X) -> rev(X, "lice");
	replace("yuq"  ++X) -> rev(X, "quies");
	replace("y"++[C|X]) when C =/= $a, C =/= $e, ...
			         C =/= $y
			    -> rev(X, [C|"ies"]);
	...

	rev([X|Xs], R) -> rev(Xs, [X|R]);
	rev([],     R) -> R.

	rewrite(S) -> replace(rev(S, [])).

(Think "trie".)  Yes, you definitely DON'T want to write this
kind of code by hand.  But you also do NOT want to write it as
a whole bunch of regular expression matches either.
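
For concreteness, here is a minimal sketch of what the generated module
might look like for just the two rules quoted above. The module name
`plural` and the final catch-all clause (leave the word unchanged when no
rule applies) are assumptions added for illustration; the vowel list in the
guard is taken from the `[^aeiouy]` set in the quoted rule.

```erlang
-module(plural).
-export([rewrite/1]).

%% Reverse the word so that suffix rules become prefix matches.
rewrite(S) -> replace(rev(S, [])).

%% Each clause matches a reversed suffix. rev/2 then un-reverses the
%% remaining stem X onto the front of the replacement ending.
replace("esuom" ++ X) -> rev(X, "mice");
replace("esuol" ++ X) -> rev(X, "lice");
replace("yuq" ++ X) -> rev(X, "quies");
replace("y" ++ [C|X])
  when C =/= $a, C =/= $e, C =/= $i, C =/= $o, C =/= $u, C =/= $y ->
    rev(X, [C|"ies"]);
%% Assumed fallback, not part of the quoted rules: no rule matched,
%% so restore the original word.
replace(X) -> rev(X, []).

%% rev(Xs, R) is reverse(Xs) ++ R, built in one pass.
rev([X|Xs], R) -> rev(Xs, [X|R]);
rev([], R) -> R.
```

With this compiled, `plural:rewrite("dormouse")` returns "dormice",
`plural:rewrite("soliloquy")` returns "soliloquies", and
`plural:rewrite("day")` returns "day" via the fallback clause.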

I first encountered this idea when I tried to write a morphological
analyser for Latin in Prolog, many years ago.  Too tedious to write
by hand, but simple to write a reverse/match+replace/reverse
compiler for.

> The bit syntax won't work on arbitrary length fields as far as I
> know. And I *really* don't want to invent my own subset/alternate
> syntax of regular expressions that is more than what erlang supports
> and less than a PCRE.

I suggest that you really DO want to invent your own subset of
regular expressions that is just powerful enough to do the job.
Maybe something like alternation, concatenation, literal, and
set will be enough.  Then you compile that to Erlang.
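
As a rough sketch of such a compiler (written in Erlang here rather than
AWK), suppose each rule is a hypothetical tuple `{Captured, Tail, NewTail}`:
`Captured` is kept, `Tail` is the literal suffix to drop, and `NewTail`
replaces it. `Captured` may be `{lit, String}`, `{none_of, Chars}` for a
negated one-character set, or `{alt, [Pattern]}` for alternation. The
representation, the module name `rulegen`, and the function names are all
assumptions made for illustration. The output is the text of `replace/1`
clauses in the reverse/match+replace/reverse style shown earlier.

```erlang
-module(rulegen).
-export([clauses/1]).

%% clauses(Rules) -> flat string holding Erlang source for replace/1.
%% Clauses are emitted in rule order, so list more specific rules first.
clauses(Rules) ->
    Cs = [clause(C, Tail, New)
          || {Cap, Tail, New} <- Rules, C <- expand(Cap)],
    lists:flatten(lists:join(";\n", Cs) ++ ".\n").

%% Alternation is compiled away: one clause per alternative.
expand({alt, Ps}) -> lists:flatmap(fun expand/1, Ps);
expand(P) -> [P].

%% Literal capture: match the reversed (capture ++ tail) as a prefix.
clause({lit, S}, Tail, New) ->
    io_lib:format("replace(~p ++ X) -> rev(X, ~p)",
                  [lists:reverse(S ++ Tail), S ++ New]);
%% Negated set: match the reversed tail, bind one character C,
%% and exclude each member of the set in the guard.
clause({none_of, Set}, Tail, New) ->
    Guard = lists:join(", ", [io_lib:format("C =/= $~c", [Ch]) || Ch <- Set]),
    io_lib:format("replace(~p ++ [C|X]) when ~s -> rev(X, [C|~p])",
                  [lists:reverse(Tail), Guard, New]).
```

For example, the two quoted rules could be written as

    [{{alt, [{lit,"m"}, {lit,"l"}]}, "ouse", "ice"},
     {{alt, [{lit,"qu"}, {none_of,"aeiouy"}]}, "y", "ies"}]

and `io:put_chars(rulegen:clauses(Rules))` prints four `replace/1` clauses.
The sketch emits no catch-all clause and no `rev/2` definition; those would
need to be appended to the generated module.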



