[erlang-questions] regexp module

Liam Clarke ml.cyresse@REDACTED
Thu Oct 11 11:25:05 CEST 2007


On 10/11/07, Tim Bray <Tim.Bray@REDACTED> wrote:
>
> On Oct 10, 2007, at 3:18 AM, Liam Clarke wrote:
>
> > Hi all,
> >
> > Quick question, is there a way to do case insensitive matches with the
> > regexp module? I've hacked together a function to lower case
> > everything and go from there, but I just got that 'reinventing the
> > wheel' feeling that I get when in the early stages of familiarity with
> > a language and its libraries.
>
> If you're going to have to handle internationalized text in the
> general case, it's probably better to stay away from case-folding.
> The rules are incredibly locale-sensitive and language-sensitive.
> The java.String.toLower() call is insanely slow because it tries to
> deal with all these corner cases.  If you're in ASCII, you're OK, but
> even ISO-Latin-1 gets into trouble.  For example, what's the lower-
> case of "I"?  It's different in Turkey.  -Tim

Thanks for the heads up, I'll bear that in mind. It's HTTP headers, so
i18n isn't a concern at this point, unless they've snuck something
sneaky into the HTTP spec. >_<

>A bit of a pain, but you could do ...
>
>1> regexp:matches("This IS the string, isn't it?", "[Ii][Ss]").
>{match,[{3,2},{6,2},{21,2}]}
>2>
>
>I force some fields to all upper case myself for fixed length user
>input data.
>
>~Michael

Oh yeah. /me smacks forehead. I could do it like that. I was all hung
up on having some kind of IGNORECASE flag.
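
For literal words, the character-class trick can be generated rather
than typed by hand. A minimal sketch, assuming ASCII-only input with no
regexp metacharacters in it (an existing class such as [a-z] or an
escape would be mangled); the module name ci_pattern and the function
expand/1 are made up for illustration and are not part of any library:

```erlang
-module(ci_pattern).
-export([expand/1]).

%% Turn a literal ASCII word into a pattern that matches it in either
%% case, e.g. "is" becomes "[Ii][Ss]". Non-letters are copied through
%% unchanged, so this is only safe for plain words, not full regexps.
expand(Pattern) ->
    lists:flatmap(fun expand_char/1, Pattern).

%% Lower-case letter: emit [<upper><lower>].
expand_char(C) when C >= $a, C =< $z -> [$[, C - 32, C, $]];
%% Upper-case letter: emit [<upper><lower>].
expand_char(C) when C >= $A, C =< $Z -> [$[, C, C + 32, $]];
%% Anything else (digits, spaces, punctuation) is left as-is.
expand_char(C) -> [C].
```

With that compiled, ci_pattern:expand("is") returns "[Ii][Ss]", which is
the same pattern Michael passed to regexp:matches/2 above.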

Thanks,

Liam


