[erlang-questions] re: re:pain (and stripping whitespace from text)

Jayson Vantuyl
Sun Mar 14 03:45:17 CET 2010


Actually, with pure regular expressions this is not easy, and for some uses (e.g. nested quoting) it is not possible at all, since a true regular expression cannot keep track of nesting depth.

> "some quotes", with, "some more quotes", and, yet, "even more quotes"

That said, many regex libraries have extensions that make this possible, but painful.  See here:

Perl: http://perldoc.perl.org/perlfaq6.html#Can-I-use-Perl-regular-expressions-to-match-balanced-text%3f (also "recursive patterns")
PCRE: http://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions (look for "recursive patterns")
.NET: http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx

Also, note that backtracking and recursion can behave very badly in terms of stack usage and execution time; for some patterns and inputs the running time grows exponentially.  This is problematic enough that Google has written a special regex library that uses research in automata theory to avoid backtracking and keep matching time linear in the size of the input.  See here:

RE2: http://code.google.com/p/re2/

I think that the moral of this story is that every way to do this with regexps is a hack, and you probably shouldn't.  It looks like you're parsing CSV data.  You might try this relatively simple recursive-descent parser:

http://ppolv.wordpress.com/2008/02/25/parsing-csv-in-erlang/

I'm willing to bet that it's not particularly fast, but it probably works well enough.  If you need more speed, you might try implementing a parser with yecc/leex, or even do something really exciting like writing an erl_nif interface to libcsv.
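
If all you need is the whitespace stripping from your example, a small hand-written scanner may already be enough.  What follows is only a minimal sketch, not a CSV parser: it assumes double quotes are the only quoting mechanism, that there are no backslash escapes, and that an unterminated quote simply runs to the end of the input.  The module and function names (strip_ws, strip/1, outside/2, inside/2) are made up for illustration.

```erlang
-module(strip_ws).
-export([strip/1]).

%% Remove whitespace from a binary, except inside double-quoted sections.
%% Assumption: quotes are plain '"' characters with no backslash escaping.
strip(Bin) when is_binary(Bin) ->
    outside(Bin, <<>>).

%% Outside quotes: drop whitespace, copy everything else.
outside(<<$", Rest/binary>>, Acc) ->
    inside(Rest, <<Acc/binary, $">>);
outside(<<C, Rest/binary>>, Acc)
  when C =:= $\s; C =:= $\t; C =:= $\r; C =:= $\n ->
    outside(Rest, Acc);
outside(<<C, Rest/binary>>, Acc) ->
    outside(Rest, <<Acc/binary, C>>);
outside(<<>>, Acc) ->
    Acc.

%% Inside quotes: copy bytes verbatim until the closing quote.
inside(<<$", Rest/binary>>, Acc) ->
    outside(Rest, <<Acc/binary, $">>);
inside(<<C, Rest/binary>>, Acc) ->
    inside(Rest, <<Acc/binary, C>>);
inside(<<>>, Acc) ->
    %% Unterminated quote: keep whatever was copied so far.
    Acc.
```

With your input, strip_ws:strip(<<"a, \tb, \"quoted string\", \n c, d">>) should give <<"a,b,\"quoted string\",c,d">>.  Two clauses per state is about as far as I would push this before switching to a real parser.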

Good luck!

On Mar 13, 2010, at 2:35 PM, Steve Davis wrote:

> I've been confounded again by re, trying to strip whitespace from
> binary text, as the obvious "[ \t\r\n]+", as in...
> 
> list_to_binary(re:replace(<<"a, \tb, \"quoted string\", \n c, d">>,
> <<"[ \t\r\n]+">>, <<>>, [global]))
> 
> ...results in...
> 
> <<"a,b,\"quotedstring\",c,d">>.
> 
> ...I know there must be a regex that would avoid the stripping inside
> the quotes, but no amount of experiment (or google) has yielded a
> suitable result for me.
> 
> Can anybody immediately see a solution (and put me out of my pain)?
> 
> Thanks in advance,
> /s

-- 
Jayson Vantuyl



