[erlang-questions] re: re:pain (and stripping whitespace from text)
Jayson Vantuyl
kagato@REDACTED
Sun Mar 14 03:45:17 CET 2010
Actually, with pure regular expressions this is not easy, and for some uses (e.g. nested delimiters) it is not possible at all.
> "some quotes", with, "some more quotes", and, yet, "even more quotes"
That said, many regex libraries have extensions that make this possible, but painful. See here:
Perl: http://perldoc.perl.org/perlfaq6.html#Can-I-use-Perl-regular-expressions-to-match-balanced-text%3f (also "recursive patterns")
PCRE: http://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions (look for "recursive patterns")
.NET: http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx
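For the flat case only (no nesting, no escaped quotes), one workaround that needs none of those extensions is to match the pieces you want to keep rather than the whitespace you want to drop. This is a rough sketch, not a recommendation: the module name keep_tokens and the function strip/1 are invented for illustration, and a stray unbalanced quote is silently dropped.

```erlang
-module(keep_tokens).
-export([strip/1]).

%% Keep every quoted section verbatim, plus every run of characters
%% that is neither whitespace nor a double quote. Whatever falls
%% between those matches (i.e. unquoted whitespace) is discarded.
%% Assumption: quotes are balanced and never escaped or nested.
strip(Subject) ->
    Pattern = <<"\"[^\"]*\"|[^ \t\r\n\"]+">>,
    case re:run(Subject, Pattern, [global, {capture, first, binary}]) of
        {match, Pieces} ->
            %% Pieces is a list of single-element lists of binaries.
            iolist_to_binary(Pieces);
        nomatch ->
            <<>>
    end.
```

With the input from the original question, keep_tokens:strip(<<"a, \tb, \"quoted string\", \n c, d">>) should give back <<"a,b,\"quoted string\",c,d">>.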
Also, note that backtracking and recursion can behave very badly in terms of stack usage and execution time. This is enough of a problem that Google has written a regex library that uses automata theory to avoid backtracking altogether. See here:
RE2: http://code.google.com/p/re2/
I think that the moral of this story is that every way to do this with regexps is a hack, and you probably shouldn't. It looks like you're parsing CSV data. You might try this relatively simple recursive-descent parser:
http://ppolv.wordpress.com/2008/02/25/parsing-csv-in-erlang/
I'm willing to bet that it's not particularly fast, but it probably works well enough. If you need more speed, you might try implementing a parser with yecc/leex, or even do something really exciting like writing an erl_nif interface to libcsv.
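If a full CSV parser is more than you need, a small hand-written scanner over the binary is probably enough for the example in your mail. What follows is only a sketch under stated assumptions: the module name strip_ws and the function strip/1 are made up for illustration, and it assumes double quotes are never nested or escaped.

```erlang
-module(strip_ws).
-export([strip/1]).

%% Remove spaces, tabs, CRs and LFs from a binary, except inside
%% double-quoted sections.
%% Assumption: quotes are never nested or escaped.
-spec strip(binary()) -> binary().
strip(Bin) when is_binary(Bin) ->
    outside(Bin, <<>>).

%% Outside quotes: drop whitespace, copy everything else.
outside(<<$", Rest/binary>>, Acc) ->
    inside(Rest, <<Acc/binary, $">>);
outside(<<C, Rest/binary>>, Acc)
  when C =:= $\s; C =:= $\t; C =:= $\r; C =:= $\n ->
    outside(Rest, Acc);
outside(<<C, Rest/binary>>, Acc) ->
    outside(Rest, <<Acc/binary, C>>);
outside(<<>>, Acc) ->
    Acc.

%% Inside quotes: copy bytes verbatim until the closing quote.
inside(<<$", Rest/binary>>, Acc) ->
    outside(Rest, <<Acc/binary, $">>);
inside(<<C, Rest/binary>>, Acc) ->
    inside(Rest, <<Acc/binary, C>>);
inside(<<>>, Acc) ->
    %% Unterminated quote: return what has been read so far.
    Acc.
```

Calling strip_ws:strip(<<"a, \tb, \"quoted string\", \n c, d">>) should return <<"a,b,\"quoted string\",c,d">>. Two mutually recursive functions keep the "am I inside quotes?" state in the call itself, so no flag has to be threaded through.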
Good luck!
On Mar 13, 2010, at 2:35 PM, Steve Davis wrote:
> I've been confounded again by re, trying to strip whitespace from
> binary text, as the obvious "[ \t\r\n]+", as in...
>
> list_to_binary(re:replace(<<"a, \tb, \"quoted string\", \n c, d">>,
> <<"[ \t\r\n]+">>, <<>>, [global]))
>
> ...results in...
>
> <<"a,b,\"quotedstring\",c,d">>.
>
> ..I know there must be a regex that would avoid the stripping inside
> the quotes, but no amount of experiment (or google) has yielded a
> suitable result for me.
>
> Can anybody immediately see a solution (and put me out of my pain)?
>
> Thanks in advance,
> /s
--
Jayson Vantuyl
kagato@REDACTED