[erlang-questions] Why is it necessary to "double-escape" [ characters in regular expressions?

Johnny Billquist bqt@REDACTED
Wed Apr 1 15:34:27 CEST 2009


I'm not sure I would call it "escaping", since [] in a regular 
expression actually has a meaning: it expresses a set of valid chars. 
However, the characters inside [] are parsed differently from those 
outside, which causes a [ inside to be accepted literally. ] is a little 
ugly in that it must be the first character in the set specified inside 
a [], otherwise it won't work. (So you could say [abc[] to match any of 
a, b, c or [, but you couldn't say [abc]], which would be read as [abc] 
followed by a literal ]; you would have to write it as []abc]).
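
A minimal sketch of the bracket rules described above, assuming Erlang's 
PCRE-based re module. The module name bracket_demo and the helper 
matches/2 are made up for illustration and are not part of the thread.

```erlang
-module(bracket_demo).
-export([matches/2, demo/0]).

%% Hypothetical helper: true if Pattern matches anywhere in Subject.
matches(Subject, Pattern) ->
    case re:run(Subject, Pattern) of
        {match, _} -> true;
        nomatch    -> false
    end.

demo() ->
    %% "[abc[]" is a set containing a, b, c and a literal "[".
    true  = matches("[", "[abc[]"),
    true  = matches("b", "[abc[]"),
    %% "[]abc]" is a set containing "]", a, b and c:
    %% a "]" placed first inside the brackets is taken literally.
    true  = matches("]", "[]abc]"),
    %% "[abc]]" is the set [abc] followed by a literal "]",
    %% so a lone "]" does not match, but "a]" does.
    false = matches("]", "[abc]]"),
    true  = matches("a]", "[abc]]"),
    ok.
```

Calling bracket_demo:demo() should return ok if every pattern behaves as 
described.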

Using \ to escape brackets seems to vary between the different 
implementations of regexps that I have looked at.

As for the original question, others have already pointed it out, but in 
order to get a \ in the actual string you create, you need to put a 
double \ in the literal. And that's escaping. :-)
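
A short sketch of that point, assuming the re module behaves as reported 
in the quoted message below. The module name escape_demo is hypothetical; 
the subject string and patterns are taken from the thread.

```erlang
-module(escape_demo).
-export([demo/0]).

demo() ->
    %% In a string literal, "\\" denotes one backslash character, so
    %% "\\[" is a two-character string: backslash (92), then "[" (91).
    [92, 91] = "\\[",
    %% A single backslash before "[" is consumed when the literal is
    %% read, so the regex engine only ever sees a bare "[".
    "[" = "\[",

    Subject = "abc123 <![CDATA[< abc123",
    Opts = [{return, list}],

    %% Doubled backslashes: the regex engine receives \[ and treats
    %% each "[" as a literal character.
    "abc123 < abc123" = re:replace(Subject, "<!\\[CDATA\\[<", "<", Opts),

    %% The bracket form suggested in the thread, [[], needs no backslash.
    "abc123 < abc123" = re:replace(Subject, "<![[]CDATA[[]<", "<", Opts),

    %% With bare "[" characters the pattern opens a set that is never
    %% closed, so it fails to compile and re:replace/4 raises badarg.
    {'EXIT', {badarg, _}} = (catch re:replace(Subject, "<![CDATA[<", "<", Opts)),
    ok.
```

The single-backslash literal and the bare-bracket literal produce the same 
string, which is why the original call failed with a bad argument error.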

	Johnny

Richard Andrews wrote:
> IIRC the way to escape [ in regular expressions is [[] not \[.
> Similarly []] not \].
> 
> Never tried with erlang re application though.
> 
> ------------------------------------------------------------------------
> *From:* David Mitchell <monch1962@REDACTED>
> *To:* erlang-questions Questions <erlang-questions@REDACTED>
> *Sent:* Monday, 30 March, 2009 2:19:28 PM
> *Subject:* [erlang-questions] Why is it necessary to "double-escape" [ 
> characters in regular expressions?
> 
> Hello group,
> 
> Running 5.6.5 under Windows...
> 
> I've got a bunch of code that's "almost but not quite syntactically 
> correct" XML, and I'm trying to convert it to valid XML.  Part of this 
> process involves removing some invalid CDATA tags.
> 
> My code fragment:
>   re:replace("abc123", "<!\[CDATA\[<", "<", [{return, list}]).
> is giving me "exception error: bad argument in function re:replace/4".
> 
> Trial and error shows that removing the escaped [ characters:
>   re:replace("abc123 <![CDATA[< abc123", "<!CDATA<", "<", [{return, list}]).
> works as expected, but it's obviously not what I want.
> 
> However, "double-escaping" the [ characters (by adding a second \ prior 
> to the [ character) does exactly what I want:
>   re:replace("abc123 <![CDATA[< abc123", "<!\\[CDATA\\[<", "<", 
> [{return, list}])
> returns "abc123 < abc123", which is the result I'm after.
> 
> In this context, I guess it's conceivable that the [ character can be 
> misinterpreted in two distinct ways in a regular expression:
> - it could denote the start of an Erlang list
> - it could denote the start of a character grouping within a regular 
> expression
> However, I didn't expect that "double escaping" it would be the solution 
> to my problem.
> 
> Is this expected behaviour, or some sort of anomaly?  In any case, 
> sending this email to the mailing list should help out the next person 
> who falls into this trap, but who can use Google to track down the 
> solution...
> 
> Regards
> 
> David Mitchell
> 


-- 
Johnny Billquist                  || "I'm on a bus
                                   ||  on a psychedelic trip
email: bqt@REDACTED             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol


