[erlang-questions] Why is it necessary to "double-escape" [ characters in regular expressions?
Johnny Billquist
bqt@REDACTED
Wed Apr 1 15:34:27 CEST 2009
I'm not sure I would call it "escaping", since [] in a regular
expression actually has a meaning: it denotes a set of valid characters.
However, the characters inside [] are parsed differently from those
outside, which causes a [ inside to be accepted literally. ] is a little
ugly in that it must be the first character of the set specified inside
a [], otherwise it won't work. (So you could write [abc[] to match any
of a, b, c or [, but you couldn't write [abc]]; you would have to write
it as []abc].)
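
A minimal sketch of the bracket rules above, checked against Erlang's re
module (PCRE-based). The module name bracket_class_demo and the helper
matches/2 are made up for illustration; other regexp implementations may
behave differently.

```erlang
-module(bracket_class_demo).
-export([matches/2, demo/0]).

%% Hypothetical helper: true if Pattern matches somewhere in Subject.
matches(Subject, Pattern) ->
    re:run(Subject, Pattern, [{capture, none}]) =:= match.

demo() ->
    [matches("[", "^[abc[]$"),    %% [ inside a set is taken literally
     matches("]", "^[]abc]$"),    %% ] is literal when it comes first
     matches("]", "^[abc]]$"),    %% here the set is [abc], followed by a literal ]
     matches("a]", "^[abc]]$")].  %% so this pattern wants e.g. "a]"
```

Calling bracket_class_demo:demo() should return [true,true,false,true].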
Using \ to escape brackets seems to vary between the different regexp
implementations I have looked at.
As for the original question, others have already pointed it out, but in
order to get a \ in the actual string you create, you need to put a
double \ in the literal. And that's escaping. :-)
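
To make the two layers concrete, here is a small sketch. The module and
function names (escape_demo, pattern/0, strip_cdata/1, strip_cdata_class/1)
are invented for this example. As far as I know, the Erlang scanner reads
"\[" in a string literal as a plain "[", so with a single backslash the
regexp library would see an unterminated set, which would explain the
bad argument error.

```erlang
-module(escape_demo).
-export([pattern/0, strip_cdata/1, strip_cdata_class/1]).

%% The literal contains \\ twice; the resulting string holds one \ each,
%% so the regexp library receives: <!\[CDATA\[<
pattern() ->
    "<!\\[CDATA\\[<".

%% Replace the first "<![CDATA[<" in Text with "<".
strip_cdata(Text) ->
    re:replace(Text, pattern(), "<", [{return, list}]).

%% Same replacement using the [[] form, which needs no backslashes.
strip_cdata_class(Text) ->
    re:replace(Text, "<![[]CDATA[[]<", "<", [{return, list}]).
```

With the input from the original post,
escape_demo:strip_cdata("abc123 <![CDATA[< abc123") returns
"abc123 < abc123", and strip_cdata_class/1 gives the same result.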
Johnny
Richard Andrews wrote:
> IIRC the way to escape [ in regular expressions is [[] not \[.
> Similarly []] not \].
>
> Never tried with erlang re application though.
>
> ------------------------------------------------------------------------
> *From:* David Mitchell <monch1962@REDACTED>
> *To:* erlang-questions Questions <erlang-questions@REDACTED>
> *Sent:* Monday, 30 March, 2009 2:19:28 PM
> *Subject:* [erlang-questions] Why is it necessary to "double-escape" [
> characters in regular expressions?
>
> Hello group,
>
> Running 5.6.5 under Windows...
>
> I've got a bunch of code that's "almost but not quite syntactically
> correct" XML, and I'm trying to convert it to valid XML. Part of this
> process involves removing some invalid CDATA tags.
>
> My code fragment:
> re:replace("abc123", "<!\[CDATA\[<", "<", [{return, list}]).
> is giving me "exception error: bad argument in function re:replace/4".
>
> Trial and error shows that removing the escaped [ characters:
> re:replace("abc123 <![CDATA[< abc123", "<!CDATA<", "<", [{return, list}]).
> works as expected, but it's obviously not what I want.
>
> However, "double-escaping" the [ characters (by adding a second \ prior
> to the [ character) does exactly what I want:
> re:replace("abc123 <![CDATA[< abc123", "<!\\[CDATA\\[<", "<",
> [{return, list}])
> returns "abc123 < abc123", which is the result I'm after.
>
> In this context, I guess it's conceivable that the [ character can be
> misinterpreted in two distinct ways in a regular expression:
> - it could denote the start of an Erlang list
> - it could denote the start of a character grouping within a regular
> expression
> However, I didn't expect that "double escaping" it would be the solution
> to my problem.
>
> Is this expected behaviour, or some sort of anomaly? In any case,
> sending this email to the mailing list should help out the next person
> who falls into this trap, but who can use Google to track down the
> solution...
>
> Regards
>
> David Mitchell
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: bqt@REDACTED || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol