[erlang-questions] Why is it necessary to "double-escape" [ characters in regular expressions?

Mon Mar 30 05:19:28 CEST 2009

Hello group,
Running Erlang (ERTS 5.6.5) under Windows...

I've got a bunch of code that's "almost but not quite syntactically correct"
XML, and I'm trying to convert it to valid XML.  Part of this process
involves removing some invalid CDATA tags.

My code fragment:
  re:replace("abc123", "<!\[CDATA\[<", "<", [{return, list}]).
is giving me "exception error: bad argument in function re:replace/4".

Trial and error shows that removing the escaped [ characters from the pattern:
  re:replace("abc123 <![CDATA[< abc123", "<!CDATA<", "<", [{return, list}]).
runs without error, but the pattern no longer matches the CDATA tag, so it's
obviously not what I want.

However, "double-escaping" the [ characters (by adding a second \ prior to
the [ character) does exactly what I want:
  re:replace("abc123 <![CDATA[< abc123", "<!\\[CDATA\\[<", "<", [{return, list}]).
returns "abc123 < abc123", which is the result I'm after.

In this context, I guess it's conceivable that the [ character can be
misinterpreted in two distinct ways in a regular expression:
- it could denote the start of an Erlang list
- it could denote the start of a character grouping within a regular
expression
However, I didn't expect that "double escaping" it would be the solution to
my problem.
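
A minimal sketch of what may be going on, assuming the Erlang compiler
processes the backslash in a string literal before re ever sees the pattern:
"\[" in source would then be just "[", while "\\[" would be the two
characters \ and [, which is the escaped bracket the regex engine needs.  The
module name escape_demo and the functions pattern/0 and strip_cdata/1 are
made up for illustration only.

```erlang
-module(escape_demo).
-export([pattern/0, strip_cdata/1]).

%% Hypothetical helper: returns the regex as the re module receives it.
%% In source this literal has a doubled backslash before each [, but the
%% resulting string holds a single \ followed by [ (12 characters in total).
pattern() ->
    "<!\\[CDATA\\[<".

%% Hypothetical helper: replaces the first "<![CDATA[<" in Text with "<".
strip_cdata(Text) ->
    re:replace(Text, pattern(), "<", [{return, list}]).
```

Calling escape_demo:strip_cdata("abc123 <![CDATA[< abc123") should give
"abc123 < abc123", the same result as the double-escaped call above.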

Is this expected behaviour, or some sort of anomaly?  In any case, sending
this email to the mailing list should help out the next person who falls
into this trap, but who can use Google to track down the solution...

Regards

David Mitchell