[erlang-questions] Re: Adoption of perl/javascript-style regexp syntax

Wed Jun 3 10:56:53 CEST 2009

Bengt Kleberg wrote:
> Greetings,
>
> Are there any reasons to have strings with many escapes, apart from when
> doing regular expressions?
>   
Yes, any time you have a number of backslashes or quotation 
marks in the original text, you will need to insert escapes
(or, as ROK points out, write a program that takes care of 
it for you.)

This commonly occurs in e.g. LaTeX, HTML, XML, shell commands,
JavaScript, etc. - in just about every text format that is meant 
to be processed by another program.

Every time this becomes the main task of your program, I agree
that it makes sense in general to raise the abstraction level
and avoid messing about with "structured text". A very good 
example of this is of course generation of Erlang code, where
it is *much* better to generate abstract forms, and if necessary,
produce source code by pretty-printing the forms.
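
As a rough illustration of the abstract-forms route, here is a minimal
sketch that builds the form for a one-clause function by hand and lets
erl_pp produce the source text. The module name forms_demo and the
function greeting/0 are made up for the example; the point is only that
the pretty-printer, not the programmer, decides how the quotation marks
inside the string end up escaped in the generated code.

```erlang
-module(forms_demo).
-export([source/0]).

%% Hypothetical example: abstract form for a function greeting/0 whose
%% body is a string containing quotation marks.
source() ->
    Form = {function, 1, greeting, 0,
            [{clause, 1, [], [],
              [{string, 1, "say \"hi\""}]}]},
    %% erl_pp:form/1 returns an iolist-like structure; flatten it to get
    %% a plain string holding the generated source code.
    lists:flatten(erl_pp:form(Form)).
```

Calling forms_demo:source() gives back source text that can be scanned
and parsed again with erl_scan and erl_parse.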

But there are lots of occurrences where this doesn't apply as well.
As I have stated (at least four times already in this thread), I'm
not a fan of inventing a new syntax for a specific problem, either 
by hacking the scanner or adding a preprocessor - *especially* 
when working in a large project, where most of the work on your 
code will be done by people other than yourself. And even if it weren't
frowned upon, it is an investment in time and effort that may 
well be worse than battling with the string syntax in the few places 
where it's warranted.

Also, as Mats alluded to, the re library requires strings. One may
argue about the virtue of this, but the fact remains that for many 
string parsing tasks, re is by far the most efficient tool available
to Erlang programmers.
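
To make the escaping cost concrete, here is a small sketch of what two
simple regexps look like once they are written as ordinary Erlang
strings and handed to re:run/3. The module and function names are
invented for the example, and the patterns are only illustrations.

```erlang
-module(re_escapes).
-export([version/1, has_backslash/1]).

%% The regexp \d+\.\d+ has to be written with every backslash doubled,
%% because the Erlang scanner consumes one level of escaping first.
version(Text) ->
    case re:run(Text, "\\d+\\.\\d+", [{capture, first, list}]) of
        {match, [V]} -> {ok, V};
        nomatch -> error
    end.

%% The regexp for one literal backslash is \\, which becomes four
%% backslashes inside an Erlang string.
has_backslash(Text) ->
    re:run(Text, "\\\\", [{capture, none}]) =:= match.
```

With these definitions, re_escapes:version("version 1.25") should find
the substring "1.25", and re_escapes:has_backslash("C:\\temp") should
report a match.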

Every new syntax addition should of course be evaluated based on the 
expected benefits vs the slippery-slope problem of constantly adding
and never removing stuff. This is a valid argument. Support for
raw strings may not be important enough to warrant a syntax addition.

Telling people to work around the problem can be helpful, but often
isn't. As a general rule, I don't think that programming languages
should go out of their way to make things difficult because one 
would like programmers to tackle the problem differently. In some 
cases, there will be a tradeoff - e.g. immutability, where 
disallowing destructive updates has some distinct drawbacks, but
offers great benefits in return. I don't really see the great 
benefit in making life hard on those who want to use regexps...

Most attacks on the problem* will suffer some drawbacks. This is also
true for the "suck it up" approach, obviously. The r"..." approach
suffers from abusing a regular atom, but also from some fairly unclear
escaping rules (you still have to escape ", using \", which means that
you can't end the string with a \"). The `D...D approach suffers from
forcing you to choose a delimiter that doesn't appear in the string -
which may vary from time to time, making it a bit less intuitive
while being quite generic.

* The problem here being how to *conveniently* enter strings 
without having to struggle with annotating them with escapes,
whether they are regexps or anything else.

Having said all this, I'm fairly neutral about whether Erlang 
adds support for raw strings. It has not been a great pain for
me personally. I'm just a bit picky about having my arguments 
misrepresented.

BR,
Ulf W