[erlang-questions] Adoption of perl/javascript-style regexp syntax

Ulf Wiger ulf.wiger@REDACTED
Tue Jun 2 15:02:35 CEST 2009


Geoffrey Biggs wrote:
> Python provides a method of specifying strings they call "raw strings," 
> which I find quite interesting. Basically, you prefix your string with r 
> or R, and any backslashes are treated as literal characters rather than 
> escape sequences. For example:
> 
>  >>> '\b'
> '\x08'
>  >>> r'\b'
> '\\b'

The problem is that this is built from ordinary tokens and already
scans successfully today:

4> erl_scan:string("r'\b'.").
{ok,[{atom,1,r},{atom,1,'\b'},{dot,1}],1}

To support it, one would have to make r' a token in its own
right, which *might* actually break existing code (albeit
unlikely), or complicate the scanner by having it look ahead,
in a kind of quick parse, to figure out whether this is a
string or not.

That was one reason why I went for the backtick. It isn't
recognized by the scanner today, so giving it a meaning can't
change how any currently valid code is read.

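Another problem, of course, is that while the r'...' syntax
lets you write \ without escaping, it still has some quirks
with escaping: a backslash still escapes the closing quote
(r'\'' keeps both characters), so a raw string cannot end in an
odd number of backslashes. I find that a bit unintuitive.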

By contrast, the `P...P syntax is pretty simple to understand:
you just pick a delimiter that doesn't occur in the string,
e.g. `'foo', `&foo&, or whatever. The way I wrote it, you
can't pick \ or a newline as the delimiter, although \ would
actually work, I guess. (A newline would work too, but I find
that unintuitive.)
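To make the delimiter rule concrete, here is a minimal, hypothetical sketch of how a scanner could read a `D...D string from a character list. The module name raw_delim, the function scan_raw/1 and its return values are invented for illustration and are not taken from erl_scan or from the actual implementation. It takes the character after the backtick as the delimiter, rejects \ and newline as described above, and collects everything up to the next occurrence of the delimiter with no escape processing at all.

```erlang
-module(raw_delim).
-export([scan_raw/1]).

%% Hypothetical sketch, not the real scanner code.
%% Scans a `D...D raw string at the head of a character list.
%% Returns {ok, Body, Rest}, {error, {bad_delimiter, D}},
%% {error, unterminated} or {error, not_raw_string}.

%% Backslash and newline are not allowed as delimiters.
scan_raw([$`, D | _]) when D =:= $\\; D =:= $\n ->
    {error, {bad_delimiter, D}};
%% Any other character after the backtick becomes the delimiter.
scan_raw([$`, D | Rest]) ->
    take_until(D, Rest, []);
scan_raw(_) ->
    {error, not_raw_string}.

%% Copy characters verbatim (no escapes) until the delimiter appears.
take_until(D, [D | Rest], Acc) ->
    {ok, lists:reverse(Acc), Rest};
take_until(D, [C | Rest], Acc) ->
    take_until(D, Rest, [C | Acc]);
take_until(_D, [], _Acc) ->
    {error, unterminated}.
```

For example, raw_delim:scan_raw("`P\\bP rest") returns {ok,"\\b"," rest"}: the backslash and the b come through untouched, because the only special character inside the string is the chosen delimiter.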

BR,
Ulf W
-- 
Ulf Wiger
CTO, Erlang Training & Consulting Ltd
http://www.erlang-consulting.com