[erlang-questions] Adoption of perl/javascript-style regexp syntax

Tue Jun 2 02:44:33 CEST 2009

Python provides a method of specifying strings they call "raw strings," 
which I find quite interesting. Basically, you prefix your string with r 
or R, and any backslashes are treated as literal characters rather than 
escape sequences. For example:

 >>> '\b'
'\x08'
 >>> r'\b'
'\\b'

More info in the docs:
http://docs.python.org/3.0/reference/lexical_analysis.html#string-and-bytes-literals

I'm not sure how well it would work in Erlang, but it's certainly useful 
in Python for avoiding the headache-inducing backslash acrobatics 
necessary when writing the occasional complex regular expression.

Geoff

Ulf Wiger wrote:
> Dmitrii Dimandt wrote:
>> I've just come across re and I like it :)
>>
>> The only issue I have with it is that I have to specify regexps as  
>> strings. This leads to ugly-as-hell constucts like these:
>>
>> {ok, Re} = re:compile("(?<!\\\\)#")
>>
>> It actually tries to find two backslashes there... Or just one? I  
>> don't know :) What if Erlang could allow this:
>>
>> Re = /(?<!\\)#/
>>
>> ?
>>
>> Benefits:
>> - Less error-prone
>> - Expressions written this way can be parsed and compiled by the  
>> compiler (boost in performance, syntax checked at compile-time)
> 
> 
> It's not going to boost performance, as this is just 
> a preprocessor issue. But having to escape the backslashes
> when working with regexps is a pain.
> 
> Perhaps a better syntax would be to imitate the 
> LaTex \verb command. It allows you to specify the 
> delimiter, and then consumes all chars until it finds
> that delimiter, e.g. \verb!gdl4$%\^\$£$!
> 
> Since this exact syntax doesn't work in Erlang, a
> slight adjustment is in order. The scanner recognizes
> backticks today, but the parser doesn't. So, if we 
> change the scanner to recognize ` as the Erlang version
> of \verb, we can write:
> 
> 
> 1> re:split("foo\nbar",`!\n!).
> [<<"foo">>,<<"bar">>]
> 
> where
> 
> 2> `!\n!.
> "\\n"
> 
> 
> Diff follows. It was a quick hack, so it needs improvement.
> 
> --- /home/uwiger/src/otp/otp_src_R13B/lib/stdlib/src/erl_scan.erl       2009-04-16 05:23:36.000000000 -0400
> +++ erl_scan.erl        2009-06-01 09:09:49.000000000 -0400
> @@ -559,4 +559,2 @@
>      tok2(Cs, St, Line, Col, Toks, "^", '^', 1);
> -scan1([$`|Cs], St, Line, Col, Toks) ->
> -    tok2(Cs, St, Line, Col, Toks, "`", '`', 1);
>  scan1([$~|Cs], St, Line, Col, Toks) ->
> @@ -565,2 +563,4 @@
>      tok2(Cs, St, Line, Col, Toks, "&", '&', 1);
> +scan1([$`|Cs], St, Line, Col, Toks) ->
> +    scan_verb(Cs, St, Line, Col, Toks, []);
>  %% End of optimization.
> @@ -580,2 +580,27 @@
>  
> +scan_verb([], _St, Line, Col, Toks, Acc) ->
> +    {more, {[],Col,Toks,Line,Acc,fun scan_verb/6}};
> +scan_verb([Delim|Cs0], St, Line, Col, Toks, Acc) when Delim =/= $\n,
> +                                                      Delim =/= $\\ ->
> +    {Str, Cs, Line1, Col1} =  scan_verb_chars(
> +                                Cs0, St, Line, Col, Toks, {Acc,Delim}),
> +    tok3(Cs, St, Line1, Col1, Toks, string, Str, Str, 0).
> +
> +scan_verb_chars([], _St, Line, Col, Toks, {Acc, Delim}) ->
> +    {more, {[], Col, Toks, Line, {Acc,Delim}, fun scan_verb_chars/6}};
> +scan_verb_chars([Delim|Cs], _St, Line, Col, Toks, {Acc, Delim}) ->
> +    {lists:reverse(Acc), Cs, Line, Col};
> +scan_verb_chars([C|Cs], St, Line, Col, Toks, {Acc, Delim}) when C =/= Delim->
> +    {Line1,Col1} = case C of
> +                       $\n ->
> +                           {Line+1, Col};
> +                       _ ->
> +                           {Line, inc_col(Col,1)}
> +                   end,
> +    scan_verb_chars(Cs, St, Line1, Col1, Toks, {[C|Acc], Delim}).
> +
> +inc_col(no_col,_) -> no_col;
> +inc_col(C, N) when is_integer(C) -> C+N.
> +    
> +
>  scan_atom(Cs0, St, Line, Col, Toks, Ncs0) ->
> 
>