[erlang-questions] some language changes
Robert Virding
robert.virding@REDACTED
Mon May 28 00:41:27 CEST 2007
Sorry for the delay in commenting but I have been busy with other stuff.
The way I see it there are two problems being discussed here:
1) another syntax for strings
2) another representation for regular expressions
OK, some thoughts:
1) The basic problem is that there is quoting to allow "strange"
characters to be entered.
While I don't think it would be really impossible to add something like
Joe suggests, it might cause problems at the token level if any specifier
is allowed. If you had a specifier, it would be easy to then call a
function to parse the string; it would just have to observe the same
protocol as the normal tokeniser.
An easier solution would be to introduce an alternate string form which
has absolutely NO quoting at all. We use another delimiter, ~. So
X = ~[.?!][]\"')}]*\\($\\| $\\|\t\\| \\)[ \t\n]*~
would work fine. To include ~ you double it ~~.
Y = ~string with a ~~ in it~
That is the only extra rule; all other characters go in literally. You
could allow quoting it with \ but then you need to be able to quote \
with \\, which solves nothing for regexps.
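A minimal sketch of how a scanner for this ~ form could work, assuming
the input is a plain list of characters that starts at the opening ~.
The module and function names (tilde_string, scan/1) are hypothetical
and not part of erl_scan; a real tokeniser would also have to track
line numbers and follow the normal tokeniser protocol.

```erlang
-module(tilde_string).
-export([scan/1]).

%% scan(Chars) -> {ok, String, Rest} | {error, Reason}
%% Hypothetical helper: Chars must begin with the opening ~.
scan([$~ | Cs]) -> scan(Cs, []);
scan(_) -> {error, no_opening_delimiter}.

%% ~~ inside the string stands for one literal ~.
scan([$~, $~ | Cs], Acc) -> scan(Cs, [$~ | Acc]);
%% A single ~ closes the string; Cs is the unscanned remainder.
scan([$~ | Cs], Acc) -> {ok, lists:reverse(Acc), Cs};
%% Every other character, including \ and ", is taken literally.
scan([C | Cs], Acc) -> scan(Cs, [C | Acc]);
scan([], _Acc) -> {error, unterminated_string}.
```

With this sketch, tilde_string:scan("~string with a ~~ in it~") gives
{ok, "string with a ~ in it", []}.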
I would prefer to use backquote but I haven't worked out how to write it
with this bloody Swedish keyboard. Slash would look good but would
probably be confused with the / operator.
How about // and only quoting / inside:
Z = //string with a \/ in it//
No, because then you would have to quote \ with \\ and we are back to
where we started. You need no quoting, only doubling, to include the
string delimiter.
ZZ = //string with a // in it//
2) Using a more functional syntax to specify regexps would work well. I
assume that this form would return the internal parsed form and not a
string. My only question is how do we declare these functions to be
special regexp functions? I suppose you could write:
X = regex(seq(any(".?!"), span("\"')]}"),
              alt("\n", "\t", seq(" ",any(" \t\n"))),
              span(" \t\n")))
and then have a parse_transform. Or the compiler could recognise the
regex/1 "call" and fix it. It might be difficult to add these functions
as specials now.
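Without any compiler support, a rough sketch of such functions as
ordinary Erlang could look like the following. Everything here is an
assumption for illustration: the module name regex_sketch, the tuple
shapes of the "parsed form", and the choice to pass the parts as a list
(Erlang functions are not variadic, so seq/alt with any number of
arguments would otherwise need one clause set per arity).

```erlang
-module(regex_sketch).
-export([seq/1, alt/1, any/1, span/1, example/0]).

%% any(Chars): exactly one character from the set, like [...].
any(Chars) when is_list(Chars) -> {char_class, Chars}.

%% span(Chars): zero or more characters from the set, like [...]*.
span(Chars) when is_list(Chars) -> {star, {char_class, Chars}}.

%% seq(Parts): the parts one after another.
seq(Parts) when is_list(Parts) -> {seq, [part(P) || P <- Parts]}.

%% alt(Parts): any one of the parts.
alt(Parts) when is_list(Parts) -> {alt, [part(P) || P <- Parts]}.

%% A part is either an already built form (a tuple) or a literal string.
part(P) when is_tuple(P) -> P;
part(P) when is_list(P) -> {literal, P}.

%% The expression from above, written with list arguments.
example() ->
    seq([any(".?!"), span("\"')]}"),
         alt(["\n", "\t", seq([" ", any(" \t\n")])]),
         span(" \t\n")]).
```

Because the result is plain data, a matcher or a parse_transform could
consume it directly; nothing in this sketch does any matching itself.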
The major problem with this solution is that people won't be able to
snip regexs directly out of Friedl's book but might actually be forced
to understand them. :-)
I have absolutely no problems with doing something about this, but we
need to decide which problem we are solving. How much are regular
expressions ACTUALLY used in Erlang code? How much effort is it worth
putting in to solve this problem?
Robert
ok wrote:
> On 21 May 2007, at 10:24 pm, Joe Armstrong wrote:
> No - it changes the *syntax* of a string - normally you have to quote
> backslashes
>
> Suppose I had a *simple* regexp like this:
>
> [.?!][]\"')}]*\\($\\| $\\|\t\\| \\)[ \t\n]*
>
> I'd like to say
>
> X = regexp "[.?!][]\"')}]*\\($\\| $\\|\t\\| \\)[ \t\n]*"
>
> and not the more obvious
>
> X = "[.?!][]\\"')}]*\\\\($\\\\| $\\\\|\t\\\\| \\\\)[ \\t\\n]*"
>
> Let's look at this in two different ways.
> First, let's break the regexp up:
>
> [.?!][]\"')}]*\\($\\| $\\|\t\\| \\)[ \t\n]*
> AAAAABBBBBBBBBCC(IJJ| KLL|MMOO|RSTT)UUUUUUUU
>
> Note that regular expression syntax has weird quoting of its own.
> (One reason I want to write regexps in Erlang AS Erlang!)
> It looks as though [].... starts with an empty set, but in fact the
> right bracket is an element of the set. It looks as though there
> are lots of backslashes (CC,JJ,LL,OO,TT), but on close
> inspection (how HARD this is to read!) this appears to presuppose
> a regular expression syntax in which the special meaning of (|) has
> to be turned ON with backslashes, instead of the usual syntax where
> backslashes turn the special meaning OFF. (Just more reason NOT to
> want this construction. Which of the many regular expression
> syntaxes do we actually get?) Not only that, instead of \( \| \)
> we find doubled backslashes! I don't know any regexp syntax that
> requires \\(...\\|...\\) for an alternation, and if I did, I would
> not want to use it.
>
> So here's how I would like to write that:
>
> X = seq(any(".?!"), span("\"')]}"),
> alt("\n", "\t", seq(" ",any(" \t\n"))),
> span(" \t\n"))
>
> This is BETTER than fancy regexp syntax, because it's just normal
> Erlang syntax that can include *any* computations we find useful.
> For example,
>
> Stops = ".?!",
> Closers = "\"')]}",
> BigSpace = "\t\n",
> Space = " "++BigSpace,
> X = seq(any(Stops),span(Closers),
> alt(any(BigSpace),seq(" ",any(Space))), span(Space))
>
> Second, suppose for some reason we don't like function calls, and we
> do like regular expression syntax, with all the backslashes *that*
> requires. Let me introduce you to the idea of a preprocessor.
>
> Our input syntax will be
>
> /<stuff>/
>
> on one line by itself, possibly followed by a comma
> or semicolon, possibly followed by a comment.
> We want to replace this by
>
> regexp:compile("<stuff'>")
>
> where <stuff'> is <stuff> with appropriate backslashes added,
> putting the comma or semicolon back if there was one.
> What do we need to quote?
> - double quotes
> - backslashes
>
> Here we go.
>
> #!/bin/awk -f
>
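> $0 ~ /^[ \t]*\/.*\/[,;]?[ \t]*(%.*)?$/ {
>     x = $0
>     # head is the leading white space; x is what follows the opening /
>     match(x, /^[ \t]*\//)
>     head = substr(x, 1, RLENGTH-1)
>     x = substr(x, RLENGTH+1)
>     # drop trailing white space and any % comment
>     sub(/[ \t]*(%.*)?$/, "", x)
>     # remember a trailing comma or semicolon so it can be put back
>     if (x ~ /[,;]$/) {
>         tail = substr(x, length(x), 1)
>         x = substr(x, 1, length(x) - 1)
>     } else {
>         tail = ""
>     }
>     # drop the closing /, then put a backslash before each " and \
>     sub(/\/$/, "", x)
>     gsub(/["\\]/, "\\\\&", x)
>     print head "regexp:compile(\"" x "\")" tail
>     next
> }
> {
>     print
> }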
>
> I have tested this on some small examples and it seems to work. It took
> some doing, precisely because regular expression syntax is so hard to
> work with, compared with normal Erlang syntax.
>
> This preprocessor is just 16 SLOC of AWK. For *THIS* we are to
> make Erlang lexical structure more complicated and to break editor
> support for the language?
>