[erlang-questions] Adoption of perl/javascript-style regexp syntax

Bengt Kleberg bengt.kleberg@REDACTED
Wed Jun 3 12:48:07 CEST 2009


Greetings,

I think the reason not to use the widespread, standard and compact
way to represent regular expressions is that it is really difficult to
keep track of the escapes (\) and to compose them (sub-expressions).
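
To illustrate both difficulties, here is a minimal sketch. It assumes the
standard `re` module; the module name `escape_demo` and its helper
functions are hypothetical and exist only for this example.

```erlang
-module(escape_demo).
-export([demo/0]).

%% Matching one literal backslash: the regexp needs \\ and the Erlang
%% string literal doubles each of those again, giving four characters.
backslash() -> "\\\\".

%% A sub-expression that is fine on its own.
alt() -> "a|b".

demo() ->
    %% "C:\\temp" is the Erlang literal for C:\temp (a single backslash).
    {match, _} = re:run("C:\\temp", backslash()),

    %% Intended: (a or b) followed by c. Plain concatenation yields
    %% "a|bc", which means: a, or bc.
    Naive = alt() ++ "c",
    {match, [{0,1}]} = re:run("a", Naive),

    %% Composition only works if every part is wrapped in a group.
    Grouped = "(?:" ++ alt() ++ ")" ++ "c",
    nomatch = re:run("a", Grouped),
    {match, [{0,2}]} = re:run("ac", Grouped),
    ok.
```

Calling `escape_demo:demo()` returns `ok` when all the matches above
behave as described; the naive concatenation accepting "a" alone is the
composition problem.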


bengt

On Wed, 2009-06-03 at 11:35 +0200, Vlad Dumitrescu wrote:
> Hi,
> 
> On Tue, Jun 2, 2009 at 23:54, Richard O'Keefe <ok@REDACTED> wrote:
> > On 2 Jun 2009, at 7:08 pm, Vlad Dumitrescu wrote:
> >> As a programmer I like this way of handling this kind of issues
> >> because it works now and it's easy.
> >> As developer of a source handling tool I can't help but cringe at the
> >> prospect of getting requests to support all kinds of homegrown
> >> syntaxes...
> >
> > You mean like regular expression syntaxes?
> > I've lost count of the number of different variations of
> > regular expression syntax I've seen in UNIX.
> > The point of the wee tool I mentioned of course was to provide
> > *non*-syntax.
> 
> No, what I mean is syntaxes for allowing people to mark some strings
> as regular expressions so that a tool can process them and add
> backslashes or whatever. A source file containing such a marker would
> no longer be an Erlang source file, and it can't be handled by Erlang
> tools anymore.
> 
> >> Another problem with external processing of the source files is that
> >> it is at the same level as the preprocessor,
> > Well, no, it understands far less of Erlang syntax than the
> > Erlang preprocessor does, and operates way before it.
> 
> Even worse, then. I was being nice.
> 
> > But *any* program that computes source code by *any* means can
> > be called a "preprocessor".  I have a Smalltalk-to-C compiler.
> > You could call that a preprocessor if you like.  I don't think
> > the word itself helps our understanding very much.
> 
> It can be called that, but nobody did so and I'm not sure what that
> has to do with the current issue.
> 
> > We have Lisp-Flavoured Erlang.  If you want preprocessing that
> > can "intelligently" deal with Erlang source code, LFE is _it_.
> 
> LFE can intelligently preprocess LFE source code, which is quite
> different from Erlang source code. How does it help me handle a
> vanilla Erlang module in erlide or emacs?
> 
> > There is of course a much better way to deal with regular
> > expressions in a language like Lisp or Erlang.  One of my pet
> > slogans is "STRINGS ARE WRONG".
> 
> I suppose that you mean something like "embedded strings in a language
> are wrong when representing anything other than plain text". And I
> couldn't agree more, they are evil - strings that represent for
> example a regexp should be a different data type than a text message
> string.
> 
> > The way to represent something
> > like "^[[:alpha:]_][[:alnum:]_]*:[[:space:]]" is
> >        rex:seq([rex:bol(),rex:id(),rex:space()])
> > where regular expression syntax is replaced by Erlang syntax.
> > This is so much more powerful than fancy quoting schemes for
> > strings that it just isn't funny: you can compute any subexpression
> > at any time you find useful _without_ new syntax, and without any
> > run-time parsing.
> 
> [I am sure you already know all of the following, Richard, but from
> your answer above you might have forgotten it on the spur of the moment]
> 
> The same could be said about writing Erlang or C or Java parse trees
> directly instead of letting the parser build them for us from a
> string. Yet we don't do that because the textual representation has
> some advantages: it's easier to read, it is higher level, it's easier
> to modify and we're not bound to a specific internal representation.
> 
> The whole point with a parser is that the resulting AST is equivalent
> to the input string. If the textual representation has restrictions on
> what it can express, then it is so because the designer deemed it best
> so (or it's a bug, but we can ignore that here). Bypassing that and
> going directly to the parse tree might open a whole new can of worms.
> For embedded languages that are more complicated than regexps or xml,
> it might also be practically impossible to get it right manually.
> 
> Regexps are (as you say) a structured datatype. Nobody disagrees. But
> we have a widespread, standard and compact way to represent them. Why
> wouldn't we want to use that instead of Erlang terms? Given a compiler
> that understands this, the following examples will generate exactly
> the same code:
>     identifier() -> {seq,{cset,letters()},{star,{cset,continuers()}}}.
>     identifier() -> "{letters}{continuers}*".
> I know which one I find easier to read and understand.
> 
> Regarding your security concerns about cross-site scripting, I don't think
> they are 100% relevant in this discussion. Those problems appear when
> one takes a string from the external world and "pastes" it mindlessly
> inside a program that is then executed. We are talking here about
> being able to let a string (the erlang source file) be tokenized and
> parsed by several scanners and parsers. There is no part in this
> string that is injected from the outside so that the programmer's
> intentions can be abused.
> 
> 
> All in all, regular expressions are just a particular case of embedded
> language. If there is to be any change to the Erlang syntax, I
> wouldn't want it tailored to a specific language. For example, I want
> to be able to embed Erlang code inside Erlang, which would allow
> macros like LFE has and other goodies.
> 
> best regards,
> Vlad
> 


