[erlang-questions] some language changes
ok
ok@REDACTED
Fri Jun 1 05:09:40 CEST 2007
I mentioned the Eiffel
"{
<verbatim stuff>
}"
syntax.
On 1 Jun 2007, at 9:57 am, Robert Virding wrote:
> I would have no trouble accepting either just as long as you have
> NO QUOTING at all. Not PHP '' strings where you need to quote both
> \ and '. The Eiffel way adds extra lines and can break up an
> expression.
If you want to include chunks of one language inside another, you
have to have SOME kind of quoting. Take shell 'here' documents as
an example:
<<'EOF'
...
...
EOF
Every line is taken literally, up to but excluding the EOF line (a line
that is identical to the word following <<). Use <<-'EOF' instead, and
leading tabs will be stripped from the data lines, so you can indent
the document nicely. (This is rather like the "{ -vs- "[ distinction
in Eiffel.) Remove the quotes from 'EOF' (so <<EOF or <<-EOF) and
command and parameter substitution and \ processing are done.
If you want some lines with leading tabs in the data, you have to use
<<'EOF' (or <<EOF) and give up on indentation. If you want a line in
the data that exactly matches EOF, you are out of luck; you will have
to choose some other end of file magic word. But *some* magic word
there must be, or all of the rest of the containing file will be taken.
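The three variants described above can be checked in any POSIX sh
session. A small sketch (note that the <<- body must be indented with
real tab characters, not spaces):

```shell
# Quoted delimiter: every line is taken literally, no substitutions.
name=world
cat <<'EOF'
$name and \n come out exactly as written
EOF

# Unquoted delimiter: parameter substitution and \ processing happen,
# so this prints "hello, world".
cat <<EOF
hello, $name
EOF

# <<- strips leading tabs (only tabs, not spaces), so the body and
# even the terminating EOF line may be indented with tabs.
cat <<-'EOF'
	this line has a leading tab in the source but not in the output
	EOF
```

The quoted and unquoted forms differ only in the delimiter word; the
body itself never needs any escaping in the quoted case, which is
exactly the "no quoting" property under discussion.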
There seem to be only a few ways to get the effect of NO quoting at all.
1. The Eiffel/sh way: if the EOF string occurs in the data you want you
are out of luck. (In Eiffel's case, totally.)
2. Use a number instead of quoting, like Fortran's late unlamented
Hollerith literal. I don't like the idea of writing
44`[.?!][]\"')}]*\\($\\| $\\|\t\\| \\)[ \t\n]*
and I don't suppose anyone else does either. (Not least because I
probably counted wrong.)
3. Use a word-processor-like interface where "string", "code", and
"comment" are styles (hence not indicated in the text stream at
all, but in a hidden markup stream). Of course, this has severe
trouble when you try to use a source file in language X containing
a no-quotes string in language Y which contains a no-quotes string
in language Z.
I don't see "break[ing] up an expression" as a problem at all.
Remember,
one change I *do* like very much is adding
<variable> = <constant expression>.
as a kind of top level definition. So 'no-quotes' strings just plain
should never BE in expressions in the first place. You should have
End_Of_Sentence = "{
[.?!][]\"')}]*\\($\\| $\\|\t\\| \\)[ \t\n]*
}".
or something of the sort as a top level definition with a name.
This is also the reason why I don't see "add[ing] extra lines" as a
serious problem.
Bertrand Meyer is clever, but he's not the only clever person, and
it may well be possible to come up with something even better than
Eiffel's "{..}" and "[..]" long string syntax. I'm not in love with
it myself. But it IS prior art which DOESN'T involve any kind of
quoting
in the body of the string.
> About the regular expression syntax, if I understand you correctly
> then what you basically want to do is separate specifying/parsing
> the regexp and applying it.
No. That's what we have in most languages: re_compile (specify/parse)
+ re_match (apply). What I want is FOUR things:
1. Source form. There will eventually be MANY of these, each
imitating closely one of the many many different regular
expression syntaxes out there (Vim, Emacs, Java, Perl, Tcl,
AWK, ...). I would prefer that NONE of these should have
special syntactic support, because I don't see any reason
for any one of them to be so privileged. (POSIX syntax would
be the obvious candidate for this privilege, except that there
are two POSIX syntaxes.) The way these syntaxes should be
supported is by library packages. (In Ada they would be
child packages of a regexp package; with Erlang's Java-wannabe
flat-name-space-with-dots-in-it scheme there would be no point.)
2. Abstract syntax trees. Just exactly what these _are_ should be
the private concern of some module, but there should be a single
set of functions one can use to construct abstract syntax trees.
This is important because it lets one construct regular
expressions with NO double or triple quoting, NO worries about
exactly which syntax one is using, and with the marvellous
power of functional abstraction available for constructing them.
Just recently I marked some student code where half the students
had it easy and half had it very hard. The half who found it
easy were working with data structures that represented the
abstract syntax of their data: parsing stuff that came from
a file, unparsing stuff sent to a file, but otherwise working
on a nice clean exceptionless data structure. The half who
found it hard were working with strings that held the external
form of the data. Almost every operation was nastily complex
because of this. Working with a textual representation of
regular expressions is a NIGHTMARE.
3. A compiled form ready for some kind of execution. There might
be more than one of these. One might be incremental and one
not, for example.
4. Matching.
Given the variation in surface syntax for regular expressions, and
the variation in the ways one might want to compile them (DFAs, NDFAs,
backtracking parsers, ..., or to a data structure, to native code, or
to Erlang source code in the case of something like Leex), what I am
asking for really is the obvious interface: the bit in the *middle*
that is common to all of them.
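Purely as an illustration of how the four layers might fit together
(in Python, since nothing like this exists yet; every function name
here is invented), with the AST constructors as the common middle bit:

```python
import re

# Layer 2: abstract syntax trees built by plain constructor functions.
# Because these are ordinary calls, no quoting of any surface syntax
# is ever needed to build a pattern.
def lit(s):        return ("lit", s)        # literal string
def seq(*rs):      return ("seq", rs)       # r1 r2 ...
def alt(*rs):      return ("alt", rs)       # r1 | r2 | ...
def star(r):       return ("star", r)       # r*
def cclass(chars): return ("class", chars)  # [chars]

# Layer 1: one of many possible source forms -- here, unparsing the
# AST to (roughly) POSIX extended syntax. A real library would also
# provide parsers for the Vim, Emacs, Perl, ... syntaxes.
def to_posix(r):
    tag = r[0]
    if tag == "lit":
        return "".join("\\" + c if c in ".[]()*+?|\\^$" else c
                       for c in r[1])
    if tag == "seq":
        return "".join(to_posix(x) for x in r[1])
    if tag == "alt":
        return "(" + "|".join(to_posix(x) for x in r[1]) + ")"
    if tag == "star":
        return "(" + to_posix(r[1]) + ")*"
    if tag == "class":
        return "[" + r[1] + "]"
    raise ValueError(tag)

# Layers 3 and 4: compiling and matching -- delegated here to Python's
# re module, standing in for a DFA/NFA/backtracking backend.
def compile_re(ast):
    return re.compile(to_posix(ast))

# A fragment of the end-of-sentence pattern, built with no string
# quoting at all: sentence punctuation followed by closing brackets.
end_of_sentence = seq(cclass(".?!"), star(cclass("]\"')}")))
pat = compile_re(end_of_sentence)
```

The point is that `end_of_sentence` is constructed by ordinary
function calls, so the gnarly escaping appears only when some source
form is unparsed, never in the program that builds the pattern.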
> I am almost ready with a new regexp module which will never
> explode, is hopefully reasonably fast, works directly on binaries
> as well as strings and can handle subexpressions. This version
> supports POSIX regexps and an interface based on AWK. All that is
> left is to work out details of the interface and return values. It
> is internally based on NFAs.
This is great news.