# [erlang-questions] Strings as Lists

Richard A. O'Keefe
Tue Feb 19 04:55:33 CET 2008

On 19 Feb 2008, at 3:49 am, Joe Armstrong wrote:
[expressing PERL envy, basically].

> To me, embedding regexps, LaTeX etc. in strings is painful and I make
> loads of mistakes forgetting to quote things.

Patient:	Doctor, it hurts when I do <this>.
Doctor:		Then don't do that.

We've had essentially this discussion before.
I'm reminded of the classic design botch in SGML.
XML has		<![CDATA[...]]>
in which the only special characters are the "]]>" ending quote,
the others being taken literally.  Too bad if you want to nest them!
SGML also has	<!ELEMENT tag - - CDATA>
which lets you use <tag>...</tag> to quote the characters ... *and*
to put a wrapper around them.  It would be perfect for quoting bits
of SGML in tutorials except that (here comes the botch): it is closed
by *any* end-tag, not just the one it started with.

There really is no programming language which handles "textual things
inside textual things" terribly well.  The quoting and meta-quoting
stuff ends up being pretty nasty no matter what you do.  The specific
proposal Joe made just now
- would make life horrible for editors like Emacs
- would make things very confusing for people
- would STILL be hard to use.

So, in the light of the old joke, let's not do that.

Principle 1:
NO NESTING.
I love nested blocks, and have since Algol 60.
I love nested expressions, and have since Lisp.
But when you want to combine multiple notations, as for example
XSLT (hiss, spit) does, nesting is for the birds.
I would have called this ONE HEADACHE AT A TIME, but cannot see
how to get by with fewer than two.

Principle 2:
NAME AND CONQUER.
If it's big enough to be a problem, it's big enough to have a name.

Principle 3:
WHAT I SEE ISN'T WHAT HE GETS.
In order to be used in Erlang programs, a notation has to be accepted
by the Erlang tool chain, but it does not have to be part of the Erlang
language or understood by the Erlang compiler.  We have already
accepted this idea for Yecc and Leex.  Keeping it out of the compiler
is also suggested by the next principle:

Principle 4:
LET A HUNDRED SCHOOLS CONTEND.
It's most unlikely that we'll come up with the right design, or even
the right *kind* of design, on the first pop.  Maybe the right way to
do it is to write all our code in Microsoft Word (hiss, spit, screech,
jump, claw!) using styles to distinguish one reading of the text from
another.  Maybe we should be using an SGML-based or XML-based markup
language (not entirely unlike the one I proposed some years ago,
perhaps) with something like Amaya as our editor.  Maybe we should be
asking the aliens from Zeta Reticuli to do our programming for us.

Principle 5:
RUN IT UP THE FLAGPOLE AND SEE IF ANYONE FAINTS.

Here's a sketch of something that can handle large chunks of text in
a mixture of notations.  The key ideas are
- there are Notations (hmm, haven't I heard that before, oh yes, it
  was SGML...).  A Notation provides a rule for quoting interpolated
  text, and may also be associated with a syntax checker.
- there is interpolation, of two kinds.  In the case of Literal
  interpolation, a string in any notation is interpolated as literal
  data according to the Notation's rule.  In the case of Interpreted
  interpolation, a text in one notation may be interpolated in text
  of the same notation only, the result being subject to the syntax
  check of the notation, if any.
- @id@	indicates literal interpolation
  @@	is a plain @
  %id%	indicates interpreted interpolation
  %%	is a plain %
  %\n	is removed; it's a continuation.
  These characters were chosen to be minimally obtrusive in LaTeX
  and regular expressions and Erlang text.
- Text chunks have names, which can be used in Erlang code.
- Text chunks may have arguments.

<text definition> ::=
    <function name> '(' [<arguments>] ')' ['/' <notation name>] 'is' '\n'
    <data line>*
    '.' '\n'

<data line> ::=
    <one white space character> <data item>* ['%'] '\n'

<data item> ::=
    '%' <expr> '%'
|   '%' '%'
|   '@' <expr> '@'
|   '@' '@'
|   [^%@\n]

<expr> ::=
    <variable>
|   <function name> '(' [<expr> {',' <expr>}] ')'
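
To make the <data line> rule concrete, here is a rough sketch of a
scanner for one data line, written in Python only because it is easy to
try out.  The function name and the tuple tags ("text", "literal",
"interp") are made up for this sketch, and the text between the sigils
is captured as-is rather than parsed as an <expr>.

```python
def tokenize_data_line(line):
    """Split one data line into items, following the grammar above.

    Returns (items, continued): items is a list of ("text", s),
    ("literal", expr) and ("interp", expr) tuples; continued is True
    when the line ends in a lone '%', i.e. its newline is dropped.
    """
    if not line or line[0] not in " \t":
        raise ValueError("a data line must start with one white space character")
    body = line[1:]
    if body.endswith("\n"):
        body = body[:-1]

    items, text, continued = [], [], False

    def flush():
        # Emit any plain text collected so far as a single item.
        if text:
            items.append(("text", "".join(text)))
            text.clear()

    i = 0
    while i < len(body):
        ch = body[i]
        if ch not in "%@":
            text.append(ch)
            i += 1
        elif body[i + 1:i + 2] == ch:      # '%%' or '@@': a plain character
            text.append(ch)
            i += 2
        elif i + 1 == len(body):           # sigil is the last character
            if ch == "@":
                raise ValueError("unterminated '@' at end of line")
            continued = True               # trailing '%': continuation
            i += 1
        else:                              # '%expr%' or '@expr@'
            end = body.find(ch, i + 1)
            if end == -1:
                raise ValueError(f"unterminated {ch!r} interpolation")
            flush()
            kind = "interp" if ch == "%" else "literal"
            items.append((kind, body[i + 1:end]))
            i = end + 1
    flush()
    return items, continued
```

Scanning left to right and consuming a closing sigil before looking for
the next item is what lets two interpolations sit back to back, as in
%base()%%relative()%, without the middle "%%" being read as a plain %.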

The set of notations we'd need has yet to be determined,
but it would certainly include
    latex
    regexp
    xml
    string	(" and \ are special)
    atom	(' and \ are special)
    url		(\ is illegal)
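
As a rough illustration of "a Notation provides a rule for quoting",
the three notations whose special characters are listed above could be
modelled like this.  This is a Python sketch under two assumptions that
are mine, not part of the proposal: that "special" means "escape with a
backslash", and that "illegal" means "reject the text".  The names
QUOTING_RULES and quote_for are invented; latex, regexp and xml are
left out because their rules are not spelled out here.

```python
def _backslash_escape(special):
    """Build a quoting rule that puts a backslash before each special char."""
    def quote(text):
        return "".join("\\" + ch if ch in special else ch for ch in text)
    return quote


def _reject(illegal):
    """Build a quoting rule that refuses text containing illegal chars."""
    def quote(text):
        bad = sorted({ch for ch in text if ch in illegal})
        if bad:
            raise ValueError(f"illegal character(s) {bad!r} for this notation")
        return text
    return quote


# Assumed rules, one per notation whose special characters are listed above.
QUOTING_RULES = {
    "string": _backslash_escape('"\\'),   # " and \ are special
    "atom": _backslash_escape("'\\"),     # ' and \ are special
    "url": _reject("\\"),                 # \ is illegal
}


def quote_for(notation, text):
    """Quote text for literal interpolation into the given notation."""
    return QUOTING_RULES[notation](text)
```

For instance, quote_for("atom", "it's") gives it\'s, while
quote_for("url", "a\\b") raises an error instead of guessing at a fix.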

Example:
time()/regexp is
 ^1?[0-9]:[0-5][0-9] [AP]M\$
.
explanation()/latex is
 The regular expression \verb|@time()@| matches
 any string of the form
 \textit{h}:\textit{m}\verb*| |\textit{ampm}
 where \textit{h} is one or two decimal digits,
 representing an hour 1--12, \textit{m} is two
 decimal digits, with a leading zero if necessary,
 representing a minute 00--59, and \textit{ampm}
 is either AM (\textit{ante meridiem}) or
 PM (\textit{post meridiem}).\footnote{For the pedants
 amongst you, note that it is meridiEM, not
 meridiAN.}
.
base()/url is
 http://erlang.example.org/%
.
relative()/url is
 erlang/doc/preproc/hundred.html%
.
complete()/url is
 %base()%%relative()%%
.
omnium_gatherum() is
 {@time()@()@()@}%
.

A preprocessor would turn this into plain Erlang.
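
What such a preprocessor might do can be sketched in a few lines.  This
is Python, chosen only for ease of experiment, and it is one possible
reading rather than a design: definitions are assumed to be already
parsed into items, only zero-argument chunks are handled, there is no
syntax checking and no cycle detection, and the generated Erlang clause
shape (a function returning a string literal) is a guess.  The chunks
win_path and msg are made up to show literal quoting; the other three
come from the url example above.

```python
# Assumed parsed form: name -> (notation, items), where items are
# ("text", s), ("literal", name) or ("interp", name).
DEFS = {
    "base":     ("url", [("text", "http://erlang.example.org/")]),
    "relative": ("url", [("text", "erlang/doc/preproc/hundred.html")]),
    "complete": ("url", [("interp", "base"), ("interp", "relative")]),
    "win_path": (None, [("text", "C:\\erl")]),                 # hypothetical
    "msg":      ("string", [("text", "Path is "), ("literal", "win_path")]),
}


def quote(notation, text):
    """Quoting rule of the host notation (assumed: backslash escaping)."""
    if notation == "string":
        return text.replace("\\", "\\\\").replace('"', '\\"')
    if notation == "url" and "\\" in text:
        raise ValueError("backslash is illegal in a url")
    return text


def expand(name, defs=DEFS):
    """Expand one text chunk into its final string."""
    notation, items = defs[name]
    out = []
    for kind, value in items:
        if kind == "text":
            out.append(value)
        elif kind == "literal":
            # Literal: any notation may be pulled in, quoted for the host.
            out.append(quote(notation, expand(value, defs)))
        elif kind == "interp":
            # Interpreted: only text of the same notation is allowed.
            if defs[value][0] != notation:
                raise ValueError(f"{value} is not in notation {notation!r}")
            out.append(expand(value, defs))
        else:
            raise ValueError(f"unknown item kind {kind!r}")
    return "".join(out)


def to_erlang(name, defs=DEFS):
    """Emit a plain Erlang clause returning the expanded text as a string."""
    body = expand(name, defs)
    escaped = body.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'{name}() -> "{escaped}".'
```

Here to_erlang("complete") produces a clause whose body is the two URL
pieces joined together, and trying to use %win_path()% inside a url
chunk fails because the notations differ.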

There isn't actually much, if anything, in this that is specific to
Erlang.