[erlang-questions] Strings as Lists

Tue Feb 19 04:55:33 CET 2008

On 19 Feb 2008, at 3:49 am, Joe Armstrong wrote:
[expressing PERL envy, basically].
>

> To me, embedding regexps, LaTeX etc. in strings is painful and I make
> loads of mistakes forgetting to quote things.

Patient:	Doctor, it hurts when I do <this>.
Doctor:		Then don't do that.

We've had essentially this discussion before.
I'm reminded of the classic design botch in SGML.
XML has		<![CDATA[...]]>
in which the only special characters are the "]]>" ending quote,
the others being taken literally.  Too bad if you want to nest them!
SGML also has	<!ELEMENT tag - - CDATA>
which lets you use <tag>...</tag> to quote the characters ... *and*
to put a wrapper around them.  It would be perfect for quoting bits
of SGML in tutorials except that (here comes the botch): it is closed
by *any* end-tag, not just the one it started with.
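
To make the CDATA complaint concrete, here is a small sketch in Python (my choice of language for illustration, nothing to do with the proposal). The helper name cdata is made up. The only way to get a "]]>" through is to split it across two adjacent sections, which is exactly the kind of meta-quoting nobody wants to do by hand.

```python
import xml.etree.ElementTree as ET

def cdata(text):
    """Wrap text in CDATA, splitting every ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

inner = cdata("x < y")                    # a CDATA section we want to quote
doc = "<doc>" + cdata(inner) + "</doc>"   # "nesting" it by splitting its ]]>
recovered = ET.fromstring(doc).text       # the parser glues the pieces back
```

The parser hands back the inner section intact, but the document on disk no longer looks anything like what the author meant.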

There really is no programming language which handles "textual things
inside textual things" terribly well.  The quoting and meta-quoting
stuff ends up being pretty nasty no matter what you do.  The specific
proposal Joe made just now
	- would make life horrible for editors like Emacs
	- would make things very confusing for people
	- would STILL be hard to use.

So, in the light of the old joke, let's not do that.

Principle 1:
	NO NESTING.
	I love nested blocks, and have since Algol 60.
	I love nested expressions, and have since Lisp.
	But when you want to combine multiple notations, as for example
	XSLT (hiss, spit) does, nesting is for the birds.
	I would have called this ONE HEADACHE AT A TIME, but cannot see
	how to get by with fewer than two.

Principle 2:
	NAME AND CONQUER.
	If it's big enough to be a problem, it's big enough to have a name.

Principle 3:
	WHAT I SEE ISN'T WHAT HE GETS.
	In order to be used in Erlang programs, a notation has to be accepted
	by the Erlang tool chain, but it does not have to be part of the Erlang
	language or understood by the Erlang compiler.  We have already
	accepted this idea for Yecc and Leex.  Keeping it out of the compiler
	is also suggested by the next principle:

Principle 4:
	LET A HUNDRED SCHOOLS CONTEND.
	It's most unlikely that we'll come up with the right design, or even
	the right *kind* of design, on the first pop.  Maybe the right way to
	do it is to write all our code in Microsoft Word (hiss, spit, screech,
	jump, claw!) using styles to distinguish one reading of the text from
	another.  Maybe we should be using an SGML-based or XML-based markup
	language (not entirely unlike the one I proposed some years ago,
	perhaps) with something like Amaya as our editor.  Maybe we should be
	asking the aliens from Zeta Reticuli to do our programming for us.

Principle 5:
	RUN IT UP THE FLAGPOLE AND SEE IF ANYONE FAINTS.

Here's a sketch of something that can handle large chunks of text in
a mixture of notations.  The key ideas are
	- there are Notations (hmm, haven't I heard that before, oh yes, it
	  was SGML...).  A Notation provides a rule for quoting interpolated
	  text, and may also be associated with a syntax checker.
	- there is interpolation, of two kinds.  In the case of Literal
	  interpolation, a string in any notation is interpolated as literal
	  data according to the Notation's rule.  In the case of Interpreted
	  interpolation, a text in one notation may be interpolated in text
	  of the same notation only, the result being subject to the syntax
	  check of the notation, if any.
	- @id@	indicates literal interpolation
	  @@	is a plain @
	  %id%	indicates interpreted interpolation
	  %%	is a plain %
	  %\n	is removed; it marks a continuation.
	  These characters were chosen to be minimally obtrusive in LaTeX
	  and regular expressions and Erlang text.
	- Text chunks have names, which can be used in Erlang code.
	- Text chunks may have arguments.

<text definition> ::=
	<function name> '(' [<arguments>] ')' ['/' <notation name>] 'is' '\n'
		<data line>*
		'.' '\n'
	<data line> ::=
		<one white space character> <data item>* ['%'] '\n'

	<data item> ::=
		'%' <expr> '%'
	    |   '%' '%'
	    |   '@' <expr> '@'
	    |   '@' '@'
	    |   [^%@\n]

	<expr> ::=
		<variable>
	    |   <function name> '(' [<expr> {',' <expr>}] ')'
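
Read left to right, the <data item> rule can be scanned without backtracking. What follows is a rough Python sketch, not a reference implementation: the function name scan and the item tags are invented for illustration, expressions are kept as unparsed strings (so an expression may not itself contain @ or %), and the leading white space character of each data line is assumed to have been stripped already.

```python
def scan(body):
    """Split a chunk body into ('text', s), ('literal', e), ('interp', e) items."""
    items, text, i = [], [], 0

    def flush():
        # Emit any pending run of plain characters as one text item.
        if text:
            items.append(("text", "".join(text)))
            text.clear()

    while i < len(body):
        c = body[i]
        if c not in "@%":
            text.append(c)
            i += 1
            continue
        nxt = body[i + 1:i + 2]           # "" at end of input
        if nxt == c:                       # @@ or %% is a plain @ or %
            text.append(c)
            i += 2
        elif c == "%" and nxt == "\n":     # %\n is removed (continuation)
            i += 2
        else:                              # @expr@ or %expr%
            end = body.find(c, i + 1)
            if end < 0:
                raise ValueError(f"unterminated {c} at offset {i}")
            flush()
            kind = "literal" if c == "@" else "interp"
            items.append((kind, body[i + 1:end]))
            i = end + 1
    flush()
    return items
```

Because a doubled character is checked before an opening one, a line such as %base()%%relative()%% followed by a newline scans as two interpreted interpolations and a continuation, with no plain % in between.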

The set of notations we'd need has yet to be determined,
but it would certainly include
	latex
	regexp
	xml
	string		(" and \ are special)
	atom		(' and \ are special)
	url		(\ is illegal)
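
Each Notation's quoting rule is just a function from arbitrary text to text that is safe to drop into that notation. A hedged Python sketch of five of them follows. The table QUOTE and the functions backslash and quote_for are hypothetical names; the regexp rule assumes a POSIX-ish set of metacharacters; the url rule goes further than "\ is illegal" and percent-encodes everything outside the unreserved set; and latex is left out because its rule depends on context (inside \verb versus running text).

```python
import re
from urllib.parse import quote as percent_encode
from xml.sax.saxutils import escape as xml_escape

def backslash(specials):
    """Build a rule that puts a backslash before each special character."""
    pattern = re.compile("([" + re.escape(specials) + "])")
    return lambda s: pattern.sub(r"\\\1", s)

# One quoting rule per notation (assumed rules, see the text above).
QUOTE = {
    "string": backslash('"\\'),                  # " and \ are special
    "atom":   backslash("'\\"),                  # ' and \ are special
    "regexp": backslash("\\.[]{}()*+?^$|"),      # assumed metacharacter set
    "xml":    xml_escape,                        # escapes &, < and >
    "url":    lambda s: percent_encode(s, safe=""),
}

def quote_for(notation, text):
    """Quote text for literal interpolation into the given notation."""
    return QUOTE[notation](text)
```

Literal interpolation of @time()@ into a string chunk would then amount to quote_for("string", ...) applied to the text of time(), whatever notation time() itself was written in.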

Example:
	time()/regexp is
		^1?[0-9]:[0-5][0-9] [AP]M$
	.
	explanation()/latex is
		The regular expression \verb|@time()@| matches
		any string of the form
		\textit{h}:\textit{m}\verb*| |\textit{ampm}
		where \textit{h} is one or two decimal digits,
		representing an hour 1--12, \textit{m} is two
		decimal digits, with a leading zero if necessary,
		representing a minute 00--59, and \textit{ampm}
		is either AM (\textit{ante meridiem}) or
		PM (\textit{post meridiem}).\footnote{For the pedants
		amongst you, note that it is meridiEM, not
		meridiAN}
	.
	base()/url is
		http://erlang.example.org/%
	.
	relative()/url is
		erlang/doc/preproc/hundred.html%
	.
	complete()/url is
		%base()%%relative()%%
	.
	omnium_gatherum() is
		{@time()@,@REDACTED()@,@REDACTED()@}%
	.

A preprocessor would turn this into plain Erlang.
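
What the "plain Erlang" would look like is open. One possibility, sketched below in Python purely as an assumption, is to turn each argument-free chunk into a function that concatenates string literals and calls. The names compile_chunk and erl_string are invented, notation:quote/2 is a hypothetical run-time helper that would apply the host Notation's quoting rule, and chunk arguments, syntax checking, and the same-notation check for % interpolation are all left out.

```python
import re

# Alternatives are tried in order: doubled characters, continuation,
# @expr@, %expr%, a run of plain text, and finally a stray @ or %.
TOKEN = re.compile(r"@@|%%|%\n|@([^@\n]+)@|%([^%\n]+)%|[^@%]+|[@%]")

def erl_string(s):
    """Render s as an Erlang string literal."""
    s = s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + s + '"'

def compile_chunk(name, notation, body):
    """Translate one zero-argument text chunk into an Erlang function."""
    parts, text = [], []
    for m in TOKEN.finditer(body):
        tok = m.group(0)
        if tok in ("@@", "%%"):
            text.append(tok[0])                # plain @ or %
        elif tok == "%\n":
            continue                           # continuation: drop it
        elif m.group(1) or m.group(2):
            if text:                           # close the pending literal
                parts.append(erl_string("".join(text)))
                text = []
            if m.group(1):                     # @expr@: quote at run time
                parts.append(f"notation:quote({notation}, {m.group(1)})")
            else:                              # %expr%: splice in as is
                parts.append(m.group(2))
        elif tok in ("@", "%"):
            raise ValueError(f"stray {tok} in chunk {name}")
        else:
            text.append(tok)
    if text:
        parts.append(erl_string("".join(text)))
    return f"{name}() ->\n    " + " ++\n    ".join(parts or ['""']) + "."
```

Fed the body of complete() from the example, with relative written as a call, this yields a function whose body is base() ++ relative(); the compiler proper never sees an @ or a %.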

There isn't actually much, if anything, in this that is specific to
Erlang.