[erlang-questions] Strings as Lists

Thu Feb 21 00:07:30 CET 2008

On 19 Feb 2008, at 7:00 pm, Matt Kangas wrote:
>> Principle 4:
>> 	LET A HUNDRED SCHOOLS CONTEND.
>
>
> Erm... are you the one now expressing PERL envy? :-) One of Perl's
> mottos, after all, is: "There's more than one way to do it."

I am *not* suggesting that there ought to be many ways to do it in the
language.  What I am suggesting is pretty much the opposite.  The idea
in Perl is to avoid design work.  I'm saying let's try lots of designs
BEFORE adding anything to the language, and then let's add at most ONE
thing.  After all, I said to let a hundred schools CONTEND, not to let
a hundred schools PREVAIL.

> I think there is value in providing *one obvious solution* to a
> problem.

"For every problem, there is an answer which is simple, obvious, and  
WRONG."
Joe probably thinks his sketch is close to being "one obvious solution"
to the needs he expressed.  But I think it is horribly ugly, error-prone,
and more of a pain in the rectal region to deal with than the problem it
is supposed to solve.

> If it solves > 90% of common use-cases,

Then it isn't a solution.  Let's face it, what we have NOW solves > 90%
of common use-cases.

> is syntactically simple,

The solution Joe proposed is NOT syntactically simple.  It puts a great
deal of syntactic (and some semantic) processing in the lexical
analyser, which is not a really wonderful place to put it.  For the
problem he is trying to deal with, this may not be avoidable:  trying to
embed several levels of lexical structure that were never designed to
fit together is NOT going to be easy.
>
> I'm fascinated by the flexibility you propose, but confused about the
> implications. Should we need to support a Tower of Hanoi for
> notations? How likely are users to ever embed > 1 notation? > 2?

I see no tower of Hanoi here.  In fact it is precisely the point of my
design that to support an additional notation
	- the "meta-notation" (function header, lines, %% and @@ insertions,
	  and dot) is handled entirely by the *framework*, which remains
	  completely ignorant of any specific notation
	- you add ONE function that takes a string and adds the quotation
	  needed for your particular notation.
Instead of towers, there are at most bucket brigades.
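
To make the "one function per notation" idea concrete, here is a minimal
sketch in Python.  Every name in it (QUOTERS, notation, bucket_brigade)
is invented for illustration and is not the preprocessor described
above; the two quoters are deliberately simplified and only handle the
characters mentioned in their comments.

```python
# Hypothetical sketch: the registry and all names are assumptions.
QUOTERS = {}


def notation(name):
    """Register a quoting function under a notation name."""
    def register(func):
        QUOTERS[name] = func
        return func
    return register


@notation("c")
def quote_c(text):
    # Simplified C string literal: only backslash and double quote
    # are escaped here.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


@notation("sh")
def quote_sh(text):
    # POSIX shell single quoting: an embedded ' becomes '\''
    return "'" + text.replace("'", "'\\''") + "'"


def bucket_brigade(text, notations):
    """Pass text through each named quoter in turn, innermost first."""
    for name in notations:
        text = QUOTERS[name](text)
    return text
```

The framework part (bucket_brigade) never looks inside a notation;
supporting a new one means registering one more function.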

Here's another approach.
My emacs-like text editor "thief" has a command ESC [ ` which means
"convert region to HTML by changing the characters <>"'& to entity
references."  (All the HTML commands are on ESC [.)  So I can (and do)
write whatever I want, such as embedded programming language text, just
the way it looks, and then convert it.
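
As a rough illustration of what such a "convert region to HTML" command
has to do, here is a small Python sketch.  It is not the editor's actual
code; the function name and the choice of &#39; for the apostrophe are
assumptions made for the example.

```python
# Hypothetical sketch of the conversion; not the editor's own code.
ENTITIES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
}


def quote_html(text):
    """Replace each of the characters <>"'& with an entity reference.

    Working one character at a time means the & produced by an earlier
    replacement is never escaped a second time.
    """
    return "".join(ENTITIES.get(ch, ch) for ch in text)
```

For instance, quote_html('if (a < b && s == "x")') yields
'if (a &lt; b &amp;&amp; s == &quot;x&quot;)'.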

I also have a library package with quoting and unquoting code for
	AWK
	C and C++ (without trigraphs)
	C and C++ (with trigraphs)
	Csh
	DEC-10 Prolog
	Fortran 77 and 90 (but only printing characters)
	Java
	Lisp
	M4
	Quintus Prolog
	sh
	TeX
I happen not to have needed Eiffel, Erlang, or Haskell in this library
yet, but it is really quite a small matter of programming to do that.
It would also be a small matter of programming to plug these into my
editor.

So if I wanted a fragment of TeX in the query of a URL, I would then be
able to
1. Type the text the way I would normally type it.
2. Select the TeX part and ESC ` u 		(quote region as URL)
3. Select the whole URL and ESC ` e		(quote region as Erlang)

Please remember, the framework for this DOES exist, but quote-as-URL and
quote-as-Erlang currently do NOT.  What I am demonstrating here is a
DESIGN.
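
The three steps can be sketched in Python, with the caveat that
quote_url and quote_erlang are hypothetical stand-ins for the two editor
commands that do not exist yet, and the example.com URL is made up.
quote_erlang is simplified: it escapes only backslash, double quote,
newline, and tab.

```python
from urllib.parse import quote


def quote_url(text):
    """Percent-encode everything that is not an unreserved character."""
    return quote(text, safe="")


def quote_erlang(text):
    """Wrap text as an Erlang string literal (simplified escaping)."""
    escapes = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t"}
    return '"' + "".join(escapes.get(ch, ch) for ch in text) + '"'


# 1. The TeX is typed the way it would normally be typed.
tex = r"\frac{a}{b}"
# 2. The TeX part is quoted as URL data (the URL itself is made up).
url = "http://example.com/render?tex=" + quote_url(tex)
# 3. The whole URL is quoted as an Erlang string.
erlang_literal = quote_erlang(url)
```

Each step knows only its own notation; the TeX backslash is already
gone by the time the Erlang quoter sees the text.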

This design completely solves the problem of WRITING embedded notations
without ANY language change whatever.  (As does my previous proposal;
that was for a fairly language-independent preprocessor, NOT for
something to go in the Erlang compiler.)

It doesn't really solve the problem of writing embedded notations
READABLY, which my previous proposal did (and which Joe's proposal
failed to).

The simplest, most obvious design that could solve Joe's problem in a
readable way has four levels:
	(1) lexical: some kind of 'literal' string
	(2) syntactic: no change to the existing language whatever
	(3) library: a suite of functions that take a string and add whatever
	    quotation is needed for a specified notation to treat all the
	    characters literally (rather like my C library mentioned above).
	(4) optimisation: the compiler is allowed, but not required, to
	    evaluate calls to certain functions with known arguments at
	    compile time; the quoting functions may but need not be in the
	    set of such functions.

The difficult thing is (1), which I think Joe wants anyway.  I'm aware
of several lexical devices for this, and they all stink in one way or
another, because there isn't ANY delimiter character that you might not
want to include in the data; it is even conceivable that the data might
include at least one instance of every character.  The only lexical
design that doesn't have that problem is the old Fortran 66
	<count>H<characters>
notation, which is easy enough to generate with a text editor.

Perhaps we could say that a literal string begins with n+2 quotation
marks and a single character that is not a letter, digit, space, or tab,
and then ends with another copy of that single character followed by
n+2 quotation marks.  (In Erlang, the quotation marks could be " for a
string or ' for an atom.)  For any literal string, there is some longest
block of quotation marks, so it is always possible to select a
bracketing run that is longer.  Note that the single character that ends
the run of quotation marks could be a new line, so we could have
	Literal_String = """
	Here is `'"\$^some literal text with an embedded
	but no trailing newline
	""",
	Another = ""!He said "Foo!" But that was not the end!!""
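
To check that this rule really is unambiguous, here is a small Python
sketch of a writer and a reader for such literals.  The function names
are invented, and two details are assumptions not spelled out above:
the delimiter character may not itself be the quotation mark, and the
writer picks a run one longer than the longest run of quotation marks
in the data (but never shorter than two).

```python
def _check_delimiter(delimiter, quote):
    # Not a letter, digit, space, or tab; also (an assumption) not the
    # quotation mark itself, or the opening run would be ambiguous.
    if (len(delimiter) != 1 or delimiter.isalnum()
            or delimiter in " \t" or delimiter == quote):
        raise ValueError("unsuitable delimiter: %r" % delimiter)


def longest_quote_run(text, quote='"'):
    """Length of the longest block of consecutive quotation marks."""
    longest = current = 0
    for ch in text:
        current = current + 1 if ch == quote else 0
        longest = max(longest, current)
    return longest


def write_literal(text, delimiter="!", quote='"'):
    """Bracket text with a run of quotes longer than any run inside it."""
    _check_delimiter(delimiter, quote)
    run = quote * max(2, longest_quote_run(text, quote) + 1)
    return run + delimiter + text + delimiter + run


def read_literal(source, quote='"'):
    """Return (text, index just past the literal) for a literal that
    starts at the beginning of source."""
    n = 0
    while n < len(source) and source[n] == quote:
        n += 1
    if n < 2 or n == len(source):
        raise ValueError("not a literal string")
    delimiter = source[n]
    _check_delimiter(delimiter, quote)
    terminator = delimiter + quote * n
    end = source.find(terminator, n + 1)
    if end < 0:
        raise ValueError("unterminated literal string")
    return source[n + 1:end], end + len(terminator)
```

Because the data never contains a run of quotation marks as long as the
bracketing run, the first occurrence of the delimiter followed by that
run must be the real terminator, even when the data itself ends with
the delimiter character, as in the "Foo!" example.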

Hm.  I think I may finally have something simple, obvious, readable, and
it just might work.