[erlang-questions] Strings as Lists

Mon Feb 25 02:34:40 CET 2008

On 23 Feb 2008, at 9:22 am, Matt Kangas wrote:

> 1) Define a framework, have the framework know (and hide) the  
> appropriate terminating char-sequence
> 2) Let the user define a terminating char sequence. (Perl/PHP/Ruby's
> answer)
> 3) Have one terminating multi-line char sequence. (Python's answer)
> 4) Have one char-sequence, but permit its length to vary to allow  
> nesting, within reason. (Lua's answer)
>
> Richard, from your last post:
>
>> 	Literal_String = """
>> 	Here is `'"\$^some literal text with an embedded
>> 	but no trailing newline
>> 	""",
>> 	Another = ""!He said "Foo!" But that was not the end!!""
>
>
> That example falls into camp (2), user-defined terminating char, yes?

It's really closer to (1), or arguably to (4).
The framework says the opening and closing sequences are x y and y x
respectively, where x is 2 or more quotation marks and y is either
a newline or a printing character that is not a quotation mark.
The user only gets to choose how many quotation marks to use and which
non-quotation mark.  *ALL* strings still begin and end with a
quotation mark.
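
A minimal sketch of the delimiter rule just described, written in Python
rather than Erlang purely for illustration.  The function name
read_literal and its error handling are assumptions, not part of the
proposal; the sketch also assumes the literal ends at the first
occurrence of the closing sequence y x.

```python
def read_literal(src, start=0):
    """Read one literal whose opening sequence is x y and closing is y x.

    x is a run of two or more quotation marks; y is a newline or a
    printing character that is not a quotation mark.
    Returns (content, index_just_past_the_literal).
    """
    # Count the opening run of quotation marks (x).
    n = 0
    while start + n < len(src) and src[start + n] == '"':
        n += 1
    if n < 2:
        raise ValueError("literal must open with two or more quotation marks")
    if start + n >= len(src):
        raise ValueError("missing delimiter character after opening quotes")

    # The next character is y; it cannot be a quotation mark because the
    # loop above consumed the whole run.
    y = src[start + n]
    if y != "\n" and not y.isprintable():
        raise ValueError("delimiter must be a newline or a printing character")

    # The closing sequence is y followed by the same number of quotes.
    closer = y + '"' * n
    body_start = start + n + 1
    end = src.find(closer, body_start)
    if end < 0:
        raise ValueError("unterminated literal")
    return src[body_start:end], end + len(closer)
```

Applied to the second example from the quoted post, the opening sequence
is ""! and the closing sequence is !"", so the embedded "Foo!" does not
end the string.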

It is important that there is no such animal as a user-defined
terminating CHARACTER, only a user-selected terminating SEQUENCE.

> I suppose the motivation for (3) or (4) could be, perhaps, a desire  
> to make the string-enclosing syntax consistent, thus making it  
> easier to read unfamiliar code. The reader doesn't have to guess  
> (or look carefully for) what terminating-sequence the author chose.  
> (4) encourages consistency while still permitting nesting.

My suggestion also addresses this:  "funny" strings always begin and
end with multiple quotation marks, so anything without multiple
quotation marks isn't a "funny" string.
>
> Comparing (1) and (2), I believe the programmer who's writing the  
> code is best-positioned to decide what's an appropriate terminating  
> sequence.

I can't believe this.  My experience of using \verb|x| in LaTeX is that
time after time I've found myself choosing a delimiting character that
doesn't work.  Fond as I am of TeX, it has warts, and this is one of
them.  What's even worse is that something that *did* work may cease to
work after what looks like a small edit.

With my proposed literal string notation, and with several others, it
would be very straightforward to have an editor command
literal-quote-region that
(a) determined the length of the longest run of quotation marks and
(b) always generated "..."| .... |"..." or, if | were frequent, or for
    any other reason, selected some other suitable character.
Combine this with literal-unquote-region, and one can easily
	- unquote the region
	- make an edit
	- requote it
and expect the result to work, whereas author-chosen terminators are
less likely to work.
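
The two editor commands could be sketched roughly as follows, again in
Python for illustration only.  The function names, the candidate
delimiter list, and the policy of using one more quotation mark than the
longest run in the region are all assumptions; the post only says that
the longest run is measured and that | is the default delimiter.

```python
import re

def literal_quote_region(text, candidates="|!#%&*+/:;=?@^~"):
    """Wrap text in a quote sequence that cannot occur inside it."""
    # (a) Find the longest run of quotation marks in the region.
    longest = max((len(run) for run in re.findall(r'"+', text)), default=0)
    # Assumption: use one more quote than that run, and never fewer than two.
    quotes = '"' * max(2, longest + 1)
    # (b) Prefer "|", but fall back to whichever candidate is rarest in the
    # text.  min() keeps the earliest candidate on ties, so "|" wins them.
    y = min(candidates, key=text.count)
    return quotes + y + text + y + quotes

def literal_unquote_region(quoted):
    """Strip the opening and closing sequences added by literal_quote_region."""
    n = len(quoted) - len(quoted.lstrip('"'))
    if n < 2 or n >= len(quoted):
        raise ValueError("not a quoted region")
    closer = quoted[n] + '"' * n
    end = quoted.find(closer, n + 1)
    if end < 0:
        raise ValueError("unterminated quoted region")
    return quoted[n + 1:end]
```

Because the closing run of quotation marks is longer than any run inside
the region, the unquote, edit, requote cycle picks a fresh safe sequence
each time, whatever the edit introduced.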

> I think hiding the terminating sequence behind a name ("/xml",
> "/latex", "/url") is likely to cause bugs, or at least weird
> compilation errors.

My proposal does *NOT* hide terminating sequences behind /xml or /latex
or /url or anything else.  Those things (mainly) name *ESCAPING* rules
determining what happens *after* the string has been read; they have
nothing whatsoever to do with deciding where the string *ends*.
>
> And.. we haven't discussed "raw" strings for regexes. Doh!

It's there.
>
> Joe's original proposal was:
>
>> ~n"...."   turn off quoting
>> ~r"...."    string is a regexp
>> ~x"..."    string is xml
>> ~x/FlunkyStuff ... FunkyStuff  (string is xml terminated by  
>> FunkyStuff)
>> ~myExpander/FunkyStuff .... FunckyStuff
>
> Richard, which parts of this seem especially troublesome, and which  
> are salvageable?

For one thing, ~n obviously cannot work; nor can anything which relies
on the termination sequence being a single character.  Actually, it
doesn't turn off quoting; it quotes really hard.  What it turns off is
presumably *escaping*.

For another, "n" is just too little a letter to bear a heavy freight of
meaning.  All of these single letter modifiers are just too Perlish, too
arcane, too obscure.  I often irritate my daughters by quoting one of
T. S. Eliot's "Casey" poems to them:  "You gotta use words when you
talk to me."

While strings may be a *compact* notation for regular expressions, they
are often a grossly inconvenient one.  With hindsight, I realise that I
have spent more time desperately hacking away at regexp backslashes
than I would have lost by using some kind of S-expression-like format.
Stringy representations are popular in C and AWK because they don't
*have* any S-expression-like format, but Erlang does.  Why stretch the
syntax to breaking point just in order to make it easier to do the
wrong thing?
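
To illustrate the kind of S-expression-like format meant here, the
sketch below builds a regular expression from nested tuples, with Python
tuples standing in for Erlang terms.  The tag names (seq, alt, star,
plus, digit) and the function to_pattern are hypothetical, chosen only
for this example; no particular library's term format is implied.

```python
import re

def to_pattern(term):
    """Translate a nested-tuple regex description into a pattern string."""
    # A bare string is a literal; escaping is done once, here, so the
    # person writing the term never counts backslashes.
    if isinstance(term, str):
        return re.escape(term)
    tag, *args = term
    if tag == "seq":
        return "".join(to_pattern(a) for a in args)
    if tag == "alt":
        return "(?:" + "|".join(to_pattern(a) for a in args) + ")"
    if tag == "star":
        return "(?:" + to_pattern(args[0]) + ")*"
    if tag == "plus":
        return "(?:" + to_pattern(args[0]) + ")+"
    if tag == "digit":
        return r"\d"
    raise ValueError("unknown regex term: %r" % (tag,))

# A decimal number: digits, a literal dot, digits.
decimal = ("seq", ("plus", ("digit",)), ".", ("plus", ("digit",)))
decimal_re = re.compile(to_pattern(decimal))
```

In the term, the dot is plainly a literal and the digit class is plainly
an operator; in the stringy form that distinction rests entirely on
where the backslashes fall.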

The same goes for ~x.  Last year I explained how easy it would be to mix
XML with Erlang syntax, and why this would be so much *better* than
having XML strings.  I don't want to have to go through all that again.
Even without that, an S-expression-like form (such as I use when hacking
XML in Scheme) is unutterably more convenient in almost every way than a
string-like form.

And of course I repeat that the main point of my proposal is keeping all
this stuff *out* of the language until we have some experience with
several solutions and know which one(s) work(s) best.