[erlang-questions] Strings as Lists
Richard A. O'Keefe
ok@REDACTED
Mon Feb 25 02:34:40 CET 2008
On 23 Feb 2008, at 9:22 am, Matt Kangas wrote:
> 1) Define a framework, have the framework know (and hide) the
> appropriate terminating char-sequence
> 2) Let the user define a terminating char sequence. (Perl/PHP/
> Ruby's answer)
> 3) Have one terminating multi-line char sequence. (Python's answer)
> 4) Have one char-sequence, but permit its length to vary to allow
> nesting, within reason. (Lua's answer)
>
> Richard, from your last post:
>
>> Literal_String = """
>> Here is `'"\$^some literal text with an embedded
>> but no trailing newline
>> """,
>> Another = ""!He said "Foo!" But that was not the end!!""
>
>
> That example falls into camp (2), user-defined terminating char, yes?
It's really closer to (1), or arguably to (4).
The framework says the opening and closing sequences are x y and y x
respectively, where x is 2 or more quotation marks and y is either
a newline or a printing character that is not a quotation mark.
The user only gets to choose how many quotation marks to use and which
non-quotation-mark character goes with them. *ALL* strings still begin
and end with a quotation mark.
It is important that there is no such animal as a user-defined
terminating CHARACTER, only a user-selected terminating SEQUENCE.
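A minimal Erlang sketch of how a scanner might read such a literal under the x y ... y x rule, assuming the input starts at the opening quotation marks. The module name funny_lit and its return shapes are hypothetical; the sketch does not check that y is a printing character, and it ignores how an opening sequence would be told apart from the ordinary empty string "".

```erlang
-module(funny_lit).
-export([read/1]).

%% Read one "funny" literal from the front of Input (a list of characters).
%% Opening sequence: X Y, where X is two or more $" characters and Y is any
%% character other than $" (printability is NOT checked in this sketch).
%% Closing sequence: Y X. Returns {ok, Content, Rest} or {error, Reason}.
read(Input) ->
    {Quotes, AfterQuotes} = lists:splitwith(fun(C) -> C =:= $" end, Input),
    case AfterQuotes of
        _ when length(Quotes) < 2 ->
            {error, not_a_funny_literal};
        [Y | Body] ->
            %% Y cannot be $" here, because splitwith consumed every quote.
            %% The closing sequence mirrors the opening: Y, then the same quotes.
            scan(Body, [Y | Quotes], []);
        [] ->
            {error, bad_opening}
    end.

%% Copy characters into Acc until the closing sequence is found.
scan(Input, Close, Acc) ->
    case lists:prefix(Close, Input) of
        true ->
            {ok, lists:reverse(Acc), lists:nthtail(length(Close), Input)};
        false ->
            case Input of
                [C | Rest] -> scan(Rest, Close, [C | Acc]);
                []         -> {error, unterminated}
            end
    end.
```

Applied to the Another example quoted above, the opening sequence is ""! and the closing one is !"", so the "!" inside "Foo!" does not end the string and the content comes back as He said "Foo!" But that was not the end!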
> I suppose the motivation for (3) or (4) could be, perhaps, a desire
> to make the string-enclosing syntax consistent, thus making it
> easier to read unfamiliar code. The reader doesn't have to guess
> (or look carefully for) what terminating-sequence the author chose.
> (4) encourages consistency while still permitting nesting.
My suggestion also addresses this: "funny" strings always begin and
end with multiple quotation marks, so anything without multiple
quotation marks isn't a "funny" string.
>
> Comparing (1) and (2), I believe the programmer who's writing the
> code is best-positioned to decide what's an appropriate terminating
> sequence.
I can't believe this. My experience of using \verb|x| in LaTeX is that
time after time I've found myself choosing a delimiting character that
doesn't work. Fond as I am of TeX, it has warts, and this is one of
them.
What's even worse is that something that *did* work may cease to work
after what looks like a small edit.
With my proposed literal string notation, and with several others, it
would be very straightforward to have an editor command
literal-quote-region that
(a) determined the length of the longest run of quotation marks and
(b) always generated "..."| .... |"..." using a longer run of quotation
marks than that, with | as the delimiting character or, if | were
frequent (or for any other reason), some other suitable character.
Combine this with literal-unquote-region, and one can easily
- unquote the region
- make an edit
- requote it
and expect the result to work, whereas author-chosen terminators are
less likely to work.
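The editor commands are hypothetical, so here is a sketch of the logic behind them in Erlang, assuming a region holds exactly one literal. The module lit_quote and its functions quote/1, quote/2 and unquote/1 are illustrative names only. Using one more quotation mark than the longest run in the text (and at least two) means the closing sequence cannot occur inside the text, so in this sketch the delimiter character affects readability only.

```erlang
-module(lit_quote).
-export([quote/1, quote/2, unquote/1]).

%% Hypothetical counterpart of literal-quote-region, defaulting to $|.
quote(Text) -> quote(Text, $|).

%% Wrap Text as X Delim Text Delim X, where X is a run of quotation marks
%% strictly longer than any run inside Text (minimum two).
quote(Text, Delim) when Delim =/= $" ->
    N = max(2, longest_quote_run(Text) + 1),
    Quotes = lists:duplicate(N, $"),
    Quotes ++ [Delim] ++ Text ++ [Delim] ++ Quotes.

%% Hypothetical counterpart of literal-unquote-region: strip the opening
%% X Delim and the closing Delim X. Crashes (badmatch) on malformed input.
unquote(Literal) ->
    {Quotes, [Delim | Rest]} = lists:splitwith(fun(C) -> C =:= $" end, Literal),
    Close = [Delim | Quotes],
    true = length(Quotes) >= 2 andalso lists:suffix(Close, Rest),
    lists:sublist(Rest, length(Rest) - length(Close)).

%% Length of the longest run of consecutive $" characters in Text.
longest_quote_run(Text) -> longest_quote_run(Text, 0, 0).

longest_quote_run([$" | T], Cur, Best) -> longest_quote_run(T, Cur + 1, max(Cur + 1, Best));
longest_quote_run([_ | T], _Cur, Best) -> longest_quote_run(T, 0, Best);
longest_quote_run([], _Cur, Best)      -> Best.
```

The unquote, edit, requote cycle then amounts to lit_quote:quote(Edit(lit_quote:unquote(Region))), where Edit stands for whatever change the user makes; the quotation-mark count is recomputed from the edited text each time rather than chosen by hand.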
> I think hiding the terminating sequence behind a name ("/xml", "/
> latex", "/url") is likely to cause bugs, or at least weird
> compilation errors.
My proposal does *NOT* hide terminating sequences behind /xml or /latex
or /url or anything else. Those things (mainly) name *ESCAPING* rules
determining what happens *after* the string has been read; they have
nothing whatsoever to do with deciding where the string *ends*.
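To make the separation concrete, here is a sketch in which interpreting a literal's contents is a separate function that only ever sees characters the reader has already delimited, so no rule can change where the string ended. The module lit_rules, the rule names raw and xml, and the assumption that an xml rule decodes the five predefined XML entities are all illustrative; the proposal does not define these rules here.

```erlang
-module(lit_rules).
-export([apply_rule/2]).

%% Phase two: apply a named escaping rule to an already-read raw string.
%% Rule names and semantics are hypothetical.
apply_rule(raw, Raw) -> Raw;
apply_rule(xml, Raw) -> xml_decode(Raw).

%% Assumed meaning of an "xml" rule: decode the five predefined entities,
%% in a single left-to-right pass.
xml_decode("&lt;" ++ T)   -> [$< | xml_decode(T)];
xml_decode("&gt;" ++ T)   -> [$> | xml_decode(T)];
xml_decode("&amp;" ++ T)  -> [$& | xml_decode(T)];
xml_decode("&quot;" ++ T) -> [$" | xml_decode(T)];
xml_decode("&apos;" ++ T) -> [$' | xml_decode(T)];
xml_decode([C | T])       -> [C | xml_decode(T)];
xml_decode([])            -> [].
```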
>
> And.. we haven't discussed "raw" strings for regexes. Doh!
It's there.
>
> Joe's original proposal was:
>
>> ~n"...." turn off quoting
>> ~r"...." string is a regexp
>> ~x"..." string is xml
>> ~x/FunkyStuff ... FunkyStuff (string is xml terminated by
>> FunkyStuff)
>> ~myExpander/FunkyStuff .... FunkyStuff
>
> Richard, which parts of this seem especially troublesome, and which
> are salvageable?
For one thing, ~n obviously cannot work; nor can anything which relies
on the termination sequence being a single character. Actually, it
doesn't turn off quoting; it quotes really hard. What it turns off is
presumably *escaping*.
For another, "n" is just too little a letter to bear a heavy freight of
meaning. All of these single-letter modifiers are just too Perlish, too
arcane, too obscure. I often irritate my daughters by quoting one of
T. S. Eliot's "Sweeney" poems to them: "You gotta use words when you
talk to me."
While strings may be a *compact* notation for regular expressions, they
are often a grossly inconvenient one. With hindsight, I realise that I
have spent more time desperately hacking away at regexp backslashes than
I would have lost by using some kind of S-expression-like format.
Stringy representations are popular in C and AWK because they don't
*have* any S-expression-like format, but Erlang does. Why stretch the
syntax to breaking point just in order to make it easier to do the wrong
thing?
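As an illustration of the S-expression-like alternative, here is a sketch that builds regular expressions as ordinary Erlang terms and renders them to a pattern string for OTP's re module. The module name rx and the term shapes ({lit, S}, {seq, L}, {alt, L}, {star, R}, {plus, R}, {opt, R}, digit, bol, eol) are hypothetical; the point is that the renderer inserts the backslashes, so nobody counts them by hand.

```erlang
-module(rx).
-export([to_re/1, match/2]).

%% Hypothetical regex terms:
%%   {lit, String}                   literal text, metacharacters escaped
%%   {seq, [R]}                      R1 followed by R2 ...
%%   {alt, [R]}                      R1 or R2 ...
%%   {star, R}, {plus, R}, {opt, R}  repetition
%%   digit, bol, eol                 [0-9], start and end of subject
to_re({lit, S})  -> lists:flatmap(fun esc/1, S);
to_re({seq, Rs}) -> lists:flatmap(fun to_re/1, Rs);
to_re({alt, Rs}) -> "(?:" ++ lists:append(lists:join("|", [to_re(R) || R <- Rs])) ++ ")";
to_re({star, R}) -> group(R) ++ "*";
to_re({plus, R}) -> group(R) ++ "+";
to_re({opt, R})  -> group(R) ++ "?";
to_re(digit)     -> "[0-9]";
to_re(bol)       -> "^";
to_re(eol)       -> "$".

%% Wrap in a non-capturing group so a quantifier applies to the whole of R.
group(R) -> "(?:" ++ to_re(R) ++ ")".

%% Backslash-escape PCRE metacharacters in literal text.
esc(C) ->
    case lists:member(C, "\\^$.|?*+()[]{}") of
        true  -> [$\\, C];
        false -> [C]
    end.

%% Returns match or nomatch.
match(R, Subject) ->
    re:run(Subject, to_re(R), [{capture, none}]).
```

For example, {seq, [bol, {plus, digit}, {lit, "."}, {plus, digit}, eol]} renders to ^(?:[0-9])+\.(?:[0-9])+$ and matches "3.14" but not "3x14"; the only backslash was produced by esc/1.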
The same goes for ~x. Last year I explained how easy it would be to mix
XML with Erlang syntax, and why this would be so much *better* than
having XML strings. I don't want to have to go through all that again.
Even without that, an S-expression-like form (such as I use when hacking
XML in Scheme) is unutterably more convenient in almost every way than a
string-like form.
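Likewise for XML, here is a sketch of a nested-tuple representation {Tag, Attributes, Children} with a small serializer that does all the escaping. The module xml_term is hypothetical; OTP's xmerl has a comparable "simple form" accepted by xmerl:export_simple/2, which this sketch does not use.

```erlang
-module(xml_term).
-export([to_xml/1]).

%% Element: {Tag, [{Name, Value}], [Child]}, with Tag and Name atoms and
%% Value a string. Child: another element or a text string.
to_xml({Tag, Attrs, Children}) when is_atom(Tag) ->
    Name = atom_to_list(Tag),
    "<" ++ Name ++ attrs(Attrs) ++ ">"
        ++ lists:flatmap(fun to_xml/1, Children)
        ++ "</" ++ Name ++ ">";
to_xml(Text) when is_list(Text) ->
    escape(Text).

%% Render attributes as name="value" pairs, escaping each value.
attrs(Attrs) ->
    lists:flatmap(fun({K, V}) -> " " ++ atom_to_list(K) ++ "=\"" ++ escape(V) ++ "\"" end,
                  Attrs).

escape(S) -> lists:flatmap(fun esc/1, S).

esc($<) -> "&lt;";
esc($>) -> "&gt;";
esc($&) -> "&amp;";
esc($") -> "&quot;";
esc(C)  -> [C].
```

Because text children are plain Erlang strings, a "<" in the data is just a character; there is no string-within-a-string to delimit or escape by hand.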
And of course I repeat that the main point of my proposal is keeping
all this stuff *out* of the language until we have some experience with
several solutions and know which one(s) work(s) best.