[erlang-questions] atoms with newlines

Thu Feb 27 03:02:20 CET 2014

On 27/02/2014, at 1:33 PM, Valentin Micic wrote:
> 
> I do see your point (and feel your frustration), however, you are using a logical fallacy to reinforce your argument as you are making an appeal to authority -- others are using it, thus it has to be right.

You have completely misunderstood.

In this debate,
I have *NEVER* argued that any treatment of \n is *RIGHT* and
I have *NEVER* appealed to authority.

I am making a purely PRAGMATIC argument:

  BECAUSE
     We can already insert a newline character into the
     value of a quoted atom or quoted string using
     either a bare newline or \n or \^<newline>
  THEREFORE
     Having \<newline> do so as well does not add
     much utility.  (It's not *WRONG*, just *USELESS*.)

  BECAUSE
    Many Erlang programmers have experience in other languages
    and those other languages use \<newline> for a seamless
    join
  AND
    The present behaviour is not documented in the
    reference manual so that those programmers will not have
    their expectations corrected by documentation
  THEREFORE
    many Erlang programmers will be SURPRISED by the actual
    behaviour of Erlang.

  BECAUSE
    The backslash newline combination is silently accepted
    by the compiler
  THEREFORE
    The unpleasant surprise will be when the code is tested,
    if then, not when the code is edited or compiled.
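
The first premise above can be checked directly.  What follows is a
minimal sketch, not part of the original argument: the module name
nl_demo is made up, and \^J (control-J) and \012 are used as documented
spellings of the newline character, which are not necessarily the exact
forms listed above.

```erlang
-module(nl_demo).
-export([strings_agree/0, atoms_agree/0]).

%% Four spellings of the same three-character string: $a, newline, $b.
strings_agree() ->
    Escaped = "a\nb",
    Bare    = "a
b",
    Control = "a\^Jb",
    Octal   = "a\012b",
    Expected = [$a, 10, $b],
    lists:all(fun(S) -> S =:= Expected end,
              [Escaped, Bare, Control, Octal]).

%% The same holds for quoted atoms.
atoms_agree() ->
    Escaped = 'a\nb',
    Bare    = 'a
b',
    Escaped =:= Bare andalso atom_to_list(Escaped) =:= [$a, 10, $b].
```

Both functions should return true, provided the source file uses plain
LF line endings (with CRLF the bare-newline forms would also contain a
carriage return).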

Contrast Erlang with Common Lisp here.
I have never argued that \<newline> should vanish in
Common Lisp, because Common Lisp has documented the
behaviour of \ very clearly.  (In sections 2.1.4.5,
2.1.4.6 and 2.4.5.)  It doesn't matter how many other
languages do what; Common Lisp is explicit about what
*IT* does.

Oh, and Common Lisp doesn't _need_ string pasting because
it has #.(concatenate 'string "foo" "bar") .

  BECAUSE
    It is useful to be able to write a string or a quoted
    atom that extends over multiple lines without thereby
    being forced to include newline characters in the
    value denoted
  THEREFORE
    it would be useful to adopt some convention such as
    C's to allow unwanted newlines to be elided.

This is not an appeal to AUTHORITY, it is an appeal to
UTILITY.
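
For strings only, Erlang already joins adjacent string literals at
compile time, which covers part of this need.  It does not help with
quoted atoms, which have no comparable notation.  A minimal sketch
(the module name paste_demo is hypothetical):

```erlang
-module(paste_demo).
-export([joined/0]).

%% Adjacent string literals are concatenated by the compiler, so the
%% line break between them does not appear in the value.
joined() ->
    S = "This is a test "
        "string",
    S =:= "This is a test string".
```

Here joined() should return true: the value is one string with no
newline in it.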

It's worth pointing out here that (O'CAML, F#) and the
C family do slightly different things.  After
\<newline>, O'CAML and F# will ignore leading spaces
and tabs; the C family will not.  Some Prolog systems
support \c as well as \<newline>, the \c having been
ultimately inspired by the \c feature of echo(1).

I really don't see any appeal to authority here.

> Also, even if we assume that your argument may be valid for a particular context, say, strings in Python, it would not make much sense to consider it in this situation, for you know very well that strings and atoms are not the same thing.

Yes, but ***SO WHAT***?  Strings and atoms are not the same
thing in Prolog or Mercury either.  What of that?  They may
both be _notated_ similarly.

It is not the case that Erlang does something sensible with
\<newline> in strings and something stupid with \<newline>
in atoms.  It does the *SAME* thing in both contexts, and
it is equally dangerous in both.

> Using similar reasoning people may also expect that:
> 
> A = 1,
> A = A + 1
> 
> should not crash a program

This claim is invalid, because my reasoning makes an
*essential* appeal to the fact that the behaviour of
backslash+newline is *UNDOCUMENTED* in Erlang.

The behaviour here is *DOCUMENTED*.

See the difference?
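
The documented behaviour in question is that = is a pattern match, so
matching an already-bound variable against a different value fails
with a badmatch error.  A minimal sketch, with the module and function
names made up for illustration:

```erlang
-module(rebind_demo).
-export([demo/0]).

%% A is already bound when the match is evaluated, so this compares
%% A with A + 1 and fails with {badmatch, A + 1} for any number.
bump(A) ->
    A = A + 1.

%% Catch the error so that the outcome can be inspected as a value.
demo() ->
    try bump(1)
    catch
        error:{badmatch, Value} -> {badmatch, Value}
    end.
```

Calling rebind_demo:demo() should return {badmatch,2}.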

[Skipping]
> 
> I think you are making an assumption this was "an implementation accident".

Yes, I am.  BECAUSE IT IS NOT DOCUMENTED!
> 
> I am making an assumption that this was a well considered and deliberate effort.

Why would a "well considered and deliberate effort"
 - provide a redundant way to do something we have *THREE*
   other ways to do
 - deliberately create a hazard
 - FAIL TO DOCUMENT the decision?

> In both cases we are talking about assumptions, why present them as facts then?

When did I do that?
> 
>> 
>> It would be excusable to make backslash+newline an
>> always-reported syntax error.  People would get a nasty
>> surprise, but at least they wouldn't silently get the
>> wrong value in their program.  In fact I think that
>> *every* \<char> combination that is not explicitly
>> documented should be reported as a syntax error.
> 
> Why is it that you're ignoring the fact that the intent behind usage of apostrophes in atom construction is documented.

Because I am *NOT* ignoring that, and you have no grounds
for suggesting otherwise.

> Also, if you place a new-line during the construction of an atom without apostrophes, the syntax error will be indeed reported.

So what?  That's totally off-topic.  In fact 
	if<newline>a =:= b ...
is *NOT* a syntax error and *WON'T* be reported.
It just isn't one atom, that's all.

> 
>> 
>> Second, it is currently the case that backslash has a
>> special meaning in Erlang *ONLY* inside quoted literals
>> (counting $\t as a quoted literal).  Your "less confusing
>> (or even more consistent)" proposal introduces a new
>> thing that *looks* like an operator but *isn't* one.
> 
> As opposed to not introducing a new operator, but definitely behaving like there was one (e.g. one that eliminates new-line character altogether)?

There are no operators inside strings or atoms.
Backslash+newline is not described as or treated as or thought of as
in any way resembling an operator in languages that define it.
A backslash+newline that stands for the empty sequence is no more
an operator than a backslash+newline that stands for a non-empty
sequence.  (If you have a programming language with
byte strings, a Unicode escape \u2022 might stand for more than
one byte.  It wouldn't be an operator either.)

>> (*) In my view, it is nothing more than a historical incident (or a whim of someone in a position of authority a long time ago) that yielded an elimination of  new-line character when preceded by a backslash, whilst in any other case it is a backslash that is eliminated.

What?  In a sequence like \t, neither character is eliminated.
The *whole* escape sequence is replaced by another character
sequence.

> For example:
> 
> \"  results in a double quote -- backslash is eliminated;
> \\  results in a backslash (having first backslash eliminated);
> \002 results in an integer value of two -- backslash is eliminated;

Only if you want to say that the 0, the second 0, and the 2 are
also "eliminated".

> \n results in integer value of 0x0a -- backslash is eliminated;

And so is the n.  "Elimination" just is not a helpful way to
talk about the replacement of (entire) escape sequences by
other character sequences (which might or might not have unit
length).

"Elimination" *would* be a useful way to talk about
single-escape processing in Common Lisp.  But Erlang is not Lisp.
(LFE is another matter.)
> 
> Why should then a sequence such as:
> 
> "This is a test  \  0x0a
> string"

Remember my demand that UNDOCUMENTED escape sequences
(like backslash+space or backslash+tab) should be explicit
compile-time errors.  If backslash+space or backslash+tab
were to be documented as doing something, no matter how
useless, then that is what they would do.  Until then,
this should be a compile-time error.

> Result in a string  "This is test string" with both backslash and new-line eliminated. 

It should not.  (The only newline I see is the one after 0x0a,
and nobody is saying _that_ should go.)

> And yet:
> 
> "This is a test\n string"
> 
> Would result in insertion of a new-line character:

Because that's what backslash+n is defined to do.

Escape processing in quoted literals
replaces <escape SEQUENCES> with <character SEQUENCES>
which might contain no characters, 1 character, or many
characters.  Whatever the replacement is, is whatever
the documentation says it is.

Undefined escape sequences should be errors.
In fact, in R16B03-1, most undefined escape sequences
*are* errors.  
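
The model described above can be sketched as a small function over
character lists.  This is only an illustration, not the Erlang scanner:
the module name, the function names and the escape table are all
assumptions.  In particular, the backslash+newline entry shows the
C-style elision argued for earlier rather than what Erlang does, and
\T is a made-up escape included only to show a many-character
replacement.

```erlang
-module(escape_model).
-export([unescape/1]).

%% unescape(Chars) -> {ok, Chars} | {error, Reason}
%% Replaces each backslash escape SEQUENCE by its replacement SEQUENCE.
unescape(Chars) ->
    unescape(Chars, []).

unescape([], Acc) ->
    {ok, lists:reverse(Acc)};
unescape([$\\, C | Rest], Acc) ->
    case replacement(C) of
        {ok, Replacement} ->
            %% Push the replacement (0, 1 or many characters) onto
            %% the reversed accumulator.
            unescape(Rest, lists:reverse(Replacement, Acc));
        undefined ->
            %% Undefined escapes are reported, never guessed at.
            {error, {undefined_escape, C}}
    end;
unescape([$\\], _Acc) ->
    {error, unterminated_escape};
unescape([C | Rest], Acc) ->
    unescape(Rest, [C | Acc]).

%% Hypothetical escape table; anything not listed is an error.
replacement($n)  -> {ok, [10]};      % one character: newline
replacement($t)  -> {ok, [9]};       % one character: tab
replacement($\\) -> {ok, [$\\]};     % one character: backslash
replacement($\") -> {ok, [$\"]};     % one character: double quote
replacement(10)  -> {ok, []};        % backslash+newline: no characters
replacement($T)  -> {ok, "    "};    % made-up: four spaces
replacement(_)   -> undefined.
```

With this table, escape_model:unescape("a\\nb") should return
{ok,"a\nb"}, a backslash followed by a real newline disappears
entirely, and backslash+space is rejected with
{error,{undefined_escape,32}}.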

Here's an example of the trouble that unchecked
undocumented escape sequences can get you into.
In R16B03-1, "\x{20}" uses a documented escape
sequence to get a space character " ".  Trying it in
an older release gave me "x{20}".
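
In a release that documents the \x{...} form, the claim is easy to
check.  A minimal sketch (the module name is hypothetical); on a
release without that escape, the same source would, per the report
above, silently denote a different string:

```erlang
-module(hex_escape_demo).
-export([is_space/0]).

%% "\x{20}" should denote the one-character string consisting of a
%% single space (character code 32).
is_space() ->
    "\x{20}" =:= " " andalso length("\x{20}") =:= 1.
```

On a release where the escape is supported, is_space() should return
true.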