[erlang-questions] Bug ?!

Mon Oct 9 01:30:56 CEST 2006

I wrote about ,.. in Prolog:
	>  (b) if you make it two tokens, then you have other problems:
	>    (1) if you can have a space there, why not a newline? why not a comment?
	>        So X = [1, 2, 3,
	> 		% this can go on for some time
	> 		.. L]

Richard Carlsson <richardc@REDACTED> replied:

	Sure. But that is not very different from normal Prolog/Erlang syntax:
	         X = [1, 2, 3
	  	    % this can go on for some time
	  	     |L]
	(that there is no comma after the '3' is not much of a visual cue in a 
	larger chunk of code). If the |-syntax is not a problem, I don't see why 
	the ..-syntax should be one.

The issue was not "visual cue" but "arbitrary amounts of white space
and commentary in the middle of a SINGLE logical token".
To the extent that "visual cue" is relevant here, it's not the ABSENCE
of a visual cue that is at issue, but the PRESENCE of a MISLEADING one.

Put it this way, ",.." has *no* known advantages over "|".

	> Problem 1:  is ,.. one token or two?
	>  (a) if you make it one token, then you have all sorts of problems:
	>    (1) ,.. is one token but in [.,.,.] ,. is two tokens, and in
	>        [...,...,...] ,... is two tokens again, and of course ,.2 had
	>        better be comma nought-point-two.
	>    (2) careful writers who put spaces after commas want to put a space
	>        after this one, [a, b, .. c]

	Of course.  My intention was to make '..' one token.

Nononono.  I didn't ask whether ".." was one token,
but whether ",.." was one token.  Different issue.

	>    (2) you have a token which would otherwise belong to an established
	>        class of tokens (including +, ++, .+, --, -, .-, ..., ...., ....)
	>        but which can only occur in one place and then doesn't mean what
	>        they do.

	I'm not sure I follow you here.

Try taking what I wrote literally.
Remember that the set of operator-like tokens in Prolog is open,
not closed as it is in Erlang, and I was talking about Prolog.

Remembering that we are discussing Prolog experience with ,..; consider

    [.]	    [..] [...]         [....] 		are all one-element lists
    [.,.]        [...,...]     [....,....]	are all two-element lists
    [.,.,.]      [...,...,...] [....,....,....]	are all three-element lists

but	    [..,..]				was NOT a two-element list
	    [..,..,..]				was NOT a three-element list

and you can't say that wasn't confusing.
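
The effect can be reproduced with a toy tokenizer. This is only a sketch under stated assumptions, not the lexer of any real Prolog system: runs of dots are treated as symbol atoms, "," and the brackets are single-character tokens, and a flag called special_tail (a name invented for this illustration) makes a comma followed by exactly two dots into the single token ",..".

```python
def is_atom(tok):
    """In this toy model an atom is a run of dots or an alphanumeric word."""
    return set(tok) == {"."} or tok.isalnum()


def tokenize(text, special_tail=True):
    """Split text into tokens.

    Dots form maximal runs. With special_tail set, a comma followed by
    exactly two dots becomes the single token ',..' (assumed rule).
    """
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == ".":
            j = i
            while j < len(text) and text[j] == ".":
                j += 1
            tokens.append(text[i:j])
            i = j
        elif ch == ",":
            j = i + 1
            while j < len(text) and text[j] == ".":
                j += 1
            if special_tail and j - (i + 1) == 2:
                tokens.append(",..")
                i = j
            else:
                tokens.append(",")
                i += 1
        else:
            tokens.append(ch)
            i += 1
    return tokens


def element_count(text, special_tail=True):
    """Return the element count if text is a plain list of atoms, else None."""
    toks = tokenize(text, special_tail)
    if len(toks) < 2 or toks[0] != "[" or toks[-1] != "]":
        return None
    body = toks[1:-1]
    if not body:
        return 0
    if len(body) % 2 == 0:
        return None
    for k, tok in enumerate(body):
        if k % 2 == 0 and not is_atom(tok):
            return None
        if k % 2 == 1 and tok != ",":
            return None
    return (len(body) + 1) // 2


for sample in ["[.,.]", "[...,...]", "[..,..]", "[..,..,..]"]:
    print(sample, tokenize(sample), element_count(sample))
```

With special_tail switched off, "[..,..]" counts as two elements like its neighbours; with it on, the same text lexes to '[', '..', ',..', ']' and is no longer a plain list at all.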

	> Problem 2:  ,.. just plain doesn't work the way people expect.
	>     They expect to write
	> 	[..,X,Y] = L		% oops, only allowed at end
	> 	[X,Y,..] = L		% oops, something must follow ..
	> 	[X,Y..Z] = L		% oops, a comma must precede ..	
	> 	[X,Y,..,Z] = L		% oops, a comma may not follow ..
	> 	[X,Y,..Z] = L		% !!OOPS!! I was expecting Z to be the
	> 				% last element of L, but it's a tail!
	> 
	> In fact, for someone fluent in Latin-based writing systems that make
	> use of ellipses, ,.. is appallingly HARD to get right.  In particular,
	> your [x,y,..] is one of the cases that wasn't allowed, any more than
	> [x,y|_] was allowed.
	> 
	> When ,.. was removed from Prolog, no-one was sorry to see it go.

	It seems that they were trying to do too much. As you noted, the variant 
	that I suggested was the only one that they did _not_ implement.

No, that's not what I said, and they didn't try to do too much.
What I said is that people who are used to writing English *EXPECTED*
.. to work in ways that in fact it didn't.
The only variant that was implemented was this:

    <list> ::= '[' ']'
            |  '[' <term> <rest-list>
    <rest-list> ::= ']'            
                 |  ',' '..' <term> ']'
                 |  ',' <term> <rest-list>
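
To see which of the "expected" forms that grammar admits, here is a minimal recognizer for it. It is a sketch with simplifying assumptions: <term> is restricted to a bare identifier, ',' and '..' are lexed as two separate tokens exactly as the grammar writes them, and the names lex, parse_list and accepts are made up for this example.

```python
import re

# One token: '..', a bracket or comma, or an identifier (simplified <term>).
TOKEN = re.compile(r"\s*(\.\.|[\[\],]|[A-Za-z_][A-Za-z0-9_]*)")


def lex(text):
    """Turn text into a token list, raising ValueError on anything else."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError("unexpected character at %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_list(text):
    """Parse <list>; return (elements, tail) where tail is None if absent."""
    toks = lex(text)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise ValueError("expected %r, found %r" % (tok, peek()))
        pos += 1

    def term():
        nonlocal pos
        tok = peek()
        if tok is None or not (tok[0].isalpha() or tok[0] == "_"):
            raise ValueError("expected a term, found %r" % (tok,))
        pos += 1
        return tok

    elements, tail = [], None
    expect("[")
    if peek() == "]":                  # <list> ::= '[' ']'
        pos += 1
    else:
        elements.append(term())        # '[' <term> <rest-list>
        while True:
            if peek() == "]":          # <rest-list> ::= ']'
                pos += 1
                break
            expect(",")
            if peek() == "..":         # ',' '..' <term> ']'
                pos += 1
                tail = term()
                expect("]")
                break
            elements.append(term())    # ',' <term> <rest-list>
    if pos != len(toks):
        raise ValueError("trailing tokens")
    return elements, tail


def accepts(text):
    try:
        parse_list(text)
        return True
    except ValueError:
        return False


for sample in ["[..,X,Y]", "[X,Y,..]", "[X,Y..Z]", "[X,Y,..,Z]", "[X,Y,..Z]"]:
    print(sample, accepts(sample))
```

Only the last sample is accepted, and there Z comes back as the tail rather than as a last element.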

	But let's reverse the situation:  _only_ the following forms
	should be allowed:  [..], [x,..], [x,y,..], and so on.

Now you are going to confuse the people who expect [X,..,Z] to work.

No, the more I look at it, the better my original proposal with [*]
looks.  '..' gets used very reasonably and effectively in Haskell.
If '..' ever means anything in Erlang, I should prefer it to do the
Haskell thing.

	The trouble with the |-syntax is that it only takes a single keystroke 
	to change the meaning of the constructed data, and once it has happened, 
	it is pretty hard to notice that a ',' somewhere should really be a '|'. 
	(I have seen this kind of bug survive for years in production code.)

Do I need to point out that this is not a plausible typing error?
To get from the "," key to the "|" key you have to go up diagonally two
and then across three.  It's not like the "," and ";" keys (up diagonally
one, across one) which are even visually similar.  "|" is visually very
different from ",".  It is, to state the blindingly obvious, considerably
taller, whereas ",.." is all on one line.  We simply are not talking
about TYPOs here, but about THINKOs, and changing the tokens isn't
likely to help with THINKOs.

	(Another thing entirely is of course that I have to press shift to 
	produce both the | and the _ - and none of them stay in the same place 
	when I switch keyboard layouts.)

I also have to press shift.  But heck, let's get some numbers here.
I wrote a tiny AWK script and ran it over the lib/stdlib/src/*.?rl files
in Erlang R11B.

1257014 lower = 65.71%		unshifted key-presses
 237850 upper = 12.43%		shifted key-presses
 344629 space = 18.02%		space key
  15285 tab   =  0.80%		tab key
  58130 enter =  3.04%		enter key
   3114 |_ uses			uses of "|" followed by "_"
   3832 |X uses			uses of "|" followed by a named variable

Changing "|_" to ",.." would trade 6228 shifted keypresses for
9342 unshifted keypresses.  This would indeed reduce the number of
shifted keypresses by about two and a half percent, but since only about
1 keypress in 8 is shifted, that's not a big saving overall.
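
The arithmetic can be checked from the table alone. The sketch below assumes, as stated above, that both "|" and "_" need shift and that ",", "." and "." do not; the variable names are invented for the example.

```python
# Counts copied from the table above (lib/stdlib/src/*.?rl, Erlang R11B).
counts = {
    "lower": 1257014,   # unshifted key-presses
    "upper": 237850,    # shifted key-presses
    "space": 344629,
    "tab": 15285,
    "enter": 58130,
}
bar_underscore_uses = 3114          # occurrences of "|_"

total = sum(counts.values())

# "|_" is two shifted keys; ",.." is three unshifted keys.
shifted_removed = 2 * bar_underscore_uses
unshifted_added = 3 * bar_underscore_uses

shifted_share = counts["upper"] / total
reduction = shifted_removed / counts["upper"]

print("total key-presses:       ", total)
print("shifted presses removed: ", shifted_removed)
print("unshifted presses added: ", unshifted_added)
print("shifted share of total:   %.2f%%" % (100 * shifted_share))
print("cut in shifted presses:   %.1f%%" % (100 * reduction))
```

The shifted share comes out at roughly one keypress in eight, and the cut in shifted keypresses at a little over two and a half percent of that eighth.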

Shifting could be reduced in an editor:
    On entry of a right bracket, if the preceding characters are "\-" 
    or "|-" or "\_" (or whatever suits your keyboard), change them to "|_".
    If the preceding characters match |<lower case identifier>,
    or \<lower case identifier>,
    ask if they should be converted to | <capitalised identifier>.
That's Erlang-specific, of course.
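
As a rough sketch of that editor rule, independent of any particular editor: the function below takes the text before the cursor at the moment "]" is typed and returns the text with the bracket added. The function name, the confirm callback standing in for "ask", and the choice to put no space after "|" are all assumptions made for the illustration.

```python
import re

# Two-character shorthands that should become "|_".
SHORTHANDS = ("\\-", "|-", "\\_")

# "|" or "\" followed by a lower-case identifier at the end of the text.
LOWER_TAIL = re.compile(r"[|\\]([a-z][A-Za-z0-9_]*)$")


def on_right_bracket(before, confirm=lambda old, new: True):
    """Return the buffer contents after the user types ']'.

    before  -- text to the left of the cursor
    confirm -- called as confirm(old, new); the rewrite of a lower-case
               tail happens only if it returns True
    """
    for shorthand in SHORTHANDS:
        if before.endswith(shorthand):
            return before[: -len(shorthand)] + "|_]"

    m = LOWER_TAIL.search(before)
    if m:
        name = m.group(1)
        replacement = "|" + name[0].upper() + name[1:]
        if confirm(m.group(0), replacement):
            return before[: m.start()] + replacement + "]"

    return before + "]"


print(on_right_bracket("[X\\-"))       # [X|_]
print(on_right_bracket("[H\\tail"))    # [H|Tail]
print(on_right_bracket("[a,b"))        # [a,b]
```

The confirmation step matters because "[H|tail]" with an atom as the tail is legal Erlang, so the rewrite cannot safely be automatic.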

Changing "|_" to ",.." would still leave more than half of the uses of "|"
in lib/stdlib/ untouched, and if "[X|Y]" is allowed it is hard to see why
"[X|_]" wouldn't also be allowed in a pattern.  It is not clear how having
two different notations would help.

The point of the [*], {*}, <<*>> proposal was to have a simple way to
say something that _cannot_ be currently said in a pattern, at least for
[*] and {*}.