[erlang-questions] Style wars: junk comments

Thu Sep 13 01:59:53 CEST 2012

On 13/09/2012, at 1:47 AM, Alexandre Aguiar wrote:
> 
>>> On 12 September 2012 09:56, Richard O'Keefe <ok@REDACTED>
>>> that repeats something immediately obvious
> 
> Obviousness is a function of familiarity.

I am talking about things like

%%===================================
%% Macros
%%===================================

-define(FOO, bar).
...

where the two immediately following tokens ("-" and "define")
tell you "hey, this is a macro"!

If you don't understand enough Erlang to know what -define is,
you don't understand enough Erlang to know what a macro is either.
I did not *just* say obvious, but referred to repeating information
in adjacent tokens, so that a trivial computer program could
derive the comment from the code in constant time.
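
To make that last point concrete, here is a minimal sketch of such a program, written as a shell function around a few lines of AWK. The name derive_banner, the banner wording, and the choice of which attributes to cover are assumptions made up for this illustration. It does a fixed amount of work per line, looking only at the leading "-" and the attribute name.

```sh
# derive_banner: hypothetical helper, not part of hle.awk.
# Reads Erlang source from stdin (or from the named files) and prints
# the section banner that the first two tokens of a line already imply.
derive_banner() {
    awk '
        # Print a banner each time the kind of attribute changes.
        function banner(title) {
            if (title != last) {
                print "%%==================================="
                print "%% " title
                print "%%==================================="
                last = title
            }
        }
        # Only the leading "-" and the attribute name are inspected.
        /^-[ ]*define[ ]*[(]/         { banner("Macros");   next }
        /^-[ ]*record[ ]*[(]/         { banner("Records");  next }
        /^-[ ]*include(_lib)?[ ]*[(]/ { banner("Includes"); next }
    ' "$@"
}
```

Running  derive_banner <foo.erl  prints the banners alone, which shows that they carry no information the attribute lines do not already carry.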

I've also seen code like

    %%
    %% fred
    %%

    fred(X, Y) ->
        jim({a, X, Y}).

where the entire text of the comment just repeats the next token.
It would be a very strange programmer who did not find *that* obvious.

And please, don't waste any time telling me comments like that are
useful for navigation.  Even in vi I can do
	/^fred(
to find fred.  Admittedly, an Emacs user might find it easier to
do  Ctrl-S % % f r e d RET .  That's why in my own otherwise
emacs-like editor it would be Ctrl-S RET fred ( ESC.

I do agree that somehow highlighting the first line of a
function definition could be useful, but honestly, that's a simple
syntax colouring task.  Here's a highlighter in AWK that I whipped
up in a few minutes.

For a -module, -define, or -record line, it highlights the word
immediately after the left parenthesis.
For any other -xxx line, it highlights xxx.
For function definition lines, it highlights the quoted or unquoted
atom that begins in column 1 if and only if the last function
definition line did not begin with the same word.

awk -vhtml=0 -f hle.awk foo.erl
    Uses ANSI terminal escapes for bolding and unbolding.
awk -vhtml=1 -f hle.awk foo.erl
    Generates HTML with <b> for bolding.

#!/bin/awk -f
#   Script : hle.awk
#   Author : Richard A. O'Keefe
#   SCCS   : "@(#)2012/09/13 hle.awk    1.1"
#   Purpose: HighLight Erlang for finding functions &c quickly.
#   Usage  : awk -vhtml=0 -f hle.awk foobar.erl >foobar.txt
#          :     copies the input adding ANSI terminal escapes for bolding;
#          : awk -vhtml=1 -f hle.awk foobar.erl >foobar.htm
#          :     adds HTML markup using <b> for bolding.

function escape(s) {
    if (html) {
        # "\\&" yields a literal ampersand; a bare "&" means "the matched text".
        gsub(/&/, "\\&amp;", s)
        gsub(/</, "\\&lt;", s)
    }
    return s
}

function print_with_bold() {
    print escape(substr($0, 1, RSTART-1))     bold \
          escape(substr($0, RSTART, RLENGTH)) unbold \
          escape(substr($0, RSTART+RLENGTH))
}

function print_sans_bold() {
    print escape($0)
}

BEGIN {
    if (html) {
        bold   = "<b>"
        unbold = "</b>"
        print "<html><head><title>Highlighted Erlang Source</title></head>"
        print "<body><pre>"
    } else {
        bold   = "\033[1m"
        unbold = "\033[0m"
    }
    last = ""
}

END {
    if (html) {
        print "</pre></body></html>"
    }
}

/^- *(define|module|record) *\( *[a-zA-Z][a-zA-Z0-9_]*/ {
    # Highlight the word after the left parenthesis.
    match($0, /\( */)
    n = RSTART+RLENGTH
    match(substr($0, n), /[a-zA-Z][a-zA-Z0-9_]*/)
    RSTART += n - 1
    print_with_bold()
    next
}

/^- *[a-z]+ *[(]/ {
    # Highlight the annotation name.
    match($0, /[a-z]+/)
    print_with_bold()
    next
}

/^([a-z][a-zA-Z0-9_]*|'([^']|\\.)*')/ {
    # Highlight the function name if it's not the same as the last one.
    if (match($0, /[^ (%]*/)) {
        name = substr($0, RSTART, RLENGTH)
        if (name != last) {
            last = name
            print_with_bold()
            next
        }
    }
}

{
    # Don't highlight anything.
    print_sans_bold() 
}   

# End of hle.awk

Combine that with html2ps, and you have a neat little .erl -> .ps
listing generator.  It's easy to tweak.  For example, we could make
-section(X).
turn into <h2>X</h2>
quite easily.
It is also easy to generate a cross reference table.
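
The -section(X) tweak could look roughly like the sketch below. It is written as a standalone filter rather than as a patch to hle.awk, the function name section_to_h2 is hypothetical, and it assumes the whole attribute sits on one line with a plain atom as its argument. The title is not HTML-escaped here; inside hle.awk one would pass it through escape().

```sh
# section_to_h2: hypothetical filter, shown only as an illustration.
# Turns a line of the form  -section(X).  into  <h2>X</h2>
# and copies every other line through unchanged.
section_to_h2() {
    awk '
        /^-[ ]*section[ ]*[(]/ {
            title = $0
            sub(/^-[ ]*section[ ]*[(][ ]*/, "", title)  # drop the leading "-section("
            sub(/[ ]*[)][ ]*[.].*$/, "", title)         # drop the closing ")." and the rest
            print "<h2>" title "</h2>"
            next
        }
        { print }
    ' "$@"
}
```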

(1) > Style is not about aesthetics. It is about discipline.
(2) > And discipline is about standards.
(3) > Navigating a module with a previously known internal organization
    > is far more efficient.
(4) > Several languages have (or had) structural rules for their codings.

Ad (4), I have already mentioned COBOL and Pascal and what an
*imperial* pain in the arse their non-semantic ordering (imposed not
for the benefit of the programmer but for the benefit of the compiler;
I know that to be the case for Pascal and I believe it to be so for
COBOL) was.  Practically everyone who wrote a compiler relaxed that
order as pretty much the first extension they added, and Pascal's
successor languages dropped it as if it were red hot.

Ad (3), that claim is obviously untrue in general.
A known internal organisation helps navigation only if it is
*RELEVANT* to navigation, and my claim is that *syntactic*
sectioning is *NOT* relevant to navigation.  It is quite
certainly irrelevant to any kind of navigation I have ever
tried to do.  I never want to ask "where are the includes?"
because I can find them with a trivial Ctrl-R RET -include ESC
or ?^-include.  I might well want to ask "which include file
did _this_ come from", pointing, but syntactic sectioning is
no help whatever in answering that question.  I might well want
to find a particular function, but that's what tag files are
for (or automatically produced tables of contents).
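
As a rough illustration of an automatically produced table of contents, the sketch below lists each function with the line number of its first clause. The name erl_toc is made up for this example, and it makes simplifying assumptions: clause heads start in column 1 with an unquoted atom, and arity is ignored, so adjacent clauses of fred/1 and fred/2 count as one entry.

```sh
# erl_toc: hypothetical helper, shown only as an illustration.
# Prints "name<TAB>line" for the first clause of each function.
erl_toc() {
    awk '
        # A clause head: an unquoted atom in column 1 followed by "(".
        /^[a-z][a-zA-Z0-9_@]*[ ]*[(]/ {
            name = $0
            sub(/[ ]*[(].*$/, "", name)     # keep only the function name
            if (name != last) {             # first clause of a new function
                printf "%s\t%d\n", name, FNR
                last = name
            }
        }
    ' "$@"
}
```

Because the listing is generated from the code, it cannot go stale or point at a section that is empty.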

Ad (2), discipline and standards are a means to an end.
Casabianca http://sniff.numachi.com/pages/tiBOYDECK.html
was disciplined and faithfully followed the standard set him,
but the result was a useless death.
The purpose of coding standards is working maintainable software.

A commenting convention that bloats the text (and I am prompted
here by actual *measured* factors of 2 and even more) thus
*creating* a navigation problem that would not otherwise have
existed is not a standard that *ought* to be followed.

A commenting convention that results in syntactic section headers
being left in when there is *nothing* for them to comment on is
a standard that *impairs* navigation by falsely suggesting that
there is a landing point when there is none.

Ad (1), no, style is not primarily about discipline, it is
primarily about COMMUNICATION.  

A whole bunch of things converged at the same time:

 - Talking to some people about how to present information
   in graphs, about simplifying, about not depending on
   the red-green distinction two of my colleagues here cannot
   perceive, &c.  I like the way Crothers said it in
   "On the graphical presentation of quantitative data":
	The graph is never an end in itself, it is not the
	"result", and it is important to assess the Usefulness
	of a presentation as well as such criteria as Clarity,
	Accuracy, Space, and Speed when deciding which
	techniques to employ.

 - Slogging through some web standards (which follow a
   regular structure quite faithfully) and discovering
   at the end that the syntax had no semantics.

 - Reading an education report whose title was
	"<cultural group> students in <field of study>:
	 identifying the barriers"
   and realising at the end that while it had all the
   conventional structure and paraphernalia of an
   educational report, it had not in fact identified any
   barriers.

 - Working through the ethics approval paperwork for an
   educational experiment I'm 3rd investigator on and
   seeing a rigid conventional structure followed perfectly
   but the actual forms to be given to the subjects were
   in heavily bureaucratic and somewhat garbled language.
   (Lots of passives, nobody doing anything but things
   mysteriously happening, comments removing themselves...)

 - Encountering this particular body of Erlang code with
   a rigid structure that was WORSE THAN USELESS FOR
   ACTUAL COMPREHENSION.

We use TOOLS to aid NAVIGATION;
we use STYLE to aid COMPREHENSION.

Because the purpose of style is to help the reader understand,
rigidly following any rule is likely to be a bad idea.
Not because rules are bad as such, but because our finite
mental capacity means that we can never envisage all the
situations that may arise, so that we are likely to meet
situations where following the letter of the rule will
violate the intent of the rule.  One of the wisest things
in the Ada Quality and Style Guidelines is that they
offer a rationale for every guideline, and make it clear
that the expected benefits of following style rules are
what really matter, not the rules as such.

> Some tags and comments work as markups that can ease learning

None of the examples I have complained of can credibly be said
to ease learning.

> and navigating

None of the examples I have complained of can credibly be said
to aid navigation in any realistic sense.

> by working as coding standards. Besides, disk space is not expensive today. :-)

Disc space is not the issue.  But SCREEN space is limited.
A style that bloats the code with junk comments reduces
the amount of *useful* text I can see at one time and thereby
reduces my ability to understand the code.  For the life of
me, I cannot see this as a good thing.

>  Not to mention that such standard markups will be essential for future implementation of cross module indexing systems and other indexing mechanisms.

You will have to provide more detail.
The junk comments I am talking about are ones that are
trivially automatically derivable from at most the first
two tokens of the next code line.
It is hard to see how those can be essential for any
indexing system.

Be really clear and explicit:
in what way is a comment

  %%======================================%%
  %% Macros                               %%
  %%======================================%%

(which might not in fact be followed by any macros at all)
essential to a cross module or other indexing system,
given that it is actually the presence of '-define'
that creates the condition "a macro is here".