[erlang-questions] How about a new warning? Was: Re: trouble with erlang or erlang is a ghetto

Fri Aug 5 04:55:38 CEST 2011

On 4/08/2011, at 7:50 PM, Motiejus Jakštys wrote:

> On Fri, Jul 29, 2011 at 11:45:21AM +1200, Richard O'Keefe wrote:
>> One of the things criticised in the blog entry that we've been responding to was
>> that
>> 	{ok,Foo} = bar(...),
>> 	{ok,Foo} = ugh(...)
>> is too easy to write (when you really meant, say, Foo0, Foo1).

Please remember, people, that this wasn't *my* complaint,
but a complaint in the original "Erlang is a Ghetto" blog.
I was wondering whether we could do anything useful to
help people who _do_ tend to make that kind of mistake.
> 
> I usually remember all variable names in a function I am working on,
> because functions are small (and easy to skim through in 0.3 sec).

I don't make that kind of mistake much myself.
However, I wouldn't take that "functions are small" for granted.
> 
> Don't write functions with more than 15 to 20 lines of code.

I took the Erlang/OTP R12B-5 *.erl files and ran them through a
filter that removes comments.  Blank lines were not counted.
A line beginning with ['a-z] was taken as beginning a clause,
and a line ending with [.] was taken as ending a function.
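
As a rough illustration of that counting rule, here is a minimal C
sketch.  It is not the filter actually used for these numbers: the
names `struct tally` and `tally_line` are made up for the example, and
it assumes comments have already been stripped.  A rule this simple
also treats attribute lines such as `-module(foo).` as ending a
function; how the original filter handled those is not stated.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Running totals for one pass over comment-stripped Erlang source. */
struct tally {
    long lines;      /* non-blank lines seen                  */
    long clauses;    /* lines whose first character is ' a-z  */
    long functions;  /* lines whose last character is a dot   */
};

/* Returns 1 if the line holds nothing but white space. */
static int is_blank(const char *line)
{
    assert(line != NULL);
    for (; *line != '\0'; line++) {
        if (!isspace((unsigned char)*line)) return 0;
    }
    return 1;
}

/* Feed one line (comments already removed) into the tally. */
static void tally_line(struct tally *t, const char *line)
{
    size_t n;

    assert(t != NULL && line != NULL);
    if (is_blank(line)) return;              /* blank lines are not counted */
    t->lines++;

    /* A line beginning with ['a-z] is taken as beginning a clause. */
    if (line[0] == '\'' || (line[0] >= 'a' && line[0] <= 'z')) t->clauses++;

    /* A line ending with [.] (ignoring trailing white space) is taken
       as ending a function.  n > 0 here because the line is not blank. */
    n = strlen(line);
    while (n > 0 && isspace((unsigned char)line[n - 1])) n--;
    if (line[n - 1] == '.') t->functions++;
}
```

Feeding it each line of a file in turn and dividing `lines` by
`functions` gives a mean lines-per-function figure of the kind
summarised here.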

> summary(clauses.per.function)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   1.000   2.297   2.000 495.000 
> summary(lines.per.clause)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   2.000   2.000   3.998   4.000 797.000 
> summary(lines.per.function)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
   1.000    2.000    5.000    9.184   10.000 2414.000 

So yes, the mean is somewhere around 10.  However,
8.8% of functions in R12B-5 were over 20 lines, and
2.7% were over 40 lines.
0.6% were over 100 lines. The "whopper" at 2,414 lines
is admittedly extreme; the next largest function is
"only" 1,072 lines.

> Split large
> function into several smaller ones. Don't solve the problem by writing
> long lines, remember?
> http://www.erlang.se/doc/programming_rules.shtml#REF32141

Oh yes, line length.

> summary(columns.per.nonempty.line)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00   24.00   39.00   40.07   54.00 1515.00 

If you look at the source code of Reia, which I have,
you may come away with the same impression I did, which is
that Tony Arcieri writes pretty clean code that is no
hardship to read.  He tends to spread things out vertically
more than I do, but I have to say that it's tidy and
readable.  The complaint about unintentionally writing a
variable name more than once did NOT come from someone
who doesn't know how to write clean code.

You will also notice pretty quickly that some functions
just *can't* be kept below 21 lines: if you have more
than 20 cases to deal with, you are going to need more
than 20 lines to do it.  And a compiler runs into that
kind of problem quite often.  Here are the summaries
for Reia:

> summary(clauses.per.function)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   1.000   2.542   2.000 105.000 
> summary(lines.per.clause)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   2.000   2.000   3.666   3.000  35.000 
> summary(lines.per.function)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   2.000   5.000   9.316   8.000 229.000 

Since the scope of variables is a clause, not a
function, the size we should be concerned with is
clause size, and we see that each and every one
of the clauses in Reia fits well within an edit
window on a modern machine.  (The big function
is one which has, for good reason, lots of clauses;
the individual clauses are not particularly large.)

You may recall a recent thread in which I was not
particularly kind to an Erlang "enhancement" suggestion
of Arcieri's.  I think that was a bad idea, but nobody
looking at his code could accuse him of ignorance of
Erlang or incompetence as a programmer.  If he says that
he runs into unintentionally-repeated-variable-names
often enough to be a nuisance, then this is a problem 
that a *GOOD* programmer can have in Erlang, even if you
or I do not.

I gave my second lecture about static checking this
morning, and spent some time explaining the importance
of turning off inappropriate warnings so that you can
see the real problems.  I gave this example from my
Smalltalk-to-C compiler's run-time library:

#define ik_bitShift(x, y) \
    (AREINTS(x, y) && -y < ASINT(INT_BITS) \
    ? (Word)((Int)(x) >> (int)(((-y) >> 2)) &~ (ASINT(1)-1)) \
    : k_bitShift(x, y))

and the code that gets generated

          t4 = ik_bitShift(l1 /*a*/, ASINT(24));

producing warnings like

"foobar.c", line 229490: warning: shift count negative or too big: >> 1073741800

from Sun cc, gcc-4.5, and clang.  These compilers are smart enough to
see that the shift count is out of range, but so dumb that they pay
no attention whatsoever to the preceding test which ensures that the
shift in question can never under any circumstances be executed.  I get
hundreds of lines of worse-than-useless warnings, which I cannot switch
off, because these compiler writers were so arrogantly cock-sure that
THEY would never be mistaken about what was probably wrong.
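
A cut-down sketch of the same pattern may make the complaint concrete.
The names here (`WORD_BITS`, `GUARDED_SHR`, `shift_by_constant`, `demo`)
are hypothetical and much simpler than the run-time library's own
macros.  After macro expansion the source text contains a shift by
1000, but the guard means that branch is never evaluated.  Whether a
particular compiler warns about it depends on the compiler, its
version, and how much of the guard it can fold.

```c
#include <assert.h>
#include <limits.h>

/* Width of unsigned int in bits on this platform. */
#define WORD_BITS ((int)(sizeof(unsigned) * CHAR_BIT))

/* Shift right only when the count is in range; otherwise yield 0.
   The out-of-range shift sits in the branch that is not evaluated. */
#define GUARDED_SHR(x, n) \
    ((n) >= 0 && (n) < WORD_BITS ? (unsigned)(x) >> (n) : 0u)

/* Generated-style code: the count is a literal that is too big, so
   the expansion contains "x >> 1000" in the branch never taken. */
static unsigned shift_by_constant(unsigned x)
{
    return GUARDED_SHR(x, 1000);
}

/* Usage example: both calls are well defined because of the guard. */
static void demo(void)
{
    assert(GUARDED_SHR(0xF0u, 4) == 0x0Fu);  /* in range: ordinary shift */
    assert(shift_by_constant(123u) == 0u);   /* out of range: fallback   */
}
```

A warning on the shift inside `shift_by_constant` reports something
that cannot happen at run time, which is the kind of noise described
above.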

For efficiency the test should probably be special-cased by the Smalltalk
compiler rather than the C compiler, but I want to make that decision when
it rises to the top of the TODO list naturally, not because half-smart
compilers make my life a misery if I don't.

I really don't want to get in the way of programmers who know what they are
doing and don't have this particular problem; that is why I suggested being
able to switch off warnings selectively.