[erlang-questions] Reassigning variables

Wed Mar 18 03:02:32 CET 2009

On 18 Mar 2009, at 7:02 am, Matthew Dempsky wrote:

> I like pattern matching in the majority of cases, but I find I write
> enough code where I need to incrementally update a data structure, and
> maintaining that code is a pain when I have code like:
>
>  X = foo(),
>  X1 = bar(X),
>  X2 = xyzzy(X1),
>  blah(X2).
>
> and later want to change it to:
>
>  X = foo(),
>  X1 = whee(X),
>  X2 = bar(X1),
>  X3 = xyzzy(X2),
>  blah(X3).

I immediately wondered just how common reassigning to the
same variable was in an imperative language.  I happen to
have 33,826 SLOC of Smalltalk around that I wrote.

	33826 SLOC
	  359 classes			94.2 SLOC/class
	 7182 method definitions	 4.6 SLOC/method
          5312 assignments		 0.74 assignments/method

I must say that these results come as something of a shock to me.
I had been feeling guilty about having some fairly long methods.
The 4.6 SLOC/method figure includes 1 SLOC for the header, so the
average method *body* is 3.6 SLOC.
I've definitely been using assignment statements freely.
I expected there to be a lot.

Paradoxically, it seems to be the very imperative nature of
Smalltalk that reduces the number of (visible!) assignments.
Take this definition:

     collect: collectBlock thenReject: rejectBlock
       |r|
       r := OrderedCollection new.
       self do: [:each | |c|
         c := collectBlock value: each.
         (rejectBlock value: c) ifFalse: [r addLast: c]].
       ^self pvtAsSpecies: r

With immutable data structures, "r addLast: c" would have
had to be "r := r addLast: c".  This method is, of course,
basically a filter + a fold.

Let's look at a method that does have a fair few assigments:

     detectMax: aBlock ifNone: noneBlock
       |unbegun maxElement maxValue|
       unbegun := true.
       self valuesDo: [:element| |value|
         value := aBlock value: element.
         (unbegun or: [maxValue < value]) ifTrue: [
           maxElement := element.
           maxValue   := value.
           unbegun    := false]].
       ^unbegun ifTrue: [noneBlock value] ifFalse: [maxElement]

This is another fold.  And it's a common pattern: what would be
a fold in a functional language turns into a pair of assignments
for each facet, one initialisation and one update.

One method stood out.  It had a record 48 assignments.
Turned out it was a lexical analyser for JSON, written in a
very C-like style (or at any rate, in the style I use when
writing lexical analysers in C).  One quick edit removed 14
assignments.  (In fact, 15, because another assignment was
there to work around a bug that simply didn't exist when a
certain variable was no longer reassigned.)  That's still
the record-holder with 33 assignments.

In a functional language like Erlang, the JSON tokeniser
would be several functions.  (At a rough guess, 12.)  So
the number of reassignments would be manageable.

I analysed Smalltalk source code because I *can*; with a
fairly simple lexical structure and no preprocessor, it's
quite easy for a little AWK script to do some useful counting.

Next I went fishing in some Java.  Thankfully, it wasn't Java
that I had written.  The collection was 91502 SLOC in 281539
raw lines -- my SLOC counter doesn't count "{" or "}" as SLOC.

	91502 SLOC
	 1248 classes			73.3 SLOC/class
	 9515 methods			 9.6 SLOC/method
	25077 assignments		 2.64 assignments/method

This is a much higher density of assignments than Smalltalk,
but it's _still_ only 2.64 assignments per method, and this
is *Java* we're talking about.  Oh, and thanks to the crudeness
of my tools, this counts initialisation as assignment (the same
in Smalltalk, too).

In this case the record-holder has an amazing 806 assignments,
but three quarters of the methods have no more than 2.  Fully
95% of the Java methods have no more than 10 assignments.

The crudity of my tools doesn't let me distinguish between the
number of assignments and the number of variables assigned to,
so I don't have a measurement for how many assignments there are
to the *same* variable.  My feeling is that except for isolated
monsters like my 48-assignment method and an XSLT processor's
806-assignment method, it's pretty low, maybe 2.

So why should there be a much worse problem for Erlang?

The answer is hinted at above.  Erlang does not have mutable
data structures (if you don't count the process dictionary or
ETS tables or the process name registry or ...) so instead of
	x foo.
	x bar.
	x ugh.
we have to do
	X1 = foo(X0),
	X2 = bar(X1),
	X3 = ugh(X2),
	...

Unless, of course, we do
	ugh(bar(foo(X0))).

> This means having to change four lines of code, when really it's
> conceptually just one change.  I'd like to suggest being able to do
> something like:
>
>  X = foo(),
>  r(X) = whee(X),
>  r(X) = bar(X),
>  r(X) = xyzzy(X),
>  blah(X).
>
Yeek.  Why pervert function call syntax?
Why not just have assignments?
	X = foo(),
	X := whee(X),
	X := bar(X),
	X := xyzzy(X),
	blah(X)

Something to a very similar effect is possible in most
functional languages, using 'let':

	let x = foo () in
	let x = whee x in
	let x = bar x in
	let x = xyzzy x in
	blah x

The only real awkwardness in adding this to Erlang is the
fact that Erlang allows already-bound variables in patterns
as well as bound ones.

As an aside: parse transforms would be more useful if you were
able to add new operators (like :=).

What I would really like are some numbers and some examples.
How *often* is this a problem?
How *many* times is "the same variable" updated, typically?
How often are there multiple-updates that don't fall into a
pattern?
Is it possible that a different coding style would eliminate
or greatly reduce the problem?

Please, before writing any parse transforms, show us two or
three real, complete, functions taken from code you really
want to use, where this problem occurs.