optimization of list comprehensions

Tue Mar 7 01:06:00 CET 2006

Serge Aleynikov <serge@REDACTED> wrote:
	I agree with your statement in the previous email that it's
	important not to clutter the language with rediculous (sic.)
	notations, yet if list comprehensions are already a part of the
	language with implementation of map and filter patterns, why not
	allowing for folding as well, which is probably as frequently
	used operation as map?

I have already explained that in some detail.

Because the combination of map and filter requires only *SINGLE*
bindings for patterns, whereas folding requires *DOUBLE* bindings
for patterns, and there are no other constructs anywhere in Erlang
that do double binding.

	Considering more complicated folding examples, I tend to believe
	that the more complicated/lengthy the function implementation
	under the fold is, the less it matters whether lists:fold vs.
	comprehension is used, as most of the mental reading effort is
	focussed on the folding function itself.

Exactly so.  The simple ones (sum, min, max) already have names of their own.

Take this example from the OTP sources, which I have reindented:

	case (catch lists:foldl(fun(X, Acc) ->   
	    case file:raw_read_file_info(filename:join([X, ModuleFile]))
	      of {ok,_} ->
		 Y = filename:split(X),
                 throw(filename:join(lists:sublist(Y, length(Y)-1)++["priv"]))
	       ; _ ->
		 Acc
	    end
          end, false, P)),
         of false ->
	    exit(uds_dist_priv_lib_indeterminate)
          ; _ ->
	    Pd
	end

Not a very inspiring example, but the first one I came across.
It's a rather trivial fold:  it either threads 'false' all the way
through to the end or raises an exception; if 'false' gets through
then it exits otherwise it returns the new name.  This particular
case would be much clearer using a separate function:

	... find_full_name(P, ModuleFile) ...

    find_full_name([], _) ->
	exit(uds_dist_priv_lib_indeterminate);
    find_full_name([X|Xs], ModuleFile) ->
	case file:raw_read_file_info(filename:join([X,ModuleFile]))
	  of {ok,_} ->
	      Y = filename:split(X),
	      filename:join(lists:sublist(Y, length(Y)-1) ++ ["priv"])
           ; _ ->
              find_full_name(Xs, ModuleFile)
        end.

Here's another one from OTP:

	NewS = foldl(fun(App, S1) ->
	    case get_loaded(App)
              of {true, _} ->
		 S1
               ; false -> 
		 case do_load_application(App, S1)
		   of {ok, S2} ->
		      S2
                    ; Error ->
		      throw(Error)
                 end
            end
	  end, S, IncApps),

In fact, I spent about 20 minutes looking at foldl calls in the OTP
sources, and *most* of them had complex functions and list arguments
that were just simple variable names.  I saw a literal list and a
few calls to reverse() or delete() or sort() with simple variable
arguments and record field extraction, but mostly simple variables.
Here's the only complex example I found in those 20 minutes:

find_modules(P, [Path|Paths], Ext, S0) ->
    case file:list_dir(filename:join(Path, P))
      of {ok, Fs} ->
         S1 = lists:foldl(
             fun(F, S) -> sets:add_element(filename:rootname(F, Ext), S) end,
             S0,
             [F || F <- Fs, filename:extension(F) =:= Ext]),
         find_modules(P, Paths, Ext, S1)
       ; _ ->
         find_modules(P, Paths, Ext, S0)
    end;
find_modules(_P, [], _Ext, S) ->
    sets:to_list(S).

The best notation I've been able to come up with for folding
would shrink the call to lists:foldl/3 by 2 lines:

    let S1 = S0 then sets:add_element(filename:rootname(F, Ext), S1)
    for F <- Fs, filename:extension(F) =:= Ext
    in  find_modules(P, Paths, Ext, S1)

But then the whole function would shrink from 13 lines to 11 lines,
which isn't _that_ much of a shrinkage.  And remember, this is the
most complicated (no, the _only_ complicated) example I found in a
sample of 50 kSLOC of Erlang source code.  A notation which saves
you 2 SLOC out of 50 kSLOC is not _hugely_ beneficial...

So my earlier estimate of the number of places where a list folding
notation could be used in the OTP sources must not be mistaken for
an estimate of the number of places where such a notation would in
fact improve the readability of the code, because the part of the
expression that would be simplified is already as simple as it can be.

In short, while Serge Aleynikov <serge@REDACTED> is justified in
saying that "folding ... is probably as frequently used [an] operation
as map", he is _not_ justified in concluding that a special purpose
notation for folding would be as advantageous as a special purpose
notation for filter+map.

Now, that's just one sample.  It's up to the proponents of a new notation
to provide other samples showing that the notation _would_ pay off.  And
in fairness, I must admit that if such a notation did exist, code might
be written differently to take more advantage of it.  But that's just
speculation.