optimization of list comprehensions

Tue Mar 7 16:57:00 CET 2006

--- "Ulf Wiger (AL/EAB)" <ulf.wiger@REDACTED>
wrote:

>  Richard A. O'Keefe wrote:
> 
> > In fact, I spent about 20 minutes looking at foldl
> calls in 
> > the OTP sources, and *most* of them had complex
> functions and 
> > list arguments that were just simple variable
> names.
> 
> Let me then provide a more interesting example.
> Based on (and supposedly equivalent to) real-life
> code, 
> but never compiled nor tested:
> 
> mk_tops(TIds1, TIds2, Dir, SId)
>  when is_list(TIds1), is_list(TIds2) ->
>   [#ctxtTop{frTId = TId1, toTId = TId2,
>             dir = Dir, strId = SId} ||
>     TId1 <- TIds1,
>     TId2 <- TIds2,
>     TId1 =/= TId2].

This operation is O(mn) for m TIds1 and n TIds2. As
used below, O(n) or O(n^2).

> expand(Tops, TIds) ->
>   lists:foldr(
>     fun(#ctxtTop{frTId = '*', toTId = '*', dir =
> Dir, strId = SId}, Acc)
> ->
>           Acc ++ mk_tops(TIds, TIds, Dir, Sid);
>        (#ctxtTop{frTId = T1, toTId = '*', dir = Dir,
> strId = SId}, Acc)
> ->
>           Acc ++ mk_tops([T1], TIds, Dir, SId);
>        (#ctxtTop{frTId = '*', toTId = T1, dir = Dir,
> strId = SId}, Acc)
> ->
>           Acc ++ mk_tops(TIds, [T1], Dir, SId)
>     end, [], Tops).
...
> I have no own idea on how to improve it, with or
> without
> folding lcs.

The outermost loop looks like it could be written with
foldl instead to
get rid of the appending. Something like this:

lists:foldl(
  fun(..., Acc) -> mk_tops(From, To, Dir, SId, Acc) ;
... end,
  [], Tops)

and

mk_tops(TIds1, TIds2, Dir, SId, Acc)
  when is_list(TIds1), is_list(TIds2) ->
   [#ctxtTop{frTId = TId1, toTId = TId2,
            dir = Dir, strId = SId} ||
    TId1 <- TIds1,
    TId2 <- TIds2,
    TId1 =/= TId2] ++ Acc.

(mk_tops now has a ++, but I believe the beam compiler
gets rid of this special case. If it doesn't, you can
perhaps get around it with a final flatten.)

The mk_tops function is O(n^2) in itself, for n TIds,
so that might be the dominant cost; impossible to say
without knowing more about the inputs. But since the
worst-case output is a list of length roughly n^2,
there is not a lot of slack; a better solution would
have to change the data representation. 

For example, when '*' appears multiple times, many
list elements will have the same frTId and/or toTId
fields but different dir and strTId fields. Maybe this
can be exploited to share the work, e.g., if you
instead return
   {collection, FromTo_pairs, Dir, StrTId}
then the FromTo_pairs can be shared by several
"instances" that differ in Dir or StrTId: so, while
traversing the list, check if (*,*) or (T,*) or
whatever has already been computed and reuse/share the
computed pairs if it has.

Of course, such a solution seems complex enough that
it might be worth waiting for trouble to turn up
before fighting it :-)

Best,
Thomas

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com