optimization of list comprehensions
Thomas Lindgren
thomasl_erlang@REDACTED
Tue Mar 7 16:57:00 CET 2006
--- "Ulf Wiger (AL/EAB)" <ulf.wiger@REDACTED>
wrote:
> Richard A. O'Keefe wrote:
>
> > In fact, I spent about 20 minutes looking at foldl
> calls in
> > the OTP sources, and *most* of them had complex
> functions and
> > list arguments that were just simple variable
> names.
>
> Let me then provide a more interesting example.
> Based on (and supposedly equivalent to) real-life
> code,
> but never compiled nor tested:
>
> mk_tops(TIds1, TIds2, Dir, SId)
> when is_list(TIds1), is_list(TIds2) ->
> [#ctxtTop{frTId = TId1, toTId = TId2,
> dir = Dir, strId = SId} ||
> TId1 <- TIds1,
> TId2 <- TIds2,
> TId1 =/= TId2].
This operation is O(mn) for m TIds1 and n TIds2. As
used below, O(n) or O(n^2).
> expand(Tops, TIds) ->
> lists:foldr(
> fun(#ctxtTop{frTId = '*', toTId = '*', dir =
> Dir, strId = SId}, Acc)
> ->
> Acc ++ mk_tops(TIds, TIds, Dir, Sid);
> (#ctxtTop{frTId = T1, toTId = '*', dir = Dir,
> strId = SId}, Acc)
> ->
> Acc ++ mk_tops([T1], TIds, Dir, SId);
> (#ctxtTop{frTId = '*', toTId = T1, dir = Dir,
> strId = SId}, Acc)
> ->
> Acc ++ mk_tops(TIds, [T1], Dir, SId)
> end, [], Tops).
...
> I have no own idea on how to improve it, with or
> without
> folding lcs.
The outermost loop looks like it could be written with
foldl instead to
get rid of the appending. Something like this:
lists:foldl(
fun(..., Acc) -> mk_tops(From, To, Dir, SId, Acc) ;
... end,
[], Tops)
and
mk_tops(TIds1, TIds2, Dir, SId, Acc)
when is_list(TIds1), is_list(TIds2) ->
[#ctxtTop{frTId = TId1, toTId = TId2,
dir = Dir, strId = SId} ||
TId1 <- TIds1,
TId2 <- TIds2,
TId1 =/= TId2] ++ Acc.
(mk_tops now has a ++, but I believe the beam compiler
gets rid of this special case. If it doesn't, you can
perhaps get around it with a final flatten.)
The mk_tops function is O(n^2) in itself, for n TIds,
so that might be the dominant cost; impossible to say
without knowing more about the inputs. But since the
worst-case output is a list of length roughly n^2,
there is not a lot of slack; a better solution would
have to change the data representation.
For example, when '*' appears multiple times, many
list elements will have the same frTId and/or toTId
fields but different dir and strTId fields. Maybe this
can be exploited to share the work, e.g., if you
instead return
{collection, FromTo_pairs, Dir, StrTId}
then the FromTo_pairs can be shared by several
"instances" that differ in Dir or StrTId: so, while
traversing the list, check if (*,*) or (T,*) or
whatever has already been computed and reuse/share the
computed pairs if it has.
Of course, such a solution seems complex enough that
it might be worth waiting for trouble to turn up
before fighting it :-)
Best,
Thomas
__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
More information about the erlang-questions
mailing list