[erlang-questions] traversal of multiple lists (or list of lists)

Wed Oct 26 01:47:40 CEST 2016

On 22/10/16 9:41 AM, Karlo Kuna wrote:
> i want to implement what i call Select-Consume function:
>
> take list of lists [L1 ... Ln]
> select some (potentially all) heads from L1 ... Ln and do something with
> them
> recurse with tails of lists where selection happened and with complete
> lists where there wasn't selection preserving original order
> end when all L1 ... Ln are exhausted or when there was no selected heads.

This is rather unclear.

Here's a function that takes a list of lists.
If all the lists are empty, it returns [].
If all of the lists are non-empty, it returns [Heads|Tails].
If some are empty and some not, it crashes.

unfold_lists([[Head|Tail] | []]) ->
     [[Head] | [Tail]];
unfold_lists([[Head|Tail] | Lists]) ->
     [Heads|Tails] = unfold_lists(Lists),
     [[Head|Heads] | [Tail|Tails]];
unfold_lists([[] | []]) ->
     [];
unfold_lists([[] | Lists]) ->
     [] = unfold_lists(Lists),
     [].

Now we can write things like

multilist_map(F, Lists) ->
     case unfold_lists(Lists)
       of [] -> []
        ; [Heads|Tails] -> [F(Heads) | multilist_map(Tails)]
     end.

(Something not entirely unlike this was in the DEC-10 Prolog library.)
For example,

 > multi:unfold_lists(["abc","def","ghi"]).
["adg","bc","ef","hi"]

 > multi:multilist_map(fun erlang:list_to_atom/1, ["abc","def","ghi"]).
[adg,beh,cfi]

 > multi:multilist_map(fun erlang:list_to_atom/1, ["abc","def","gh"]).
** exception error: no match of right hand side value []
      in function  multi:unfold_lists/1 (multi.erl, line 7)
      in call from multi:unfold_lists/1 (multi.erl, line 7)
      in call from multi:multilist_map/2 (multi.erl, line 16)
      in call from multi:multilist_map/2 (multi.erl, line 18)

Another approach, also found in the DEC-10 Prolog library, might be
clearer:

all_empty([[_|_]|_]) ->
     false;
all_empty([[]|Lists]) ->
     all_empty(Lists);
all_empty([]) ->
     true.

heads([[Head|_]|Lists]) ->
     [Head | heads(Lists)];
heads([]) ->
     [].

tails([[_|Tail]|Lists]) ->
     [Tail | tails(Lists)];
tails([]) ->
     [].

cons_lists([Head|Heads], [Tail|Tails]) ->
     [[Head|Tail] | cons_lists(Heads, Tails)];
cons_lists([], []) ->
     [].

With those definitions, we'd have

multilist_filter(F, Lists) ->
     case all_empty(Lists)
       of true -> []
        ; false ->
            Heads = heads(Lists),
            Rest = multilist_filter(F, tails(Lists)),
            case F(Heads)
              of true -> [Heads|Rest]
               ; false -> Rest
            end
     end.

There's a trade-off here:  unfold_lists/1 does a single traversal,
but allocates twice as much memory as it retains, while
all_empty/1, heads/1, and tails/1 allocate no unretained memory,
but take two traversals (not three: in the non-empty case the
all_empty/1 function will just check the first element).

>
> my worry is about performance, as far as i know in erlang there is no
> way to coordinate traversal of multiple lists!

Why not write the code then measure it and see if you have a problem?

> i looked at implementation of lists module and there i found horror of
> manual implementation of case combinatorics in merge/3
> which obviously does not scale at all.

merge/3 and its support functions have been micro-optimised to reduce
memory references and pattern matching work.  In the service of
something as commonly used as sorting, that makes sense.  It's not
*meant* to scale to more than 2 lists, or even to be something that
ordinary programmers would imitate.

> i am also aware that forwarding strategy in my case is making new list
> every time, which is BAD but i don't know of any other way
> to handle variadic function call other than passing list of arguments
> (in my case list of lists).

Presumably you're referring to code similar to the code above.
Why do you call it "BAD" with shrieking capitals?

You're right that there *isn't* any way to write a "variadic" function
in Erlang than to give it a list (or tuple), and that if the arguments
are different, that list (or tuple) will need to be different.

Erlang's garbage collector *expects* to be given a fair bit of work
with intermediate values being created instead of possibly shared
data being smashed.  It doesn't mind.  It's a good idea in any
programming language to avoid repeated work and garbage creation,
but not at the expense of getting the code working and appropriately
maintainable.

It really would help a lot if you could be clearer about what exactly
you are trying to do.