[erlang-questions] Additions to lists module

Thu Nov 27 02:32:24 CET 2008

On 27 Nov 2008, at 2:08 am, Serge Aleynikov wrote:

> I'd like to propose addition of these two functions to the lists  
> module.
>  Very often I find the need for this functionality and feel that it
> belongs to the standard library.
>
> Serge
>
>
>
> %% @spec (List::list(), Ele) -> integer()
> %% @doc Returns the position of `Ele' in the `List'. 0 is returned
> %%      when `Ele' is not found.
> %% @end
> pos(List, Ele) ->
>     pos(List, Ele, 1).

>
> pos([Ele | Tail], Ele, Pos) ->
>     Pos;
> pos([_ | Tail], Ele, Pos) ->
>     pos(Tail, Ele, Pos+1);
> pos([], _Ele, _) ->
>     0.

The main problem with this is its name;
it's abbreviated to meaninglessness.

If I may quote the Haskell List module:

     findIndex :: (a -> Bool) -> [a] -> Maybe Int
     elemIndex :: Eq a => a -> [a] -> Maybe Int

Leaving aside the fact that Haskell uses 0 origin,
the Erlang analogue would be

     find_index(P, Xs) ->
	find_index_loop(P, Xs, 1).

     find_index_loop(P, [X|Xs], Index) ->
	case P(X)
	  of true  -> Index
	   ; false -> find_index_loop(P, Xs, Index + 1)
	end;
     find_index_loop(P, [], _) when is_function(P, 1) ->
	0.

     elem_index(E, Xs) ->
	elem_index_loop(E, Xs, 1).

     elem_index_loop(E, [E|_],  Index) ->
	Index;
     elem_index_loop(E, [_|Xs], Index) ->
	elem_index_loop(E, Xs, Index + 1);
     elem_index_loop(_, [], _) ->
	0.

This raises the question of what value should be returned
when no element of the list satisfies the search criterion.
0 is a popular decision, going back to BASIC, but one that
makes very little sense.  The version that actually makes
the most sense is length(Xs)+1, so that

	elem_index(E, Xs) -> find_index(fun (X) -> X =:= E end).
	find_index(P, Xs) -> 1 + length(drop_while(P, Xs)).

because that makes reasoning about programs that use it
noticeably freer of special cases.

Returning something that isn't a number (my choice here would
have been 'absent') has the advantage that failing to find
something would be unlikely to go un-noticed.  It's awkward
with a type checker though.

The only advantages of returning 0 seem to be "consistency
with old fashioned BASIC" and "simplicity for the type checker".
It makes reasoning about a program more awkward than it need
be, and it makes it far too easy to fail to notice that
something wasn't found.

The Haskell versions *force* you to check whether something
was found or not; I suppose the nearest Erlang analogue
would be to return {present,Index} for something found and
'absent' for something not found.

Summary: I recommend adopting/adapting the Haskell names,
and there's a choice between
Index|absent, Index|0, Index|length+1, {present,Index}|absent
for the result.

>
>
> %% @spec (Pred, Acc, List::list()) -> Acc1
> %%          Pred = (Ele, Acc) -> Acc1
> %% @doc A combination of foldl/3 and takewhile/2. Accumulate the  
> result
> %%      while the predicate function returns true.
> %% @end
> foldlwhile(Pred, Acc0, [H|T]) ->
>     case Pred(H, Acc0) of
>     {true, Acc} ->
>         foldwhile(Fun, Acc, T);
>     {false, Acc} ->
>         Acc;
>     end;
> foldlwhile(_Pred, Acc, []) ->
>     Acc.

We would expect a combination of fold/3 and takewhile/2
to have the form

     foldl_while(Fun, Acc0, Pred, List) ->
	foldl(Fun, Acc0, takewhile(Pred, List))

or

     foldl_while(Fun, Acc0, Pred, [X|Xs]) ->
	case Pred(X)
	  of true  -> foldl_while(Fun, Fun(X, Acc0), Pred, Xs)
	   ; false -> Acc0
	end;
     foldl_while(Fun, Acc0, Pred, [])
	when is_function(Fun, 2), is_function(Pred, 1) ->
	Acc0.

This is the kind of code that it's pointless to write in Haskell
because the compiler will fuse the loops automagically.

So there are two things about the name.
The first one for me was that the "l" after the "d" was awfully
hard to see.  Separated_words_really_are_much_easier_to_read
then runtogetherrunsofwordssuchasyouseebeforeyou.  (My mail
program likes them a lot more too. (:-) (:-))

The second one is that the name is a little misleading in that
the parameters are not what you might expect them to be.

This is, by the way, a *perfect* example of the Boolean
antipattern I mentioned before.  When you are writing the
function, does 'true' mean 'keep going' or does it mean
'stop'?  Here again the name of the function is confusing,
because it _doesn't_ mean to keep on folding "while the
predicate function returns true" because the function NEVER
returns true.

Something like

	foldl_prefix(Stepper, Acc0, [X|Xs]) ->
	    case Stepper(X, Acc0)
	      of {continue,Acc} -> foldl_prefix(Stepper, Acc, Xs)
	       ; {stop,Result}  -> Result
	    end;
	foldl_prefix(S, Acc, []) when is_function(S, 2) ->
	    Acc.

which _does_ at least fold a prefix, might be less confusing.
The function results are now completely unambiguous, there
is no mention of any 'predicate' (because there is none),
and while the name doesn't completely describe the function,
everything it says is true.  The Stepper function here is closer
to being the Fun of a fold than the Pred of a takewhile.

I'm sure someone else can think of a better name.

I am rather concerned that the final value of a logically
empty list (one for which it is not the case that the list
has an element and the stepper function says to continue)
has its result derived in two different ways:
  - if the list is physically empty ([]), Acc0 is the result
  - if the list is logically but not physically empty ([X|_]),
    Acc1 is the result where {stop,Acc1} = Stepper(X, Acc0).
Somehow this just doesn't feel right.

If you think about it, this is *really* a combination of
foldl and an unfold.  Recall that an unfold works with a
function that takes a state and either says "we're at the
end" or says "we are not at the end, here is an element
and a new state".

I very much like
	unfolder :: State -> [] | [Element|State]
but some people do not.  So let's take an unfolder to be

	unfolder :: State -> empty | {Element,State}.

Now unfold  is

	unfold(Unfolder, State) ->
	    case Unfolder(State)
               of empty -> []
	       ; {Element,State1} -> [Element|unfold(Unfolder, State1)]
	    end.

Combine foldl and unfold.

	foldl_unfold(Fun, Acc0, Unfolder, State) ->
	    case Unfolder(State)
	      of empty -> Acc0
	       ; {Element,State1} ->
		  foldl_unfold(Fun, Fun(Element,Acc0), Unfolder, State1)
	    end.

The interesting thing here is that nothing says that the State
can't be a list in the first place.  You can have

	while_unfolder(P) ->
	    fun ([X|Xs]) ->
		case P(X)
		  of true -> {X,Xs}
		   ; false -> empty
		end
	      ; ([]) ->
		empty
	    end.

Then
	foldl_while(Fun, Acc0, Pred, List) ->
	    foldl_unfold(Fun, Acc0, while_unfolder(Pred), List)

This suggests that combining things with unfold is strictly more
powerful than combining them with takewhile, because the unfold
version can do the takewhile version but not conversely.

In particular, we can also do

	while_zip_unfolder(P) ->
	    fun ({[X|Xs], [Y,Ys]}) ->
		case P(X, Y)
		  of true  -> {{X,Y}, {Xs,Ys}}
		   ; false -> empty
		end
	      ; (_) ->
		empty
	    end.

and combine takewhile with (lazy) zipping.  The only limit here is
our imagination.

In Haskell, I'd have something like

	data Stepper a = Continue a | Stop a
	data Final a = Found a | Not_Found a

	foldl_prefix :: (x -> a -> Stepper a) -> a -> [x] -> Final a

	foldl_prefix _ a [] = Not_Found a
	foldl_prefix f a (x:xs) =
	    case f x a of
		Continue a' -> foldl_prefix f a' xs
		Stop a'     -> Found a'

Something that *really* combines foldl and takewhile shouldn't
be able to tell the difference between physical and logical end
of list.

So this just doesn't feel right yet.