[erlang-questions] Additions to lists module

Roessner, Silvester <>
Thu Nov 27 07:02:23 CET 2008


On  27. November 2008 02:32 Richard O'Keefe wrote

> This raises the question of what value should be returned when no element of the list satisfies the search criterion.

I would prefer a list. It is empty if there is no matching element or has exactly one member, the index.

I also would join all of the mentioned funtions in only one, like 

> %% @spec (List::list(), Ele) -> [integer()]
> %% @doc Returns the position of `Ele' in the `List'. [] is returned
> %%      when `Ele' is not found.
> %% @end

     find_index(List, Element, Option) -> [Index]
     find_index(List, Element) = [Index]

	Option = only_first | only_last | [every]


mit freundlichen Grüßen / with kind regards
 
Silvester Rößner

-------------------------------------------------------------------------------------------------------

This message is intended for a particular addressee only and
may contain business or company secrets. If you have received
this email in error, please contact the sender and delete the
message immediately. Any use of this email, including saving,
publishing, copying, replication or forwarding of the message
or the contents is not permitted.

-----Ursprüngliche Nachricht-----
Von:  [mailto:] Im Auftrag von Richard O'Keefe
Gesendet: Donnerstag, 27. November 2008 02:32
An: Serge Aleynikov
Cc: Erlang Users' List
Betreff: Re: [erlang-questions] Additions to lists module


On 27 Nov 2008, at 2:08 am, Serge Aleynikov wrote:

> I'd like to propose addition of these two functions to the lists 
> module.
>  Very often I find the need for this functionality and feel that it 
> belongs to the standard library.
>
> Serge
>
>
>
> %% @spec (List::list(), Ele) -> integer() %% @doc Returns the position 
> of `Ele' in the `List'. 0 is returned
> %%      when `Ele' is not found.
> %% @end
> pos(List, Ele) ->
>     pos(List, Ele, 1).

>
> pos([Ele | Tail], Ele, Pos) ->
>     Pos;
> pos([_ | Tail], Ele, Pos) ->
>     pos(Tail, Ele, Pos+1);
> pos([], _Ele, _) ->
>     0.

The main problem with this is its name;
it's abbreviated to meaninglessness.

If I may quote the Haskell List module:

     findIndex :: (a -> Bool) -> [a] -> Maybe Int
     elemIndex :: Eq a => a -> [a] -> Maybe Int

Leaving aside the fact that Haskell uses 0 origin, the Erlang analogue would be

     find_index(P, Xs) ->
	find_index_loop(P, Xs, 1).

     find_index_loop(P, [X|Xs], Index) ->
	case P(X)
	  of true  -> Index
	   ; false -> find_index_loop(P, Xs, Index + 1)
	end;
     find_index_loop(P, [], _) when is_function(P, 1) ->
	0.

     elem_index(E, Xs) ->
	elem_index_loop(E, Xs, 1).

     elem_index_loop(E, [E|_],  Index) ->
	Index;
     elem_index_loop(E, [_|Xs], Index) ->
	elem_index_loop(E, Xs, Index + 1);
     elem_index_loop(_, [], _) ->
	0.

This raises the question of what value should be returned when no element of the list satisfies the search criterion.
0 is a popular decision, going back to BASIC, but one that makes very little sense.  The version that actually makes the most sense is length(Xs)+1, so that

	elem_index(E, Xs) -> find_index(fun (X) -> X =:= E end).
	find_index(P, Xs) -> 1 + length(drop_while(P, Xs)).

because that makes reasoning about programs that use it noticeably freer of special cases.

Returning something that isn't a number (my choice here would have been 'absent') has the advantage that failing to find something would be unlikely to go un-noticed.  It's awkward with a type checker though.

The only advantages of returning 0 seem to be "consistency with old fashioned BASIC" and "simplicity for the type checker".
It makes reasoning about a program more awkward than it need be, and it makes it far too easy to fail to notice that something wasn't found.

The Haskell versions *force* you to check whether something was found or not; I suppose the nearest Erlang analogue would be to return {present,Index} for something found and 'absent' for something not found.

Summary: I recommend adopting/adapting the Haskell names, and there's a choice between
Index|absent, Index|0, Index|length+1, {present,Index}|absent
for the result.
	

>
>
> %% @spec (Pred, Acc, List::list()) -> Acc1
> %%          Pred = (Ele, Acc) -> Acc1
> %% @doc A combination of foldl/3 and takewhile/2. Accumulate the 
> result
> %%      while the predicate function returns true.
> %% @end
> foldlwhile(Pred, Acc0, [H|T]) ->
>     case Pred(H, Acc0) of
>     {true, Acc} ->
>         foldwhile(Fun, Acc, T);
>     {false, Acc} ->
>         Acc;
>     end;
> foldlwhile(_Pred, Acc, []) ->
>     Acc.

We would expect a combination of fold/3 and takewhile/2 to have the form

     foldl_while(Fun, Acc0, Pred, List) ->
	foldl(Fun, Acc0, takewhile(Pred, List))

or

     foldl_while(Fun, Acc0, Pred, [X|Xs]) ->
	case Pred(X)
	  of true  -> foldl_while(Fun, Fun(X, Acc0), Pred, Xs)
	   ; false -> Acc0
	end;
     foldl_while(Fun, Acc0, Pred, [])
	when is_function(Fun, 2), is_function(Pred, 1) ->
	Acc0.

This is the kind of code that it's pointless to write in Haskell because the compiler will fuse the loops automagically.

So there are two things about the name.
The first one for me was that the "l" after the "d" was awfully hard to see.  Separated_words_really_are_much_easier_to_read
then runtogetherrunsofwordssuchasyouseebeforeyou.  (My mail program likes them a lot more too. (:-) (:-))

The second one is that the name is a little misleading in that the parameters are not what you might expect them to be.

This is, by the way, a *perfect* example of the Boolean antipattern I mentioned before.  When you are writing the function, does 'true' mean 'keep going' or does it mean 'stop'?  Here again the name of the function is confusing, because it _doesn't_ mean to keep on folding "while the predicate function returns true" because the function NEVER returns true.

Something like

	foldl_prefix(Stepper, Acc0, [X|Xs]) ->
	    case Stepper(X, Acc0)
	      of {continue,Acc} -> foldl_prefix(Stepper, Acc, Xs)
	       ; {stop,Result}  -> Result
	    end;
	foldl_prefix(S, Acc, []) when is_function(S, 2) ->
	    Acc.

which _does_ at least fold a prefix, might be less confusing.
The function results are now completely unambiguous, there is no mention of any 'predicate' (because there is none), and while the name doesn't completely describe the function, everything it says is true.  The Stepper function here is closer to being the Fun of a fold than the Pred of a takewhile.

I'm sure someone else can think of a better name.

I am rather concerned that the final value of a logically empty list (one for which it is not the case that the list has an element and the stepper function says to continue) has its result derived in two different ways:
  - if the list is physically empty ([]), Acc0 is the result
  - if the list is logically but not physically empty ([X|_]),
    Acc1 is the result where {stop,Acc1} = Stepper(X, Acc0).
Somehow this just doesn't feel right.

If you think about it, this is *really* a combination of foldl and an unfold.  Recall that an unfold works with a function that takes a state and either says "we're at the end" or says "we are not at the end, here is an element and a new state".

I very much like
	unfolder :: State -> [] | [Element|State] but some people do not.  So let's take an unfolder to be

	unfolder :: State -> empty | {Element,State}.

Now unfold  is

	unfold(Unfolder, State) ->
	    case Unfolder(State)
               of empty -> []
	       ; {Element,State1} -> [Element|unfold(Unfolder, State1)]
	    end.

Combine foldl and unfold.

	foldl_unfold(Fun, Acc0, Unfolder, State) ->
	    case Unfolder(State)
	      of empty -> Acc0
	       ; {Element,State1} ->
		  foldl_unfold(Fun, Fun(Element,Acc0), Unfolder, State1)
	    end.

The interesting thing here is that nothing says that the State can't be a list in the first place.  You can have

	while_unfolder(P) ->
	    fun ([X|Xs]) ->
		case P(X)
		  of true -> {X,Xs}
		   ; false -> empty
		end
	      ; ([]) ->
		empty
	    end.

Then
	foldl_while(Fun, Acc0, Pred, List) ->
	    foldl_unfold(Fun, Acc0, while_unfolder(Pred), List)

This suggests that combining things with unfold is strictly more powerful than combining them with takewhile, because the unfold version can do the takewhile version but not conversely.

In particular, we can also do

	while_zip_unfolder(P) ->
	    fun ({[X|Xs], [Y,Ys]}) ->
		case P(X, Y)
		  of true  -> {{X,Y}, {Xs,Ys}}
		   ; false -> empty
		end
	      ; (_) ->
		empty
	    end.

and combine takewhile with (lazy) zipping.  The only limit here is our imagination.

In Haskell, I'd have something like

	data Stepper a = Continue a | Stop a
	data Final a = Found a | Not_Found a

	foldl_prefix :: (x -> a -> Stepper a) -> a -> [x] -> Final a

	foldl_prefix _ a [] = Not_Found a
	foldl_prefix f a (x:xs) =
	    case f x a of
		Continue a' -> foldl_prefix f a' xs
		Stop a'     -> Found a'

Something that *really* combines foldl and takewhile shouldn't be able to tell the difference between physical and logical end of list.

So this just doesn't feel right yet.

_______________________________________________
erlang-questions mailing list

http://www.erlang.org/mailman/listinfo/erlang-questions




More information about the erlang-questions mailing list