Finding unique elements of a list

Fri Dec 3 01:52:38 CET 2004

If you want to remove duplicate elements from a list but otherwise
preserve the order of the elements, so that, e.g., 

    remove_dups([3,1,4,1,5,9,2,6,5,3,5,8,9]) -> [3,1,4,5,9,2,6,8]

then there is an O(N.lgN) method which goes like this:
    (1) pair each element with its position number in the list
	O(N)
    (2) sort the list of pairs (using whole pairs as keys)
	O(N.lgN)
    (3) remove adjacent duplicates
	O(N)
    (4) switch the {Elem,Posn} pairs to {Posn,Elem} pairs
	O(U) where 1 <= U <= N, U is the number of unique elements
    (5) sort the list of {Posn,Elem} pairs
	O(U.lgU)
    (6) strip off the positions, {Posn,Elem} -> Elem

In Haskell notation, to keep this message short,

    remove_dups :: Ord a => [a] -> [a]

    remove_dups xs = [x | (_,x) <- sort (remdups (sort (zip xs [1..])))]
      where remdups []            = []
            remdups ((x,n) : xns) = (n,x) : remdup2 xns x

            remdup2 ((y,_) : xns) x | y == x = remdup2 xns x
            remdup2 xns           _          = remdups xns

This is a little over twice as slow as simply sorting the elements and
removing adjacent duplicates, so it's only something you would use if
the original order of the elements mattered.