Finding unique elements of a list

Richard A. O'Keefe ok@REDACTED
Fri Dec 3 01:52:38 CET 2004


If you want to remove duplicate elements from a list but otherwise
preserve the order of the elements, so that, e.g., 

    remove_dups([3,1,4,1,5,9,2,6,5,3,5,8,9]) -> [3,1,4,5,9,2,6,8]

then there is an O(N.lgN) method which goes like this:
    (1) pair each element with its position number in the list
	O(N)
    (2) sort the list of pairs (using whole pairs as keys)
	O(N.lgN)
    (3) remove adjacent duplicates
	O(N)
    (4) switch the {Elem,Posn} pairs to {Posn,Elem} pairs
	O(U) where 1 <= U <= N, U is the number of unique elements
    (5) sort the list of {Posn,Elem} pairs
	O(U.lgU)
    (6) strip off the positions, {Posn,Elem} -> Elem

In Haskell notation, to keep this message short,

    remove_dups :: Ord a => [a] -> [a]
    
    remove_dups xs = [x | (_,x) <- sort (remdups (sort (zip xs [1..])))]
      where remdups []            = []
            remdups ((x,n) : xns) = (n,x) : remdup2 xns x

            remdup2 ((y,_) : xns) x | y == x = remdup2 xns x
            remdup2 xns           _          = remdups xns

This is a little over twice as slow as simply sorting the elements and
removing adjacent duplicates, so it's only something you would use if
the original order of the elements mattered.




More information about the erlang-questions mailing list