No subject

Thu Nov 19 00:04:42 CET 2009

On Nov 18, 2009, at 11:10 PM, Michael Turner wrote:
> -- so in a way, it almost comes down to the cost of copying.   
> Quicksort
> is great, if you can do it in-place.  But if you have to do O(n log n)
> copies of the data, well ....

It's actually O(log n).

I came at this from another perspective, namely parallel programming.
The old books go on at length about communication costs as well as
computation costs.
>
>
> The discussion here reminds me that issues like these are still  
> important
> in optimization.  Which means you should always be revisiting the
> question: how important is optimization?

The answer is "you never know until you have measured."

> The last time I had any serious issue involving quicksort, it was  
> because
> a junior software engineer, not realizing there was a UNIX library
> function qsort(3),

I teach (part of) a third year software engineering paper.
(http://www.cs.otago.ac.nz/cosc345/
  comments and suggestions always welcome)
and "if you didn't write it you didn't wrong it" is a constant theme.

> I really wish I was making this story up, but it did happen.   
> Premature
> optimization truly is the root of all evil.  Don't get me wrong, I  
> love
> that Erlang two-liner version of quicksort.  But it would be a service
> to the world if, in future editions of books and documents mentioning
> it, there was also a footnote pointing out that people should use
> quicksort as an optimization, only if necessary, and that Erlang will
> tend to execute this particular quicksort rather suboptimally.  The  
> cool
> thing about the two-liner is that it's a surprisingly clear expression
> of sorting, not that it's a good implementation in any other respect.

I'm glad you raised this, because there are several more issues.
(1) The only known advantage of quicksort is that it can sort
     ARRAYS of NUMBERS without extra storage and quite quickly.
     It was known when it was invented that merge sort does fewer
     comparisons, so if the things you're sorting can't be compared
     in say under five machine instructions, you're probably better
     off with merge sort.  If you are sorting LISTS, then you have
     already paid all the storage penalty needed by merge sort, and
     the space saving of quicksort has completely disappeared.

(2) Quicksort has two known disadvantages.  It isn't stable, which
     often doesn't matter, and it takes quadratic time in the worst
     case.  Textbooks often say that the worst case is so unlikely
     you can forget about it, but it often crops up in practice.
     (There's a _reason_ the code I posted used a "fat pivot".)
     The Bentley and McIlroy "engineered" quicksort is now found in
     some C libraries, and it runs into this less often, but it can
     still happen.

(3) The textbook algorithm for list quicksort is pretty much
     guaranteed to go quadratic a LOT.  If the input is already
     in ascending or descending order, picking the first element
     is about the worst thing you can do.

(4) The moral of your story as I see it is that it ALWAYS pays to look
     in the library and at least try things.
     Sorting 100,000 random floats on the slow machine,
     Deshpande's copy of the classic quicksort: 2.36 seconds
     Best sequential qsort posted yesterday:    1.86 seconds
     lists:sort/1                               0.90 seconds

Let's face it, "quicksort" is popular more because of its name
than because of its merits.  lists:sort/1 is a merge sort.
>