Thu Nov 19 00:04:42 CET 2009
On Nov 18, 2009, at 11:10 PM, Michael Turner wrote:
> -- so in a way, it almost comes down to the cost of copying.
> is great, if you can do it in-place. But if you have to do O(n log n)
> copies of the data, well ....
It's actually O(log n).
I came at this from another perspective, namely parallel programming.
The old books go on at length about communication costs as well as
> The discussion here reminds me that issues like these are still
> in optimization. Which means you should always be revisiting the
> question: how important is optimization?
The answer is "you never know until you have measured."
> The last time I had any serious issue involving quicksort, it was
> a junior software engineer, not realizing there was a UNIX library
> function qsort(3),
I teach (part of) a third year software engineering paper.
comments and suggestions always welcome)
and "if you didn't write it you didn't wrong it" is a constant theme.
> I really wish I was making this story up, but it did happen.
> optimization truly is the root of all evil. Don't get me wrong, I
> that Erlang two-liner version of quicksort. But it would be a service
> to the world if, in future editions of books and documents mentioning
> it, there was also a footnote pointing out that people should use
> quicksort as an optimization, only if necessary, and that Erlang will
> tend to execute this particular quicksort rather suboptimally. The
> thing about the two-liner is that it's a surprisingly clear expression
> of sorting, not that it's a good implementation in any other respect.
I'm glad you raised this, because there are several more issues.
(1) The only known advantage of quicksort is that it can sort
ARRAYS of NUMBERS without extra storage and quite quickly.
It was known when it was invented that merge sort does fewer
comparisons, so if the things you're sorting can't be compared
in say under five machine instructions, you're probably better
off with merge sort. If you are sorting LISTS, then you have
already paid all the storage penalty needed by merge sort, and
the space saving of quicksort has completely disappeared.
(2) Quicksort has two known disadvantages. It isn't stable, which
often doesn't matter, and it takes quadratic time in the worst
case. Textbooks often say that the worst case is so unlikely
you can forget about it, but it often crops up in practice.
(There's a _reason_ the code I posted used a "fat pivot".)
The Bentley and McIlroy "engineered" quicksort is now found in
some C libraries, and it runs into this less often, but it can
(3) The textbook algorithm for list quicksort is pretty much
guaranteed to go quadratic a LOT. If the input is already
in ascending or descending order, picking the first element
is about the worst thing you can do.
(4) The moral of your story as I see it is that it ALWAYS pays to look
in the library and at least try things.
Sorting 100,000 random floats on the slow machine,
Deshpande's copy of the classic quicksort: 2.36 seconds
Best sequential qsort posted yesterday: 1.86 seconds
lists:sort/1 0.90 seconds
Let's face it, "quicksort" is popular more because of its name
than because of its merits. lists:sort/1 is a merge sort.
More information about the erlang-questions