[erlang-questions] benchmarks game harsh criticism

Wed Nov 28 19:00:34 CET 2007

Isaac Gouy wrote:
> --- David Hopwood <david.hopwood@REDACTED> wrote:
> 
> -snip-
>> As for the 'shootout', most of the criticisms of it in this thread
>> have been valid; it's not a very good basis for comparison of
>> language performance.
> 
> Puzzled. 
> 
> afaict most of the comments in this thread have been about benchmarks
> in general and not about the benchmarks game in particular, so the way
> you state your opinion is uninformative.

Then let me be more specific. From the FAQ at
<http://shootout.alioth.debian.org/gp4/faq.php>:

# CPU Time means program usr+sys time (in seconds) which includes the
# time taken to startup and shutdown the program. For language
# implementations that use a Virtual Machine the CPU Time includes
# the time taken to startup and shutdown the VM.

This is an elementary error, sufficiently serious that it's not enough
just for the FAQ to mention it in passing. It systematically biases the
results against language implementations with a significant startup/shutdown
time, or other fixed overheads. Combined with the fact that most of the
benchmarks only run for a few seconds, the resulting bias is quite large.
Note that even if the comparison graphs were set up to subtract the time
taken by the "startup benchmark", that would not be sufficient to remove
this bias, because startup/shutdown/overhead time varies between programs.
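
A rough sketch of the arithmetic behind this point, using made-up numbers
(every figure and name below is an assumption for illustration, not data
from the shootout): two implementations do the same steady-state work, but
one pays a fixed startup/shutdown cost that is included in its CPU time.

```python
def measured_ratio(work_a, work_b, overhead_a, overhead_b):
    """Ratio of total CPU times (A / B) when each total includes fixed overhead."""
    return (work_a + overhead_a) / (work_b + overhead_b)


# Hypothetical figures, in seconds; none are taken from the shootout.
work_a, work_b = 2.0, 2.0            # identical steady-state work
overhead_a, overhead_b = 1.0, 0.0    # A pays VM startup/shutdown, B does not

# Short benchmark: the fixed overhead is a large share of A's total.
short_run = measured_ratio(work_a, work_b, overhead_a, overhead_b)

# Same programs with 100x the work: the fixed overhead is nearly invisible.
long_run = measured_ratio(100 * work_a, 100 * work_b, overhead_a, overhead_b)

# Subtracting a separate "startup benchmark" estimate only helps if it
# matches this program's real overhead. Here it is assumed to under-estimate.
startup_estimate_a, startup_estimate_b = 0.5, 0.0
corrected = (work_a + overhead_a - startup_estimate_a) / (
    work_b + overhead_b - startup_estimate_b
)

print(f"short run:            A appears {short_run:.2f}x slower")
print(f"long run:             A appears {long_run:.3f}x slower")
print(f"short run, corrected: A appears {corrected:.2f}x slower")
```

With these assumed figures the short run shows A as 1.50x slower, the long
run as about 1.005x slower, and the "corrected" short run still as 1.25x
slower, even though the steady-state work is identical by construction.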

The other main factor that makes the shootout almost useless for
language comparison is the widely differing amount of optimization
effort put into the code submissions.

-- 
David Hopwood