[erlang-questions] benchmarks game harsh criticism

Thu Nov 29 22:35:57 CET 2007

Isaac Gouy wrote:
> --- David Hopwood <david.hopwood@REDACTED> wrote:
> 
> -snip- 
>> Proportion of language implementations for which each benchmark
>> included in the default weighting takes less than 10, 30, and 60
>> seconds CPU time...
> 
> I don't wish to intrude but you haven't said whether the data you
> present is from the Debian AMD measurements or the Gentoo Intel
> measurements, the Extra languages ...

I used the measurements that are shown by default, which are for Gentoo Intel P4
without "Extra" languages. The ones for Debian AMD appear at first glance
to be quite similar, except in a few cases where they would have more
strongly supported my argument (e.g. the timings for mandelbrot start at
0.14 seconds, and for pidigits at 0.10 seconds).

BTW, anyone who has spent much time on benchmarking knows that you
cannot get reliable results from runs this short. I'm astonished that
I am having to argue this point.
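
To make the arithmetic behind that concrete, here is a rough sketch. It
assumes a fixed absolute jitter per run (process startup, scheduling,
timer resolution); the 0.02 s figure and the function name are
hypothetical, chosen only for illustration, and are not measured values.
The run lengths 0.10 s and 0.14 s are the shortest timings mentioned above.

```python
def relative_uncertainty(run_seconds, jitter_seconds):
    """Return a fixed absolute jitter as a fraction of the measured run time."""
    if run_seconds <= 0:
        raise ValueError("run_seconds must be positive")
    return jitter_seconds / run_seconds


# Hypothetical jitter (an assumption for illustration, not a measurement).
JITTER = 0.02

if __name__ == "__main__":
    for run in (0.10, 0.14, 10.0, 60.0):
        share = relative_uncertainty(run, JITTER)
        print(f"{run:6.2f} s run: jitter is {share:.2%} of the result")
```

Under that assumption the same absolute noise is a large share of a
sub-second result and a negligible share of a run lasting tens of seconds.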

>> Another basic mistake is that there is no indication of the variation
>> in timing between benchmark runs. At least, not without digging a bit
>> deeper: the excluded "Haskell GHC #4" result for N=9 on nsieve is
>> 1.12 s, but in the full results for nsieve-bits, the result for N=9
>> on exactly the same program run by the same language implementation is
>> 0.80 s.
> 
> Are you sure the basic mistake is not yours?

The basic mistake, as I said, is that there is no indication of the
variation in timing between benchmark runs. I stand corrected about the
specific example; I was looking at the wrong data for nsieve-bits.
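
As an illustration of what reporting that variation could look like, here
is a minimal sketch. It is not how the benchmarks game takes its
measurements; the names time_runs, summarize and workload are
hypothetical, and workload is only a stand-in for a real benchmark
program. It times wall-clock duration with time.perf_counter; for the CPU
time of the current process, time.process_time could be substituted.

```python
import statistics
import time


def time_runs(func, repeats=5):
    """Call func `repeats` times and return the duration of each call in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return the minimum, mean and sample standard deviation of the durations."""
    return {
        "min": min(durations),
        "mean": statistics.mean(durations),
        # The sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }


def workload():
    # Hypothetical stand-in for a benchmark program.
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    print(summarize(time_runs(workload)))
```

Publishing the spread next to each timing lets a reader judge whether a
difference between two results is larger than the run-to-run noise.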

-- 
David Hopwood