[erlang-questions] benchmarks game harsh criticism

Fri Nov 30 00:28:07 CET 2007

--- David Hopwood <david.hopwood@REDACTED> wrote:

-snip-
> Then let me be more specific. From the FAQ at
> <http://shootout.alioth.debian.org/gp4/faq.php>:
> 
> # CPU Time means program usr+sys time (in seconds) which includes the
> # time taken to startup and shutdown the program. For language
> # implementations that use a Virtual Machine the CPU Time includes
> # the time taken to startup and shutdown the VM.
> 
> This is an elementary error, sufficiently serious that it's not
> enough just for the FAQ to mention it in passing. It systematically
> biases the results against language implementations with a significant
> startup/shutdown time, or other fixed overheads. Combined with the
> fact that most of the benchmarks only run for a few seconds, the
> resulting bias is quite large.
> Note that just subtracting the time taken by the "startup benchmark"
> would not be sufficient to remove this bias, because of variations in
> startup/shutdown/overhead time between programs (even if the
> comparison graphs were set up to do that).

Let's try for some clarity.

Whether most of the programs only run for a few seconds, or not, is
irrelevant. 

Your claim of quite large bias rests on whether or not the programs,
for the language implementations suspected of "significant"
startup/shutdown, run for only a few multiples of startup/shutdown.

Your claim about variations in startup/shutdown between programs rests
on the size of those variations - are they significant?

In vain I've asked for specific information on this "quite large bias"
and seem to be in the usual situation of having to disprove a claim
someone else has made without any supporting data.

> My comments were not in any way specific to Erlang. (For Erlang,
> you should be looking at the effect on the pidigits benchmark, for
> example, which takes 4.36 seconds in HiPE.) 

The differences between statistics(wall_clock) inside the program and
bash elapsed time amounted to 0.16s give or take a few thousandths
(hello world averaged 0.18s).

pi-digits, the shortest running Erlang program, runs for ~27x longer
than startup/shutdown. 
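
For anyone who wants to check the arithmetic, here is a minimal sketch in Python using only the two figures quoted in this message. It assumes the 4.36s pidigits figure is the whole measured time, startup/shutdown included; the function names are mine, for illustration only.

```python
# Figures quoted in this thread.
pidigits_seconds = 4.36  # pidigits under HiPE, as quoted by David Hopwood
overhead_seconds = 0.16  # bash elapsed time minus statistics(wall_clock)


def overhead_ratio(total, overhead):
    """How many multiples of the fixed overhead the measured run lasts."""
    return total / overhead


def overhead_share(total, overhead):
    """Fraction of the measured time that is fixed overhead."""
    return overhead / total


ratio = overhead_ratio(pidigits_seconds, overhead_seconds)
share = overhead_share(pidigits_seconds, overhead_seconds)

print(f"run time is {ratio:.2f}x the startup/shutdown overhead")
print(f"overhead is {share:.1%} of the measured time")
```

On those assumptions the overhead is a few percent of the measured time for the shortest-running Erlang program.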

Let's try a different approach: change the Erlang thread-ring program
to pass a tuple instead of an integer, repeat the N=10,000,000 load
over and over again, and assume that amortizes startup/shutdown and
process creation.

After 3 hours 20 minutes, the difference between the average time and
the single measurement of "a few seconds" amounted to 2.2%.
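
The amortization argument can be sketched as a simple model, assuming each run costs a fixed startup/shutdown amount plus a workload that scales with the number of repetitions. This is an illustration in Python, not the measurement itself: the input values below are hypothetical, chosen only to show the shape of the calculation, and the function names are mine.

```python
def single_run(fixed, work):
    """Time for one run: fixed startup/shutdown cost plus one workload."""
    return fixed + work


def amortized_average(fixed, work, repeats):
    """Average time per workload when it is repeated inside one process."""
    return (fixed + repeats * work) / repeats


def relative_difference(fixed, work, repeats):
    """How much a single run overstates the amortized average, as a
    fraction of the single-run time."""
    one = single_run(fixed, work)
    avg = amortized_average(fixed, work, repeats)
    return (one - avg) / one


# Hypothetical inputs, NOT measured values from the thread-ring runs.
fixed = 0.2  # seconds of startup/shutdown (assumed)
work = 9.8   # seconds per N=10,000,000 load (assumed)

for repeats in (1, 10, 1000):
    print(repeats, f"{relative_difference(fixed, work, repeats):.2%}")
```

In this model the difference approaches fixed / (fixed + work) as the repetitions grow, so it is an upper bound on what a single short measurement can hide. A real measurement will also pick up process creation and ordinary run-to-run noise, which the model leaves out.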
