[erlang-questions] benchmarks game harsh criticism
Fri Nov 30 00:28:07 CET 2007
--- David Hopwood <david.hopwood@REDACTED> wrote:
> Then let me be more specific. From the FAQ at
> # CPU Time means program usr+sys time (in seconds) which includes the
> # time taken to startup and shutdown the program. For language
> # implementations that use a Virtual Machine the CPU Time includes
> # the time taken to startup and shutdown the VM.
> This is an elementary error, sufficiently serious that it's not
> enough just for the FAQ to mention it in passing. It systematically
> biases the results against language implementations with a
> startup/shutdown time, or other fixed overheads. Combined with the
> fact that most of the benchmarks only run for a few seconds, the
> resulting bias is quite large.
> Note that just subtracting the time taken by the "startup benchmark"
> would not be sufficient to remove this bias, because of variations in
> startup/shutdown/overhead time between programs (even if the
> comparison graphs were set up to do that).
Let's try for some clarity.
Whether most of the programs only run for a few seconds, or not, is
Your claim of quite large bias rests on whether or not the programs,
for the language implementations suspected of "significant"
startup/shutdown, run for only a few multiples of startup/shutdown.
Your claim about variations in startup/shutdown between programs rests
on the size of those variations - are they significant?
In vain I've asked for specific information on this "quite large bias"
and seem to be in the usual situation of having to disprove a claim
someone else has made without any supporting data.
> My comments were not in any way specific to Erlang. (For Erlang,
> you should be looking at the effect on the pidigits benchmark, for
> example, which takes 4.36 seconds in HiPE.)
The differences between statistics(wall_clock) inside the program and
bash elapsed time amounted to 0.16s give or take a few thousandths
(hello world averaged 0.18s).
pi-digits, the shortest-running Erlang program, runs for ~27x longer
than that startup/shutdown difference (4.36s against 0.16s).
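To make that arithmetic explicit, here is a minimal sketch in Python. The function names are mine, not anything from the benchmarks game. It takes the 0.16s difference and the 4.36s pidigits time quoted above at face value, and it assumes the 0.16s elapsed-time difference is a fair stand-in for the fixed startup/shutdown cost, which is an assumption rather than something measured here.

```python
def overhead_share(overhead_s, measured_s):
    """Fraction of the measured time that is fixed startup/shutdown cost."""
    return overhead_s / measured_s


def runtime_multiple(overhead_s, measured_s):
    """How many multiples of the fixed cost the measured time amounts to."""
    return measured_s / overhead_s


def inflation_factor(overhead_s, measured_s):
    """Measured time divided by the time left once the fixed cost is removed."""
    return measured_s / (measured_s - overhead_s)


# Figures quoted in this thread; treating 0.16s as the whole fixed cost
# is an assumption.
startup_shutdown_s = 0.16
pidigits_hipe_s = 4.36

print(f"multiple of overhead: {runtime_multiple(startup_shutdown_s, pidigits_hipe_s):.2f}")
print(f"share of measurement: {overhead_share(startup_shutdown_s, pidigits_hipe_s):.1%}")
print(f"inflation factor:     {inflation_factor(startup_shutdown_s, pidigits_hipe_s):.3f}")
```

The same three functions can be applied to any other program once its measured time and fixed overhead are known.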
Let's try a different approach: change the Erlang thread-ring program
to pass a tuple instead of an integer, repeat the N=10,000,000 load
over and over again, and assume that this amortizes startup/shutdown
and process creation.
After 3 hours 20 minutes, the difference between the average time per
repetition and the single measurement of "a few seconds" amounted to
2.2%.
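The comparison behind that percentage can be sketched as follows, again in Python. The function name and the example figures are hypothetical and chosen only to show the calculation; the only number taken from this thread is the total of 3 hours 20 minutes (12000 seconds). The repeat count and single-run time below are not the actual measurements.

```python
def amortized_difference(single_run_s, total_s, repeats):
    """Relative gap between one short measurement and the long-run average.

    single_run_s: time of one stand-alone run, startup/shutdown included
    total_s:      total time of the long run that repeats the same load
    repeats:      number of times the load was repeated in the long run
    """
    average_s = total_s / repeats
    return (single_run_s - average_s) / average_s


# Hypothetical inputs, NOT measurements from this thread.
single_run_s = 10.5
total_s = 12000.0   # 3 hours 20 minutes
repeats = 1200

print(f"{amortized_difference(single_run_s, total_s, repeats):.1%}")
```

A small result means the single short measurement is already close to the amortized per-repetition cost, so fixed overheads contribute little.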