[erlang-questions] benchmarks game harsh criticism

Sat Dec 1 02:49:59 CET 2007

Isaac Gouy wrote:
> --- David Hopwood <david.hopwood@REDACTED> wrote:
> 
> -snip-
>>>> This is an elementary error, sufficiently serious that it's not
>>>> enough just for the FAQ to mention it in passing. It
>>>> systematically biases the results against language implementations
>>>> with a significant startup/shutdown time, or other fixed overheads.
>>>> Combined with the fact that most of the benchmarks only run for a few
>>>> seconds, the resulting bias is quite large.
>>>
>>> Specifically how large is the resulting bias?
>>
>> Probably about 10% in some cases (for JVM-based implementations and
>> Smalltalk).
> 
> Sorry, I haven't figured out a way to make sense of that - 10% of what?

Of some of the benchmark times. What else?

> I'm also a little puzzled that you say "probably about 10% in some
> cases", you claimed there was a serious elementary error and the
> resulting bias is quite large - is that just speculation?

Suppose for the sake of argument that we take the 'startup' benchmark
as a rough estimate of startup/shutdown time. (I don't claim that it
is a good estimate, but it will do for this argument.)

According to the AMD Sempron results, Erlang HiPE takes 0.1992 s for
the startup benchmark on that platform (false precision, but never
mind that). It takes 0.77 s on the pidigits benchmark on the same
platform. So for this benchmark run, around 26% of the time is taken
in startup/shutdown.
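The arithmetic above can be sketched in a few lines of Python. This is only an illustration under the same assumption as the argument: the 'startup' benchmark time stands in for the fixed startup/shutdown overhead. The function names are hypothetical, and the two figures are the AMD Sempron numbers quoted above.

```python
# Figures quoted above from the AMD Sempron results for Erlang HiPE.
STARTUP_S = 0.1992   # 'startup' benchmark, assumed here to approximate startup/shutdown overhead
PIDIGITS_S = 0.77    # pidigits benchmark, total measured time


def overhead_fraction(startup_s: float, total_s: float) -> float:
    """Fraction of a measured run attributed to the fixed startup/shutdown cost."""
    if total_s <= 0:
        raise ValueError("total time must be positive")
    return startup_s / total_s


def adjusted_time(startup_s: float, total_s: float) -> float:
    """Measured time with the estimated fixed overhead subtracted."""
    return total_s - startup_s


if __name__ == "__main__":
    frac = overhead_fraction(STARTUP_S, PIDIGITS_S)
    print(f"overhead: {frac:.0%}")                                     # overhead: 26%
    print(f"adjusted: {adjusted_time(STARTUP_S, PIDIGITS_S):.4f} s")   # adjusted: 0.5708 s
```

Subtracting the overhead rather than rescaling the time keeps the sketch honest about what is being assumed: a constant per-run cost that does not depend on the benchmark's workload.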

If this time were not included, the Erlang HiPE entry would move
from 17th to around 12th place (if we assume that the startup/shutdown
times for the entries between those places are not significant,
which is likely to be true in this particular case).

-- 
David Hopwood