[erlang-questions] benchmarks game harsh criticism

Isaac Gouy igouy2@REDACTED
Thu Nov 29 07:06:22 CET 2007

--- David Hopwood <david.hopwood@REDACTED> wrote:

> >> This is an elementary error, sufficiently serious that it's not
> >> enough just for the FAQ to mention it in passing. It systematically
> >> biases the results against language implementations with a
> >> significant startup/shutdown time, or other fixed overheads. Combined
> >> with the fact that most of the benchmarks only run for a few seconds,
> >> the resulting bias is quite large.
> > 
> > Specifically how large is the resulting bias?
> Probably about 10% in some cases (for JVM-based implementations and
> Smalltalk).

Sorry, I haven't figured out a way to make sense of that - 10% of what?

I'm also a little puzzled that you say "probably about 10% in some
cases". You claimed there was a serious elementary error and that the
resulting bias is quite large - is that just speculation?

> > Is it large enough that we should reassess the 97.6 seconds that the
> > HiPE program takes for fannkuch down to the 5.99 seconds taken by the C
> > program, or only large enough that we should reassess it to 97.0
> > seconds?
> My comments were not in any way specific to Erlang. (For Erlang,
> you should be looking at the effect on the pidigits benchmark, for
> example, which takes 4.36 seconds in HiPE.)

Whether your comments were specific to Erlang or not, I think you could
still tell us whether the quite large bias you're talking about is 40
seconds out of that 97.6 seconds or just 0.6 seconds.

As for looking at pidigits, you seem to be saying no more than that a
fairly constant startup time ought to take up a relatively larger
proportion of the shortest measurement. :-)
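
To put rough numbers on that point, here is a minimal sketch in Python.
It assumes, purely for illustration, a fixed overhead of 0.6 seconds
(the hypothetical figure from my question above, not a measurement), and
the function name overhead_share is made up for this example.

```python
def overhead_share(total_seconds, fixed_seconds):
    """Fraction of a measured run time accounted for by a fixed overhead."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return fixed_seconds / total_seconds


# ASSUMPTION: 0.6 s is an illustrative fixed startup/shutdown cost,
# not a measured value for any implementation.
ASSUMED_FIXED_SECONDS = 0.6

# HiPE timings quoted in this thread.
timings = {"fannkuch": 97.6, "pidigits": 4.36}

for name, total in timings.items():
    share = overhead_share(total, ASSUMED_FIXED_SECONDS)
    print(f"{name}: {share:.1%} of {total} s")
```

Under that assumption the same fixed cost is well under 1% of the
fannkuch time but more than a tenth of the pidigits time. A different
assumed overhead scales both shares proportionally.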

And if we don't stick to Erlang I fear we really will be off-topic ;-)

> > Secondly, I don't think you know that there was a widely differing
> > optimization effort - it's just an assumption.
> It is based on having seen discussions of the shootout on several
> language mailing lists and newsgroups, and observing the variation in
> effort that was put into improving the submissions in each case.

Do you even know whether those discussions resulted in programs that
are now shown on the website? Do you somehow know how much effort went
into programs I wrote and discussed with no one?

I think you could fairly say that the programming effort is unknown.

