[erlang-questions] benchmarks game harsh criticism (was Learning Erlang from the scratch)

Sun Nov 18 21:59:24 CET 2007

On 2007-09-03 Bengt Kleberg wrote:
> my main problem with the alioth shootout is that it has thrown away 
> one of the main ideas/insights from the paper(*) that was the
> inspiration for the original shootout. namely that it is very
> important to look at how the timing changes with the size of the
> input. the alioth shootout takes only 3 very similar size values.
> to make things worse these 3 values must give results for the major
> languages (no timeout, etc).

> (*) Timing Trials, or, the Trials of Timing,
> http://cm.bell-labs.com/cm/cs/who/bwk/interps/pap.html

> > On 2007-08-31 20:54, Michael Campbell wrote:
-snip-
> > Be careful with that.  Alioth's shootouts are for how quickly a
> > language can run a particular *algorithm*, which can at times be
> > VERY DIFFERENT from how you would normally do it in that language.
> > 
> > So some of the code on that will be weirdly contorted to fit the
> > particular algorithm, rather than what the prevailing idiom is for
> > that language.
> > 
> > A somewhat more harsh criticism can be found here:
> > http://yarivsblog.com/articles/2006/07/11/erlang-yaws-vs-ruby-on-rails/#comment-70

My apologies for digging up this two-month-old comment, but sometimes I'm
just taken aback by criticism of the benchmarks game. I'm well aware of
my limitations and value informed criticism - to a great extent we rely
on others to notice our mistakes and suggest improvements and
alternatives. 

Sometimes the criticism misleads - I never know if that's the
intention.

According to Bengt Kleberg "the alioth shootout takes only 3 very
similar size values" which "has thrown away one of the main
ideas/insights" of 'Timing Trials, or, the Trials of Timing'.

Can you guess how many different input sizes were used for "Timing
Trials, or, the Trials of Timing"? Do you guess 20? Do you guess 10?

No. The comparisons in "Timing Trials, or, the Trials of Timing" were
based on just 4 input values! The Benchmarks Game has slipped from the
insightful 4 to the miserable 3 :-)

As for "very similar size values", the timing range for different input
values is ~10x to ~100x, in comparison to mostly < 10x in "Timing
Trials, or, the Trials of Timing".

Michael Campbell points to Yariv's Blog, and I guess to Austin
Ziegler's comment. The specific problem he raises - "... must be set
at the user’s shell [ulimit]. They do not do this and report that the
Ruby program doesn’t run" - was raised a year earlier on the Ruby
mailing-list. 

That problem was fixed by November 2005, 9 months before Austin Ziegler
ranted on Yariv's Blog - his repeated "harsh criticism" had been untrue
for 9 months and by then the ackermann benchmark he complains about had
been replaced.

Rather than a general warning about weirdly contorted code, wouldn't it
be more helpful to say which of the Erlang programs you think are
weirdly contorted?
