[erlang-questions] benchmarks game harsh criticism

Bengt Kleberg <>
Tue Dec 4 15:53:23 CET 2007


greetings,

so, you think that the 3 quotes from "Timing Trials" do not recommend
a certain method to show additional insights; instead they describe
one. that is ok with me. i can change from recommend to describe
without losing track of the target.

moreover, i will assume that you mention "C runtimes appears absolutely
horizontal" [for this test] as one example of what "Timing Trials" calls
"anomalous behavior that deserves further attention." that is no problem
with me. (if you mean that this is the one and only thing ever to
deserve more investigation, i would like an explanation of how you
arrived at that idea.)


anyway then, we seem to agree on the following: "Timing Trials" observes
that it is a good idea to have sufficiently many measuring points during
benchmarking to be able to spot anomalous behaviour. i think the
shootout does not do this and that it would be a good thing if it did.
you do not want the shootout to do this, for reasons never explained.
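
as an illustration only (this is not code from "Timing Trials" or from
the shootout), the check described in the third quote below can be
sketched in python. the function names, the tolerance of 0.25 and all
the numbers are assumptions made up for the example; they are not
measurements. the sketch computes the slope of log(runtime) against
log(size), overall and between neighbouring measuring points, and
reports the segments whose slope strays from 1.

```python
import math


def loglog_slope(sizes, runtimes):
    """Least-squares slope of log(runtime) against log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in runtimes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def segment_slopes(sizes, runtimes):
    """Log-log slope between each pair of neighbouring measuring points."""
    points = list(zip(sizes, runtimes))
    return [
        (math.log(t2) - math.log(t1)) / (math.log(s2) - math.log(s1))
        for (s1, t1), (s2, t2) in zip(points, points[1:])
    ]


def anomalous_segments(sizes, runtimes, tolerance=0.25):
    """Indices of segments whose slope differs from 1 by more than tolerance.

    The tolerance is an arbitrary choice for this sketch.
    """
    return [
        i
        for i, slope in enumerate(segment_slopes(sizes, runtimes))
        if abs(slope - 1.0) > tolerance
    ]


# Made-up data: six problem sizes spanning five orders of magnitude.
sizes = [10**k for k in range(3, 9)]

# runtime = m * size with b ignored: unit slope everywhere.
linear = [2e-6 * s for s in sizes]

# Constant runtime, like the C case where the loop was optimized away.
flat = [0.001 for _ in sizes]

# Per-element cost triples above 10**5 (for instance a cache effect).
kinked = [(2e-6 if s <= 10**5 else 6e-6) * s for s in sizes]

print(loglog_slope(sizes, linear))        # close to 1
print(loglog_slope(sizes, flat))          # close to 0
print(anomalous_segments(sizes, kinked))  # [2]
```

in the kinked data only the segment between 10**5 and 10**6 is off. with
measuring points only on one side of that size, every segment has unit
slope and the change in cost per element goes unnoticed.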


bengt
Those were the days...
    EPO guidelines 1978: "If the contribution to the known art resides
    solely in a computer program then the subject matter is not
    patentable in whatever manner it may be presented in the claims."


On 11/30/07 21:17, Isaac Gouy wrote:
> --- Bengt Kleberg <> wrote:
> -snip-
>> quotes from "Timing Trials, or, the Trials of Timing" as per request:
>> "* Memory-related issues and the effects of memory hierarchies are
>> pervasive: how memory is managed, from hardware caches to garbage
>> collection, can change runtimes dramatically. Yet users have no
>> direct control over most aspects of memory management."
>>
>> "We started to construct a table with three dimensions: task,
>> programming language, and machine. Eventually we added the size of
>> the problem solved by the program as a fourth dimension, and we
>> changed the presentation from tables to graphs. Varying the problem
>> size helped us to detect unusual runtime effects, while a graphical
>> presentation highlights patterns and trends in runtime instead of
>> individual performance scores."
>>
>> "We designed tests whose runtime should grow linearly with the size
>> of the problem: runtime = m·size + b. Thus, if we choose size to be
>> large enough to justify ignoring the fixed overhead (b), the log-log
>> plot should show a straight line of unit slope. Exceptions indicate
>> anomalous behavior that deserves further attention."
> 
> 
> (Incidentally what they meant by anomalous behaviour is something like
> this: "the line connecting C runtimes appears absolutely horizontal.
> ... This happens because the optimizer eliminates the entire loop,
> replacing it by sum = n.")
> 
> 
> Let me suggest to you that the paragraphs you quote are descriptions
> not recommendations. 
> 
> Here's a recommendation: "... we advise all who want to know which
> version of a program will run faster to construct test programs and
> find out the truth for their language processor and machine."
> 
> Here's another: "It does seem wise to take all such experiments
> - including these - with a large grain of salt."
> 
> 
>> Now it is your turn. could you quote the exact words where they say
>> that 4 inputs and a spread of x10 is good enough?
> 
> There is no such statement, nor have I claimed that there is: I have
> described the spread shown in the tests - "we can see that they varied
> the problem size by < ~10x" 


