[erlang-questions] benchmarks game harsh criticism (was Learning Erlang from the scratch)
Fri Nov 23 16:16:32 CET 2007
well actually i have been trying to say that it would be nice to use the
really large infrastructure of the shootout to see where different
languages, relative to each other, leave the straight and narrow. speed
is just an indication of these limits (ie when the speed stops being
linear, or whatever it used to be, and starts being chaotic).
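a minimal sketch of that idea - time one routine at doubling input sizes
and watch the growth ratio. this is not part of the shootout
infrastructure; it is written in Python for brevity, and `work` is just a
hypothetical linear-time stand-in:

```python
# Hypothetical sketch (not shootout code): time a function at doubling
# input sizes and report the ratio of each timing to the previous one.
# For a linear-time routine the ratio should stay near 2; a sudden jump
# hints that some limit (cache, GC, allocator) has been crossed.
import timeit

def work(n):
    # stand-in workload: build and sum a list, roughly linear in n
    return sum(list(range(n)))

def growth_ratios(sizes, repeat=5):
    times = []
    for n in sizes:
        # best of several runs reduces scheduler noise
        t = min(timeit.repeat(lambda: work(n), number=1, repeat=repeat))
        times.append(t)
    # ratio of each timing to the previous one
    return [t2 / t1 for t1, t2 in zip(times, times[1:])]

if __name__ == "__main__":
    sizes = [2**k for k in range(14, 20)]  # 16384 .. 524288
    for n, r in zip(sizes[1:], growth_ratios(sizes)):
        print(f"n={n:>7}  ratio vs previous size: {r:.2f}")
```

for a linear routine the ratio between consecutive timings should hover
near 2; where it jumps away from that, the implementation has probably
hit one of the limits worth mapping.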
Those were the days...
EPO guidelines 1978: "If the contribution to the known art resides
solely in a computer program then the subject matter is not
patentable in whatever manner it may be presented in the claims."
On 11/23/07 15:57, Joe Armstrong wrote:
> I've been following various discussions about benchmarks, and must say
> that I am not impressed.
> The general argument seems to go like this:
> language A is good at X
> language B is good at Y
> language B is bad at X
> Therefore hack language B so that it is good at X
> This argument has appeared many times with different values of A, B,
> X and Y - usually by a proponent of
> A who, because A solves X better than B, assumes that A is in some sense
> better than B.
> The problem is that this argument is just plain daft.
> If A is good at solving X then use it to solve X. If B is good at
> solving Y then use it to solve Y.
> Suppose we change B to make it good at solving X, what have we done? - made
> yet another variant of A.
> In changing B to be good at solving X it might no longer be good
> at solving Y.
> So sure we could make Erlang a lot faster - for example to make
> regexps really fast all we have to do
> is link the C code for regexps into the Erlang kernel - but it would
> no longer be safe. We could make all
> function calls go a lot faster, but we could no longer change code on-the-fly.
> Anything can be speeded up - but at the cost of some other system
> property - I think it's nonsense to assume that
> a single language will be good at everything. An essential part of
> systems analysis is knowing what languages
> are good for solving specific problems. Benchmarks are useful here -
> but they hardly tell me something I don't know
> already. Languages that deliberately abstract away from memory will be
> necessarily less efficient than languages
> that expose the structure of memory - but at the cost of being more
> difficult to program if you need to dynamically
> relocate the memory or change the code ...
> At the end of the day you have to ask does the whole system work in a
> satisfactory manner, and not get hung up
> over the details of this little part or thing being efficient or
> inefficient. Scanning log files efficiently (for example) would
> be very important if it has to be done every few seconds - but if it
> is done once a year the performance is totally irrelevant.
> Making systems involves a lot more than making the individual bits run
> fast - it involves gluing the bits
> together in a way that is understandable - staring at the individual
> components and not seeing their relationship to the whole
> is a waste of time. Timing individual code is also irrelevant - the
> only interesting question is "is this time within my time budget"
> Remember what the old masters said - get it right first. Then make it
> fast (if you need to) - Some of our Erlang systems have
> become market leading and have achieved amazing figures for
> reliability - despite the fact that they boot slowly,
> have appalling performance at matching regular expressions and can't
> analyse gigabytes of disk logs efficiently.
> These latter three points are irrelevant for the types of system we build.
> /Joe Armstrong
> On Nov 23, 2007 1:26 PM, Bengt Kleberg <bengt.kleberg@REDACTED> wrote:
>> this is seriously off topic for erlang-questions, so i would recommend
>> each and every one of you to stop reading now.
>> kvw(*) is not a benchmark report. it is a paper about experiments on how
>> to do benchmarks. they tested some ideas and reached a few general
>> principles. the one pertinent to this discussion says "Memory-related
>> issues and the effects of memory hierarchies are pervasive: how memory
>> is managed, from hardware caches to garbage collection, can change
>> runtimes dramatically". to see this it is necessary to vary the input to
>> such an extent as to find the dramatic runtime changes. the shootout
>> does not do this.
>> isaac gouy has previously stated that the shootout is not, and shall not
>> be, about the kind of wide spectrum of inputs that kvw recommends
>> investigating. now he is instead saying that the shootout is better than
>> what kvw recommends for this kind of investigation.
>> this time he is wrong.
>> On 11/18/07 21:59, Isaac Gouy wrote:
>>> On 2007-09-3 Bengt Kleberg wrote:
>>>> my main problem with the alioth shootout is that it has thrown away
>>>> one of the main ideas/insights from the paper(*) that was the
>>>> inspiration for the original shootout. namely that it is very
>>>> important to look at how the timing changes with the size of the
>>>> input. the alioth shootout takes only 3 very similar size values.
>>>> to make things worse these 3 values must give results for the major
>>>> languages (no timeout, etc).
>>>> (*)Timing Trials, or, the Trials of Timing,
>>>>> On 2007-08-31 20:54, Michael Campbell wrote:
>>>>> Be careful with that. Alioth's shootouts are for how quickly a
>>>>> language can run a particular *algorithm*, which can at times be
>>>>> VERY DIFFERENT from how you would normally do it in that language.
>>>>> So some of the code on that will be weirdly contorted to fit the
>>>>> particular algorithm, rather than what the prevailing idiom is for
>>>>> that language.
>>>>> A somewhat more harsh criticism can be found here:
>>> My apologies for digging up this 2-month-old comment, but sometimes I'm
>>> just taken aback by criticism of the benchmarks game. I'm well aware of
>>> my limitations and value informed criticism - to a great extent we rely
>>> on others to notice our mistakes and suggest improvements.
>>> Sometimes the criticism misleads - I never know if that's the
>>> According to Bengt Kleberg "the alioth shootout takes only 3 very
>>> similar size values" which "has thrown away one of the main
>>> ideas/insights" of 'Timing Trials, or, the Trials of Timing'.
>>> Can you guess how many different input sizes were used for "Timing
>>> Trials, or, the Trials of Timing"? Do you guess 20? Do you guess 10?
>>> No. The comparisons in "Timing Trials, or, the Trials of Timing" were
>>> based on just 4 input values! The Benchmarks Game has slipped from the
>>> insightful 4 to the miserable 3 :-)
>>> As for "very similar size values" the timing range for different input
>>> values is ~10x to ~100x, in comparison to mostly < 10x in "Timing
>>> Trials, or, the Trials of Timing".
>>> Michael Campbell points to Yariv's Blog, and I guess to Austin
>>> Ziegler's comment. The specific problem he raises - "... must be set
>>> at the user's shell [ulimit]. They do not do this and report that the
>>> Ruby program doesn't run" - was raised a year earlier on the Ruby
>>> That problem was fixed by November 2005, 9 months before Austin Ziegler
>>> ranted on Yariv's Blog - his repeated "harsh criticism" had been untrue
>>> for 9 months and by then the ackermann benchmark he complains about had
>>> been replaced.
>>> Rather than a general warning about weirdly contorted code, wouldn't it
>>> be more helpful to say which of the Erlang programs you think are
>>> weirdly contorted?
>>> erlang-questions mailing list