[erlang-questions] benchmarks game harsh criticism (was Learning Erlang from the scratch)

Fri Nov 23 15:57:37 CET 2007

I've been following various discussions about benchmarks, and must say
that I am not impressed.

The general argument seems to go like this:

    language A is good at X
    language B is good at Y
    language B is bad at X

    Therefore hack language B so that it is good at X

This argument has appeared many times with different values of A, B,
X and Y - usually made by a proponent of A who, because A solves X
better than B, assumes that A is in some sense better than B.

The problem is that this argument is just plain daft.

If A is good at solving X then use it to solve X. If B is good at
solving Y then use it to solve Y.

Suppose we change B to make it good at solving X, what have we done?
We have made yet another variant of A.

In changing B to be good at solving X it might no longer be good
at solving Y.

So sure, we could make Erlang a lot faster - for example, to make
regexps really fast all we have to do is link the C code for regexps
into the Erlang kernel - but it would no longer be safe. We could make
all function calls go a lot faster, but we could no longer change code
on-the-fly.

Anything can be speeded up - but at the cost of some other system
property. I think it's nonsense to assume that a single language will
be good at everything. An essential part of systems analysis is knowing
which languages are good for solving specific problems. Benchmarks are
useful here - but they hardly tell me anything I don't know already.
Languages that deliberately abstract away from memory will necessarily
be less efficient than languages that expose the structure of memory -
but the latter pay for it by being more difficult to program if you
need to dynamically relocate the memory or change the code ...

At the end of the day you have to ask whether the whole system works in
a satisfactory manner, and not get hung up over the details of this
little part or thing being efficient or inefficient. Scanning log files
efficiently (for example) would be very important if it has to be done
every few seconds - but if it is done once a year the performance is
totally irrelevant.
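
A back-of-the-envelope sketch of the log-scanning point, in Python. The
2-second scan time and the 5-second interval are made-up numbers chosen
only for illustration, not measurements from any real system, and the
function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def yearly_cost(seconds_per_run, runs_per_year):
    """Total seconds per year spent running one task."""
    return seconds_per_run * runs_per_year


# Assumption: one scan of the logs takes 2 seconds.
scan_seconds = 2.0

# Case 1: the scan has to run every 5 seconds, all year round.
every_five_seconds = yearly_cost(scan_seconds, SECONDS_PER_YEAR / 5)

# Case 2: the scan runs once a year.
once_a_year = yearly_cost(scan_seconds, 1)

# Share of the whole year that case 1 spends scanning.
busy_fraction = every_five_seconds / SECONDS_PER_YEAR

print(f"every 5 s:   {every_five_seconds:.0f} s/year "
      f"({busy_fraction:.0%} of the year)")
print(f"once a year: {once_a_year:.0f} s/year")
```

With these assumed figures the same scan either eats a large share of
the machine or costs two seconds a year, which is why the frequency
matters more than the raw speed.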

Making systems involves a lot more than making the individual bits run
fast - it involves gluing the bits together in a way that is
understandable - staring at the individual components and not seeing
their relationship to the whole is a waste of time. Timing individual
pieces of code is also irrelevant - the only interesting question is
"is this time within my time budget?"
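
One way to phrase the "time budget" question in code, as a minimal
Python sketch. The helper name within_budget and the budget values are
hypothetical; the point is only that the check is pass/fail against a
budget rather than a race against another language.

```python
import time


def within_budget(task, budget_seconds):
    """Run task() once and report whether it fitted in the budget.

    Returns (ok, elapsed_seconds). How fast the task could be in some
    other language does not enter into it.
    """
    start = time.perf_counter()
    task()
    elapsed = time.perf_counter() - start
    return elapsed <= budget_seconds, elapsed


if __name__ == "__main__":
    # Hypothetical task and budget: sum a million integers within 1 second.
    ok, elapsed = within_budget(lambda: sum(range(1_000_000)), 1.0)
    print("within budget" if ok else "over budget", f"({elapsed:.4f} s)")
```

If the answer is "within budget", making that piece faster buys the
system nothing.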

Remember what the old masters said - get it right first, then make it
fast (if you need to). Some of our Erlang systems have become market
leading and have achieved amazing figures for reliability - despite the
fact that they boot slowly, have appalling performance at matching
regular expressions and can't analyse gigabytes of disk logs
efficiently. These latter three points are irrelevant for the types of
system we build.

/Joe Armstrong

On Nov 23, 2007 1:26 PM, Bengt Kleberg <bengt.kleberg@REDACTED> wrote:
> greetings,
>
> this is seriously off topic for erlang-questions, so i would recommend
> each and every one of you to stop reading now.
>
>
> kvw(*) is not a benchmark report. it is a paper about experiments on how
> to do benchmarks. they tested some ideas and reached a few general
> principles. the one pertinent to this discussion says "Memory-related
> issues and the effects of memory hierarchies are pervasive: how memory
> is managed, from hardware caches to garbage collection, can change
> runtimes dramatically". to see this it is necessary to vary the input to
> such an extent as to find the dramatic runtime changes. the shootout
> does not do this.
>
> isaac gouy has previously stated that the shootout is not, and shall not
> be, about the kind of wide spectrum of inputs that kvw recommends
> investigating. now he is instead saying that the shootout is better than
> what kvw recommends for this kind of investigation.
> this time he is wrong.
>
>
> bengt
> Those were the days...
>     EPO guidelines 1978: "If the contribution to the known art resides
>     solely in a computer program then the subject matter is not
>     patentable in whatever manner it may be presented in the claims."
>
>
>
> On 11/18/07 21:59, Isaac Gouy wrote:
> > On 2007-09-3 Bengt Kleberg wrote:
> >> my main problem with the alioth shootout is that it has thrown away
> >> one of the main ideas/insights from the paper(*) that was the
> >> inspiration for the original shootout. namely that it is very
> >> important to look at how the timing changes with the size of the
> >> input. the alioth shootout takes only 3 very similar size values.
> >> to make things worse these 3 values must give results for the major
> >> languages (no timeout, etc).
> >
> >> (*) Timing Trials, or, the Trials of Timing,
> >> http://cm.bell-labs.com/cm/cs/who/bwk/interps/pap.html
> >
> >
> >>> On 2007-08-31 20:54, Michael Campbell wrote:
> > -snip-
> >>> Be careful with that.  Alioth's shootouts are for how quickly a
> >>> language can run a particular *algorithm*, which can at times be
> >>> VERY DIFFERENT from how you would normally do it in that language.
> >>>
> >>> So some of the code on that will be weirdly contorted to fit the
> >>> particular algorithm, rather than what the prevailing idiom is for
> >>> that language.
> >>>
> >>> A somewhat more harsh criticism can be found here:
> >>>
> > http://yarivsblog.com/articles/2006/07/11/erlang-yaws-vs-ruby-on-rails/#comment-70
> >
> >
> >
> > My apologies for digging up this 2 month old comment, but sometimes I'm
> > just taken-aback by criticism of the benchmarks game. I'm well aware of
> > my limitations and value informed criticism - to a great extent we rely
> > on others to notice our mistakes and suggest improvements and
> > alternatives.
> >
> > Sometimes the criticism misleads - I never know if that's the
> > intention.
> >
> >
> > According to Bengt Kleberg "the alioth shootout takes only 3 very
> > similar size values" which "has thrown away one of the main
> > ideas/insights" of 'Timing Trials, or, the Trials of Timing'.
> >
> > Can you guess how many different input sizes were used for "Timing
> > Trials, or, the Trials of Timing"? Do you guess 20? Do you guess 10?
> >
> > No. The comparisons in "Timing Trials, or, the Trials of Timing" were
> > based on just 4 input values! The Benchmarks Game has slipped from the
> > insightful 4 to the miserable 3 :-)
> >
> > As for "very similar size values" the timing range for different input
> > values is ~10x to ~100x, in comparison to mostly < 10x in "Timing
> > Trials, or, the Trials of Timing".
> >
> >
> >
> > Michael Campbell points to Yariv's Blog, and I guess to Austin
> > Ziegler's  comment. The specific problem he raises - "... must be set
> > at the user's shell [ulimit]. They do not do this and report that the
> > Ruby program doesn't run" - was raised a year earlier on the Ruby
> > mailing-list.
> >
> > That problem was fixed by November 2005, 9 months before Austin Ziegler
> > ranted on Yariv's Blog - his repeated "harsh criticism" had been untrue
> > for 9 months and by then the ackermann benchmark he complains about had
> > been replaced.
> >
> >
> >
> > Rather than a general warning about weirdly contorted code, wouldn't it
> > be more helpful to say which of the Erlang programs you think are
> > weirdly contorted?
> >
> >
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://www.erlang.org/mailman/listinfo/erlang-questions