[erlang-questions] Was there any Erlang in the heated benchmark discussion?

Wed Dec 19 08:23:42 CET 2007

Some people were unfortunate enough to read the heated-discussion part 
of the harsh benchmark criticism thread (even though I wrote "do not 
read this"). Those who read it anyway might wonder whether there was any 
Erlang connection.

Short answer: Yes, indirectly.

Long answer: Yes, because an artificial cap on the input kept some 
shootout tests from showing Erlang's strengths. To show how Erlang would 
benefit from raising the limit individually for each language, let us 
construct a test:

The benchmark test T consists of counting items. The number of items to 
count is given as an argument N. Timing has a granularity of 0.1 seconds. 
After 2 minutes we assume that the test is hanging and kill it.
We have two languages, M(ainstream) and O(dd). M takes 0 seconds to 
start and counts 1 item in 1 millisecond. O takes 1 second to start and 
also counts 1 item in 1 millisecond.
M can count up to 1024 items before crashing; O can count up to 1,048,576.
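The test model above can be written down as a small simulation. This is 
a minimal sketch in Python; the names `Lang` and `run_time` are 
hypothetical, and it assumes that the timer truncates elapsed time to 
whole 0.1 s ticks and that a run crashes whenever N exceeds the 
language's capacity.

```python
from dataclasses import dataclass

KILL_AFTER_MS = 120_000  # the harness kills a run after 2 minutes
TICK_MS = 100            # timer granularity: 0.1 s


@dataclass(frozen=True)
class Lang:
    name: str
    startup_ms: int   # time before counting starts
    per_item_ms: int  # time to count one item
    max_items: int    # largest N the language can count before crashing


M = Lang("M", startup_ms=0, per_item_ms=1, max_items=1024)
O = Lang("O", startup_ms=1000, per_item_ms=1, max_items=1_048_576)


def run_time(lang: Lang, n: int) -> str:
    """Reported result of test T for count n: seconds, 'crashed' or 'killed'."""
    if n > lang.max_items:
        return "crashed"
    elapsed_ms = lang.startup_ms + n * lang.per_item_ms
    if elapsed_ms > KILL_AFTER_MS:
        return "killed"
    # Assumption: the timer reports whole 0.1 s ticks (truncation).
    ticks = elapsed_ms // TICK_MS
    return f"{ticks // 10}.{ticks % 10}"


for lang in (M, O):
    print(lang.name, [run_time(lang, n) for n in (10, 100, 1000)])
# M ['0.0', '0.1', '1.0']
# O ['1.0', '1.1', '2.0']
```

With the fixed set N = 10, 100, 1000 this reproduces the times in the 
first table below.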

If we use a fixed set of N values, we get the following times (in seconds):
N	10	100	1000
M	0.0	0.1	1.0
O	1.0	1.1	2.0

In the shootout it is not permitted to raise the fixed limit to a value 
that M cannot handle.

If we stop using a fixed set of values and instead let N grow (here by a 
factor of 10 per step) until the language crashes or is killed, as I 
suggested, we get:
N	10	100	1000	10000	100000	1000000
M	0.0	0.1	1.0	crashed
O	1.0	1.1	2.0	11	101	killed

The shootout can still use the result for N = 1000 in the comparison 
table, but the graphs give much better information about M and O.
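The escalation procedure can be sketched the same way. This is a 
hypothetical Python sketch (`Lang`, `run_time` and `escalate` are 
illustrative names, not shootout code); it assumes N grows by a factor 
of 10 per step, as in the table above, and that the timer truncates to 
0.1 s ticks. It repeats the model of test T so that it runs on its own.

```python
from dataclasses import dataclass

KILL_AFTER_MS = 120_000  # the harness kills a run after 2 minutes
TICK_MS = 100            # timer granularity: 0.1 s


@dataclass(frozen=True)
class Lang:
    name: str
    startup_ms: int
    per_item_ms: int
    max_items: int  # largest N before the language crashes


M = Lang("M", startup_ms=0, per_item_ms=1, max_items=1024)
O = Lang("O", startup_ms=1000, per_item_ms=1, max_items=1_048_576)


def run_time(lang: Lang, n: int) -> str:
    """Reported result of test T for count n: seconds, 'crashed' or 'killed'."""
    if n > lang.max_items:
        return "crashed"
    elapsed_ms = lang.startup_ms + n * lang.per_item_ms
    if elapsed_ms > KILL_AFTER_MS:
        return "killed"
    ticks = elapsed_ms // TICK_MS  # assumption: truncate to whole ticks
    return f"{ticks // 10}.{ticks % 10}"


def escalate(lang: Lang, start: int = 10, factor: int = 10) -> dict:
    """Run T with N = start, start*factor, ... until a crash or a kill."""
    results = {}
    n = start
    while True:
        outcome = run_time(lang, n)
        results[n] = outcome
        if outcome in ("crashed", "killed"):
            return results  # stop at exhaustion, as proposed
        n *= factor


for lang in (M, O):
    print(lang.name, escalate(lang))
```

Each language now gets its own range of N. The largest N that every 
language completed (1000 here) is still available for the comparison 
table, while the per-language results feed the graphs.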

This might sound like a silly test. However, there was a process-creation 
test in the shootout. Some mainstream languages could only handle fewer 
than ten thousand processes. Erlang could do much better, but N was 
capped so that the mainstream languages could complete the test for all 
values.

This is why I want to let N increase until exhaustion for each language 
in the shootout, instead of capping N at the same value for all 
languages in each test. The method would also avoid a current shootout 
problem: several languages end up clustered at about 1 second, because 
the maximum N is set by whichever language takes a long time for that 
test.

Do I think that this will stop all attempts to make languages like M 
look better in some test? No. T could, for example, be changed to count 
1 item N times instead, or to first increment 1 item and then decrement 
it for each N. These kinds of "helpful" designs were present in the 
shootout to help mainstream languages. Where a limit makes sense, it is 
fine (e.g. reading a file in chunks).

bengt

-- 
Those were the days...
    EPO guidelines 1978: "If the contribution to the known art resides
    solely in a computer program then the subject matter is not
    patentable in whatever manner it may be presented in the claims."