Pitiful benchmark performance

Tue Jun 12 18:53:48 CEST 2001

> I think there are problems with the benchmark that cause the 
> huge need for memory, and it is not certain that a better GC 
> would make the problem go away. Still, all benchmarks are 
> interesting to us so we might steal some of them...

I've had a better look at the Wordfreq benchmark:

Most of the CPU time seems to be spent reading the file one line at a
time, rather than in the 4k chunks that the test allows and that all the
other solutions use. This appears to be unavoidable when using the port
mechanism to access stdin: the only options are to read one line at a time
(up to a maximum per line, {line, 512}) or to read the whole stream in one
go (which breaks the rules, but makes Erlang one of the fastest!).
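
For reference, the line-at-a-time port approach described above looks
roughly like the sketch below. The module and function names (stdin_lines,
count_lines) are made up for illustration, and it only counts lines instead
of doing the real word-frequency work, so treat it as an outline and not as
the benchmark code.

```erlang
-module(stdin_lines).
-export([main/0, count_lines/2]).

%% Open file descriptors 0/1 as a port in line mode. Each input line
%% arrives as its own message; lines longer than 512 bytes are split
%% into pieces tagged noeol, followed by a final piece tagged eol.
main() ->
    Port = open_port({fd, 0, 1}, [in, eof, {line, 512}]),
    N = count_lines(Port, 0),
    io:format("~p~n", [N]),
    halt().

%% Count newline-terminated lines until the port reports end of file.
%% A trailing fragment with no newline is delivered as noeol and is
%% therefore not counted here.
count_lines(Port, N) ->
    receive
        {Port, {data, {eol, _Line}}}   -> count_lines(Port, N + 1);
        {Port, {data, {noeol, _Part}}} -> count_lines(Port, N);
        {Port, eof}                    -> N
    end.
```

It would be run the same way as the benchmark, e.g.
erl -noinput -s stdin_lines main < Input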

Does anyone know of a way to read from stdin in 4k chunks? No worries if not,
as I don't imagine any real Erlang application uses this mechanism.

Memory usage is interesting with this one. The memory is all grabbed by the
runtime after reading three or four lines of the input file at the very
beginning of the run. On my SPARC machine the beam process leaps to 80M
almost straight away and then never uses any more. The Erlang process grows
almost immediately to 22M (as reported by
erlang:process_info(self(), memory)).
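
The per-process figure quoted above can be read from inside the running
code with something like the following sketch. The module name mem_probe
and its helper functions are hypothetical; only the
erlang:process_info(self(), memory) call comes from the measurement
described here.

```erlang
-module(mem_probe).
-export([self_bytes/0, report/1]).

%% Size in bytes of the calling Erlang process, as reported by the
%% runtime. This is the per-process figure, not the size of the whole
%% beam OS process.
self_bytes() ->
    {memory, Bytes} = erlang:process_info(self(), memory),
    Bytes.

%% Print the current figure with a label, e.g. after every N lines read.
report(Label) ->
    io:format("~s: ~p bytes~n", [Label, self_bytes()]).
```

Calling mem_probe:report("after 4 lines") at a few points in the read loop
would show how quickly the figure climbs.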

This suggests that the Erlang runtime is perhaps being a little overeager
in reserving memory in case it will be needed in the future. It is also not
very clear where the other 60M has gone. Maybe in the driver itself?

I'd never be in favour of anything which slowed the whole thing down by
doing lots of tiny memory allocations, but maybe the balance is not quite
right at the moment.

(stop press..)
I just tried it on the latest patch release and the behaviour is somewhat
improved :) The beam process only gets up to 24M, with the Erlang process
using 22M of that. The new dlmalloc allocator seems to be much more
effective at controlling memory usage, though I still don't see why the
Erlang process gets so big so quickly.

The size of the input file is only 1.7M. Any ideas, anyone? Is the
per-process memory behaviour by design?

Also, using the new ERTS the benchmark runs a bit slower:

New:
time /opt/erlang/5.0.2.5/erl -noinput -s wordfreq main < Input > output

real    0m18.421s
user    0m15.140s
sys     0m3.110s

Old:
time /opt/erlang/5.0.2.1/erl -noinput -s wordfreq main < Input > output

real    0m15.378s
user    0m14.450s
sys     0m0.550s

Lots more time is spent in sys. Is that an inevitable consequence of
reducing memory consumption so drastically? Or something else?

I also ran another, slightly less scientific, test using the Echo
Client/Server sockets benchmark. It runs at about the same speed under
Solaris and NT on machines of roughly equal speed, so this is apparently
not an OS issue.

Cheers,

Sean
