Ulf Wiger (AL/EAB)
Fri Mar 24 14:50:32 CET 2006

Ulf Wiger wrote:
> I noticed that the gcc code was compiled with '-O3',
> and the ocaml entry with '-noassert -unsafe -ccopt O3'.
> Not that I know what all that means, but it sure sounds
> like they are squeezing that little extra umph out of
> their programs.

So I did do some eprof profiling:

2> eprof:total_analyse().
FUNCTION                                       CALLS      TIME 
knucleotide:gen_freq/5                         349968     34 % 
knucleotide:update_counter/3                   349961     30 % 
ets:update_counter/3                           349961     25 % 
ets:insert/2                                   104033     8 % 
knucleotide:to_upper_no_nl/2                   51668      1 % 
ets:db_delete/1                                15         1 % 
io:request/2                                   1699       0 % 

... and so on.
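For anyone who hasn't read the entry, the profile above is dominated by k-nucleotide frequency counting in ETS. Here is a minimal sketch of that counting pattern. It assumes an ETS set keyed on the substring, with ets:update_counter/3 falling back to ets:insert/2 when the key is missing. The module and function names (kfreq_sketch, count/2, count_kmers/5, bump/2) are hypothetical and are not the benchmark's own code:

```erlang
%% Hypothetical sketch, NOT the knucleotide entry's code:
%% count every length-K substring of a binary in an ETS set,
%% falling back to ets:insert/2 when a key is not there yet.
-module(kfreq_sketch).
-export([count/2]).

count(Seq, K) when is_binary(Seq), is_integer(K), K > 0 ->
    Tab = ets:new(kfreq, [set]),
    count_kmers(Seq, K, 0, byte_size(Seq) - K, Tab),
    %% Sort ascending by count, then reverse: highest counts first.
    Res = lists:reverse(lists:keysort(2, ets:tab2list(Tab))),
    ets:delete(Tab),
    Res.

%% Walk every start offset I from 0 up to Last (inclusive).
count_kmers(_Seq, _K, I, Last, _Tab) when I > Last ->
    ok;
count_kmers(Seq, K, I, Last, Tab) ->
    <<_:I/binary, Key:K/binary, _/binary>> = Seq,
    bump(Tab, Key),
    count_kmers(Seq, K, I + 1, Last, Tab).

%% ets:update_counter/3 raises badarg if Key is not in the table,
%% so the first occurrence of each key is inserted instead.
bump(Tab, Key) ->
    try ets:update_counter(Tab, Key, 1)
    catch
        error:badarg -> ets:insert(Tab, {Key, 1})
    end.
```

With this pattern, every position in the sequence costs one call into the counting function and one ets:update_counter/3 call. That is the kind of per-element, cross-module traffic the profile shows.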

I adjusted the benchmark slightly so that it can also
read the data from a file (basically, two entry
points):

main() ->
    Seq = dna_seq(standard_io),
    ...

from_file(F) ->
    {ok, Fd} = file:open(F, [read]),
    Seq = dna_seq(Fd),
    ...

And then I changed dna_seq() to:

dna_seq(Fd)      -> seek_three(Fd), dna_seq(Fd, []).
dna_seq(Fd, Seq) ->
    case io:get_line(Fd, '') of
        eof  -> list_to_binary(lists:reverse(Seq));
        Line -> Uline = to_upper_no_nl(Line),
                dna_seq(Fd, [Uline|Seq])
    end.

and so on, mainly to make it easier to measure...
I also removed the io:fwrite() calls and simply 
used lists:map/2 to collect the results.

Compiling just gen_freq/5 to native gave very little
(time went down from 1.12 sec to 1.16 sec, ca 3%),
but compiling both gen_freq/5 and update_counter/3
gave a significant speedup: time went down to
0.66 sec. (Commenting out the calls to gen_freq/5
left about 100 msec, which is probably not worth
trying to optimise.)

Comparing the different compilation options:

normal: 1.22 sec
native: 0.64 sec
native+o3: 0.64 sec
selective: 0.66 sec (gen_freq/5 and update_counter/3)

Putting back all printouts, I can't see any major 
difference between non-native and native.
This is quite interesting, as the total time 
reported is ca 1.48 secs. There *should* be a 
noticeable difference.

Final experiment:

$> cp $OTP_ROOT/lib/stdlib-1.13.10/src/lists.erl .
$> cp $OTP_ROOT/lib/stdlib-1.13.10/src/io.erl .
$> cp $OTP_ROOT/lib/stdlib-1.13.10/src/io_lib* .
$> ls -1 *.erl
$> erlc -W +native *.erl

Rerunning, I get 1.04 sec, a 30% speedup over the
ca 1.48 sec above.

The likely culprit is the transitions between
native and non-native code (e.g. native benchmark
code calling the non-native io and lists modules),
since many of the shootout benchmarks are I/O-intensive.

Ulf W

