small optimization story

Mon Nov 11 20:42:30 CET 2002

Ahoy,

I just had to optimize the "constant factors" of some straight-line
Erlang code, for the first time in a _very_ long time. I'm basically
just posting to plug the 'fprof' OTP application, which made it very
easy! It's a very nice tool - no special compilation or anything, you
just call it. I used the handy Distel front end :-)

Anyway, since fprof slows execution down a lot, I first narrowed down
where the time was going with a macro like this:

-define(time(Tag, Expr),
        begin
            statistics(runtime),        % reset statistics counter
            (fun(Value) ->
                     {_, Delta} = statistics(runtime),
                     io:format("time(~p): ~p ms~n", [Tag, Delta]),
                     Value
             end) (Expr)
        end).

and then e.g. ?time(doing_x, x(Foo)), which calls x(Foo) and returns
its value, but prints out how long it took.

(Admittedly I just simplified that in-line and haven't tested it -
naughty.. also timer:tc seems to give much more precise results?)

The bottleneck ended up being a parser for "ls -l"-esque FTP server
output. So I wrote a small test case to parse a hardcoded listing, and
ran fprof on that. The fprof results highlighted the bottlenecks
_very_ obviously (one at a time in separate runs.)

The first was a call to regexp:match, which I rewrote as a custom
erlang function and got a big speedup. I guess this is pretty obvious
and would have been found by hand.

Much less obviously, I got a ~15% overall speed gain by rewriting a
call to string:tokens(S, "\r\n") into:

  tokenize_lines("\r\n" ++ S, Acc) -> tokenize_lines_continue(S, Acc);
  tokenize_lines("\n"   ++ S, Acc) -> tokenize_lines_continue(S, Acc);
  tokenize_lines([H|T],       Acc) -> tokenize_lines(T, [H|Acc]);
  tokenize_lines([],          [])  -> [];
  tokenize_lines([],          Acc) -> [reverse(Acc)].

  tokenize_lines_continue(S, Acc) ->
      [reverse(Acc) | tokenize_lines(S, [])].

Which is itself ~60% faster than the string:tokens call (I'm not sure
why.)

Now, if I had found this at all amongst the 150 lines of string
munging code, it would have probably taken most of the day, and then
wouldn't have been worth it for 15%. But since it was easy to see and
took about 20 minutes in total it was a wonderful speedup. Probably
best of all, now I can see that the code is pretty tight and there's
no point bothering with any more small tweaks.

So, hurray for fprof!

Though I have no idea how my 150 microseconds/line on an 800mhz
machine stacks up against other languages like C :-)

Cheers,
Luke