Impact of native compilation

Sat Sep 7 11:29:15 CEST 2002

Per Bergqvist wrote:
[...]
> It looked like hipe did a very good job generating code for the body
> of the function, but did no optimization for
> recursive calls (i.e. loops). It handled it as any other call.

Well, almost. A tail call to the same function is turned into something
slightly faster than any other call: it is actually a "real" loop, and no
stack adjustment code is needed.

> After each execution of the "body" hipe called runtime functions to
> allow housekeeping and scheduling.

Yes, this is the real problem. Scheduling in the Erlang runtime system is
done by counting reductions: a process gets a number of reductions, and
each function call decrements this number. When the count reaches zero, the
process is suspended. This mechanism is also used in HiPE, which means that
each function call still has to decrement the reduction count and test for
zero.
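
As an illustration of reduction counting, the sketch below reads the
calling process's reduction counter before and after a tight loop. The
module name reds_demo and the function reductions_used/1 are hypothetical
names chosen for this example; erlang:process_info/2 with the reductions
item is the standard BIF. The exact figure returned depends on the runtime
system and on whether the code runs as BEAM or native code, so treat it as
an indication only.

```erlang
-module(reds_demo).
-export([loop/1, reductions_used/1]).

%% The same kind of tight countdown loop discussed in this thread.
loop(0) ->
    ok;
loop(Count) ->
    loop(Count - 1).

%% Returns how many reductions the calling process was charged
%% while running loop(N). The name is hypothetical, for illustration.
reductions_used(N) ->
    {reductions, Before} = erlang:process_info(self(), reductions),
    ok = loop(N),
    {reductions, After} = erlang:process_info(self(), reductions),
    After - Before.
```

Calling reds_demo:reductions_used(1000000) from the shell should give a
number that grows with the argument, since every call to loop/1 is charged
against the process.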

Without this test, a bug that makes one process loop forever could block all
processes on the node, which of course is unacceptable. Still, there are
many cases where unrolling could be done, and this we hope to do in the
upcoming (well, sometime in the future) new HiPE front-end (Core Erlang to
HiPE).

> Unfortunately most functions in Erlang are very small and the setup
> overhead for each body call was significant, resulting in very small
> performance gains (but still some).

Well, compared to the BEAM, the performance gain from compiling a tight
loop to native code is 4 to 10 times.

> Compare this with a tight optimized (unrolled|coiled|pipelined) loop
> in C doing condition checking on a register flag.

Yes, we still have a long way to go to get to the performance of a
statically typed language.

> This is a bit unfortunate since performance bottlenecks can usually be
> tracked down to a few inner loops for almost any given system...

Yes, this is the type of code that HiPE concentrates on optimizing. We would
very much like it if you could send us examples of such loops where the HiPE
compiler does not do a good job so that we can find the problem and improve
the compiler.

Mickaël Rémond wrote:

> > We were wondering with Thierry Mallard about the impact of native
> > compilation.
> > We tried to compare a big amount of floating point operations (100
> > millions).
> > The result is greatly improved over standard Erlang.

Yes, HiPE now has native support for floating point operations (both SPARC
and x86) with very promising results. More information will come with the
announcements for R9 and HiPE 2.0.

I would like to stress that there has been no new release of HiPE since HiPE
1.0, and that the P9 snapshots are not official releases. These snapshots
often contain untested and even broken code; use with extreme care.

To get back to the question on the performance of the example program:
  loop(0) ->
    ok;
  loop(Count) ->
    1.0 * 1.234,
    loop(Count -1).

In this case even the BEAM compiler throws away the fp operation, so the
resulting code is just:
  loop(0) ->
    ok;
  loop(Count) ->
    loop(Count -1).

To test fp performance you have to hide the constants from the compiler but
still hint to it that they are floats:

loop(N)->
  floop(N,1.0,1.234).

floop(0,_,_) ->
  ok;
floop(Count,A,B) when float(A), float(B) ->
  A * B,
  floop(Count -1,A,B).

7> test2:run().
{66662095,ok}
8> hipe:c(test2,[o3]).
{ok,test2}
9> test2:run().
{15735078,ok}
10> 66662095/15735078.
4.23653

That is roughly a fourfold speedup for native code.
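
The run/0 function called in the shell session above is not shown in this
message. A minimal sketch of what such a module could look like follows,
assuming that run/0 wraps loop/1 in timer:tc/3 (which returns
{Microseconds, Result}, matching the shape of the output above) and that
the iteration count is 100 million, as mentioned in the quoted message.
Both are assumptions, as is the extra run/1 helper. The guard is written
with is_float/1, the current spelling of the float/1 type test used in the
code above.

```erlang
-module(test2).
-export([run/0, run/1, loop/1]).

%% Assumed iteration count; the original value is not stated here.
-define(ITERATIONS, 100000000).

%% Times loop/1 and returns {Microseconds, Result}.
run() ->
    run(?ITERATIONS).

%% Hypothetical helper so that a smaller count can be timed as well.
run(N) ->
    timer:tc(?MODULE, loop, [N]).

%% Pass the constants as arguments so the compiler cannot fold them away.
loop(N) ->
    floop(N, 1.0, 1.234).

floop(0, _, _) ->
    ok;
floop(Count, A, B) when is_float(A), is_float(B) ->
    _ = A * B,
    floop(Count - 1, A, B).
```

Dividing the time measured for the BEAM version by the time measured after
native compilation gives the speedup figure, as in the shell session above.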

/Erik
--------------------------------------
I'm Happi, you should be happy.
Praeterea censeo 0xCA scribere Erlang posse.