[erlang-questions] HiPE performance gain (Was: Re: [erlang-questions] Erlang static linked interpreter)

Thu Jan 20 12:50:49 CET 2011

"Our experience on AXD 301 way back was ambiguous, though part of that was
due to insufficient I-cache on the Ultrasparc -- cache misses ate the other
performance gains. (This is one of the classic arguments for using an
emulator instead of native compilation.) Perhaps modern hardware would make
better use of the native code." (Thomas Lindgren)

How the emulator is coded probably makes a difference too.  In my few visits
to the BEAM VM code, I was struck by the amount of inlining (and via long C
macros, whoa, scary!).  I began to wonder whether those macros really paid
off, when you consider the more limited I-caches.  If their bloat also tends
to crowd out HiPE-generated native code, that's another performance ding,
albeit more indirectly suffered.  Wouldn't it make more sense to recode
those macros as static functions and try various compiler optimization flags
relevant to automatic inlining of functions?  A smaller (in generated native
code) VM that *seemed* to have more function call overhead might
nevertheless actually be a win for HiPE-compiled code.  Maybe a win more
generally, at least on some architectures.  For giving HiPE a chance, it
might be worth an experiment or two.
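
To make the macro-versus-function idea concrete, here is a minimal sketch
in C. It is not taken from the BEAM sources: the term encoding, the tag
values, and every name below (term_t, IS_SMALL_M, is_small, add_small_macro,
add_small_func, and so on) are hypothetical and exist only for illustration.
The first variant pastes its body at every use site; the second leaves the
inlining decision to the compiler.

```c
#include <stdint.h>

/* Hypothetical toy term encoding; NOT the real BEAM tag scheme. */
typedef uintptr_t term_t;
#define TAG_BITS  2u
#define TAG_MASK  ((term_t)3)
#define TAG_SMALL ((term_t)1)

/* Macro style: the preprocessor expands the body at every use site,
 * so the code is always duplicated, whatever the optimization level. */
#define IS_SMALL_M(t)    (((t) & TAG_MASK) == TAG_SMALL)
#define SMALL_VALUE_M(t) ((t) >> TAG_BITS)
#define MAKE_SMALL_M(v)  ((((term_t)(v)) << TAG_BITS) | TAG_SMALL)

/* Function style: same logic, but the compiler decides per call site
 * whether inlining is worth the extra code size. */
static inline int is_small(term_t t)
{
    return (t & TAG_MASK) == TAG_SMALL;
}

static inline uintptr_t small_value(term_t t)
{
    return t >> TAG_BITS;
}

static inline term_t make_small(uintptr_t v)
{
    return (v << TAG_BITS) | TAG_SMALL;
}

/* Adds two tagged small integers using the macros.
 * Returns 0 (not a valid small term in this toy scheme) on a type error. */
term_t add_small_macro(term_t a, term_t b)
{
    if (!IS_SMALL_M(a) || !IS_SMALL_M(b))
        return 0;
    return MAKE_SMALL_M(SMALL_VALUE_M(a) + SMALL_VALUE_M(b));
}

/* Same operation written against the static inline functions. */
term_t add_small_func(term_t a, term_t b)
{
    if (!is_small(a) || !is_small(b))
        return 0;
    return make_small(small_value(a) + small_value(b));
}
```

One way to run the experiment, assuming GCC or a compatible compiler, would
be to build the function-style variant with settings such as -O2, -Os, or
-fno-inline and compare the resulting code size and running time. Whether
the smaller code actually helps on a given I-cache is exactly what would
need measuring.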

-michael turner

On Thu, Jan 20, 2011 at 8:01 PM, Thomas Lindgren
<thomasl_erlang@REDACTED> wrote:

> ----- Original Message ----
> > From: Ciprian Dorin Craciun <ciprian.craciun@REDACTED>
> ...
> >      More exactly what are the use-cases which are likely to benefit
> > from  HiPE? I would guess that in a CPU bound application HiPE would do
> > better than  without, but what about network I/O bound applications, or
> > applications that  deal mainly with strings (represented as lists), or
> > applications that deal  mostly with binary matching?
>
>
> As far as I'm aware, Hipe mostly can't optimize the runtime system.
> Sockets, drivers, etc, are just invoked. Some BIFs are inlined, but the
> more complex ones are just invoked. Binary matching is (mostly or
> entirely?) native compiled. List traversal should be fast, though the
> emulator cheats (or at least used to cheat) by invoking C code for some
> common operations. Pattern matching is generally fast, though (some?)
> guards may be invoked instead of inlined.
>
> Our experience on AXD 301 way back was ambiguous, though part of that was
> due to insufficient I-cache on the Ultrasparc -- cache misses ate the
> other performance gains. (This is one of the classic arguments for using
> an emulator instead of native compilation.) Perhaps modern hardware would
> make better use of the native code.
>
> Anecdotes I've heard from people working on other big code bases indicate
> no clear gains, basically. But I can't claim to comprehensively cover the
> whole field by any means. It would be interesting to hear some
> testimonials.
>
> (The core erlang app used at my current company is estimated to max out the
> network interfaces without native compilation, so we haven't tried it.)
>
> One system that might have gained is Wings3D, since Hipe spent a good deal
> of effort on floating point optimization. Haven't measured it myself, but
> it would be interesting to hear of any experiences.
>
> Finally, you can always have a look at the Hipe output for a function or
> module, if you want to know the gory details.
>
> Best regards,
> Thomas