Tue Aug 15 08:49:41 CEST 2006
On Mon, 14 Aug 2006 15:23:43 +0200 (CEST), Daniel Luna wrote:
>In theory it is rather simple to get sse2 for the 32-bit x86 though:
>1. The code to generate sse2 is already in the 32-bit x86 backend. (This
>can probably be used out-of-the-box for Intel Mac)
>2. One needs to change the exception stack handling, which is kind of
>easy, except that the gcc-headers didn't have the SSE2 fields when I
>looked at it the last time. Either one hard-codes the offsets by hand, or
>hopes that gcc has changed the header stuff. Easy, but I was lazy. (This
>has a lot to do with exception stack handling, and I have no idea what it
>looks like for the Intel Mac)
>3. Change skip_sse2_insn in erts/emulator/sys/unix/sys_float.c to handle
>32 bit instructions instead of 64 bit instructions. In particular, the
>meanings of 0x40 and 0x41 have changed, and maybe some other instruction
>codes that I don't remember right now. (Easy, but needs to be done with
>4. The tricky part! Find some way to detect (at runtime, or at least load
>time) that the machine doesn't handle SSE2 and fall back to BEAM. Easy, but
>probably more expensive than the potential gain. The downside of backwards
>compatibility. If core dumps are ok for an old machine, this step can be
>skipped. (A no-issue for the Mac Intel)
>5. Replace "[x87 | Common]" with "Common" in hipe.erl.
>That should be it!
>Joel: I hope that helped. How about fixing sse2 for Linux x86 while you
>are at it?
I don't think this approach is either necessary or very desirable.
First, HiPE and BEAM are largely independent with regard to how
they perform floating-point computations. For instance, HiPE can
use x87 while BEAM is using SSE2. The only requirement is that the
runtime system's fp exception code is capable of handling both
instruction sets. (And it is on x86-64.) So HiPE needn't switch to
using SSE2 just because the C compiler used to compile BEAM did so.
Second, since SSE2 isn't universally available on x86-32, switching
to it on x86-32 would require runtime tests or compile-time options,
either of which would complicate the system.
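
To make the "runtime tests" concrete, here is a minimal sketch of what such a check could look like in C. It is not taken from the Erlang runtime: the function names are made up for illustration, and it assumes a GCC or Clang toolchain that provides <cpuid.h>. It only reads the CPU feature flag (CPUID leaf 1, EDX bit 26) and does not verify operating system support.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* CPUID leaf 1 reports SSE2 support in bit 26 of EDX.
 * (Illustrative helper name, not part of any existing code base.) */
static int edx_has_sse2(uint32_t edx)
{
    return (int)((edx >> 26) & 1u);
}

/* Returns 1 if the CPU reports SSE2, 0 otherwise.
 * On non-x86 builds this always returns 0. */
static int cpu_has_sse2(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if the requested leaf is unsupported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return edx_has_sse2(edx);
#else
    return 0;
#endif
}
```

A loader would have to call something like cpu_has_sse2() once at startup and pick x87 or SSE2 code accordingly, which is exactly the extra complexity referred to above.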
Third, the main performance gain comes from inlining fp operations
at all (and having working fp exceptions). For the small fp blocks
typically seen in Erlang code, the incremental gain going from x87
to SSE2 would be small to negligible.
So in my opinion only item (3) in Daniel's list needs to be done.
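
As an illustration of why item (3) is needed, the sketch below shows how prefix skipping differs between the two modes. It is a hypothetical helper, not the actual skip_sse2_insn from sys_float.c. It relies only on the fact that bytes 0x40..0x4F are REX prefixes in 64-bit mode, whereas in 32-bit mode the same bytes encode one-byte INC/DEC register instructions and must not be skipped as prefixes.

```c
#include <stddef.h>

/* Hypothetical helper (not the real skip_sse2_insn): count the prefix
 * bytes in front of an SSE/SSE2 opcode.  `long_mode` is nonzero when
 * decoding 64-bit code and zero when decoding 32-bit code. */
static size_t count_prefix_bytes(const unsigned char *pc, size_t len,
                                 int long_mode)
{
    size_t i = 0;

    /* Prefix bytes commonly found before SSE/SSE2 opcodes. */
    while (i < len && (pc[i] == 0x66 || pc[i] == 0xF2 || pc[i] == 0xF3))
        i++;

    /* A REX prefix (0x40..0x4F) exists only in 64-bit mode.  In 32-bit
     * mode these bytes are INC/DEC instructions, so leave them alone. */
    if (long_mode && i < len && (pc[i] & 0xF0) == 0x40)
        i++;

    return i;
}
```

For the byte sequence F2 41 0F 58 C1, this counts two prefix bytes when decoding as 64-bit code but only one when decoding as 32-bit code, where 0x41 is no longer a prefix.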