Victory? (was Re: Mac Intel)

Tue Aug 15 13:20:20 CEST 2006

On Tue, 15 Aug 2006 11:30:26 +0100, Joel Reymont wrote:
> It's crashing on this apparently.
> 
> 0x000a4702 <hipe_bifs_get_hrvtime_0+74>:        movapd %xmm0,-56(%ebp)
> 
> According to the movapd description:
> 
> 66 0F 28 /r MOVAPD xmm1, xmm2/m128
> Move packed double-precision floating-point values from xmm2/m128 to xmm1.
> 
> 66 0F 29 /r MOVAPD xmm2/m128, xmm1
> Move packed double-precision floating-point values from xmm1 to xmm2/m128.
> 
> So unless I'm mistaken, movapd shouldn't have been used with
> -56(%ebp), correct?

Incorrect. xmm2/m128 means xmm register or 128-bit memory location.
These instructions move values between registers or between
registers and memory.

> Is there something wrong with HiPE code generation?

This is in the runtime system, compiled from C and assembly code.

> Program received signal EXC_BAD_INSTRUCTION, Illegal instruction/operand.
> 0x000a4702 in hipe_bifs_get_hrvtime_0 (A__p=0x12dbc94) at hipe/hipe_bif1.c:893
> 893         f.fd = get_hrvtime();
> (gdb) disas
> Dump of assembler code for function hipe_bifs_get_hrvtime_0:
> 0x000a46b8 <hipe_bifs_get_hrvtime_0+0>: push   %ebp
> 0x000a46b9 <hipe_bifs_get_hrvtime_0+1>: mov    %esp,%ebp
> 0x000a46bb <hipe_bifs_get_hrvtime_0+3>: push   %ebx
> 0x000a46bc <hipe_bifs_get_hrvtime_0+4>: sub    $0x44,%esp
> 0x000a46bf <hipe_bifs_get_hrvtime_0+7>: mov    8(%ebp),%ebx
> 0x000a46c2 <hipe_bifs_get_hrvtime_0+10>:        movl   $0x0,12(%esp)
> 0x000a46ca <hipe_bifs_get_hrvtime_0+18>:        movl   $0x0,8(%esp)
> 0x000a46d2 <hipe_bifs_get_hrvtime_0+26>:        movl   $0x0,4(%esp)
> 0x000a46da <hipe_bifs_get_hrvtime_0+34>:        lea    -12(%ebp),%eax
> 0x000a46dd <hipe_bifs_get_hrvtime_0+37>:        mov    %eax,(%esp)
> 0x000a46e0 <hipe_bifs_get_hrvtime_0+40>:        call   0x45669 <elapsed_time_both>
> 0x000a46e5 <hipe_bifs_get_hrvtime_0+45>:        movd   -12(%ebp),%xmm1
> 0x000a46ea <hipe_bifs_get_hrvtime_0+50>:        pxor   %xmm0,%xmm0
> 0x000a46ee <hipe_bifs_get_hrvtime_0+54>:        punpckldq %xmm1,%xmm0
> 0x000a46f2 <hipe_bifs_get_hrvtime_0+58>:        punpckldq 903040,%xmm0
> 0x000a46fa <hipe_bifs_get_hrvtime_0+66>:        subpd  903056,%xmm0
> 0x000a4702 <hipe_bifs_get_hrvtime_0+74>:        movapd %xmm0,-56(%ebp)

However, movapd will fail with a general protection fault if the
memory operand isn't 16-byte aligned.
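
To make the alignment requirement concrete, here is a small C sketch
(not code from the runtime system). It computes the address of a stack
slot such as -56(%ebp) and tests whether it is a multiple of 16. The
helper names is_aligned16 and slot_address are made up for this
illustration, and so are any %ebp values you feed them; none of them
come from the crash above.

```c
#include <assert.h>   /* only needed if you want to assert on the helpers */
#include <stdint.h>

/* Returns 1 if addr is a multiple of 16, which is what movapd
 * requires of its memory operand, and 0 otherwise. */
int is_aligned16(uintptr_t addr)
{
    return (addr & 15u) == 0;
}

/* Address of the stack slot at -offset(%ebp).
 * ebp is a hypothetical frame pointer value. */
uintptr_t slot_address(uintptr_t ebp, uintptr_t offset)
{
    return ebp - offset;
}
```

Since 56 is 8 modulo 16, the slot at -56(%ebp) is only 16-byte aligned
when %ebp itself is 8 modulo 16. With a made-up %ebp of 0xbffff008 the
slot is 0xbfffefd0 and passes the check; with 0xbffff00c it is
0xbfffefd4 and does not.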

Here's my theory: since this is x86-32, and x86-32 hasn't supported
SSE2 until now, we haven't even attempted to make the C runtime stack
16-byte aligned. On x86-64 we do this in hipe_amd64_glue.S. The BIF
wrappers in hipe_amd64_bifs.m4 haven't had to care because on x86-64
we pass all BIF parameters in registers, so calling a BIF doesn't
affect the C stack's alignment.

On x86-32 things are more difficult. It does not suffice to 16-byte
align the C stack in hipe_x86_glue.S, because the BIF wrappers will
push a varying number of parameters on it, causing it to become
misaligned. It's difficult to know in advance if a BIF will require
16-byte alignment or not, so we need a general solution. One solution
could be to change every BIF wrapper to start by adjusting ESP so that
the final ESP after pushing the parameters is 16-byte aligned. This
would however add one instruction to most BIF wrappers. Another solution
is to change hipe_x86_glue.S to create an aligned C stack frame with a
suitably large parameter area preallocated at the bottom of the frame.
Then the BIF wrappers can be rewritten to MOVE the parameters to
the bottom of the frame instead of PUSHing the parameters, and
to omit the ESP adjustment to POP the parameters after the call.
This approach also has the advantage of allowing more instruction-level
parallelism, and is a recommended practice in AMD's optimisation manual.
All in all I think this is the best solution.
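
The arithmetic behind the two options can be sketched in C. This only
models the stack pointer as a 32-bit number; it is not the wrapper
code, and the function names (pad_before_push, esp_at_call_push,
param_slot_offset) are hypothetical. It assumes 4-byte parameters and a
stack that grows towards lower addresses.

```c
#include <assert.h>   /* only needed if you want to assert on the helpers */
#include <stdint.h>

/* First option: bytes to subtract from ESP before pushing nargs
 * 4-byte parameters, so that ESP ends up 16-byte aligned at the call. */
uint32_t pad_before_push(uint32_t esp, unsigned nargs)
{
    return (esp - 4u * nargs) & 15u;
}

/* ESP at the call instruction after the adjustment and the pushes. */
uint32_t esp_at_call_push(uint32_t esp, unsigned nargs)
{
    return esp - pad_before_push(esp, nargs) - 4u * nargs;
}

/* Second option: ESP never moves. Parameter i is stored with a MOV
 * at this byte offset from ESP, inside the preallocated area. */
uint32_t param_slot_offset(unsigned i)
{
    return 4u * i;
}
```

For a made-up aligned ESP of 0xbffff000 and three parameters, the first
option needs a 4-byte adjustment and calls with ESP at 0xbfffeff0. With
the second option ESP stays wherever the glue code aligned it, and the
three parameters simply go to offsets 0, 4 and 8.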

For now, just hack hipe_bifs_get_hrvtime_0 to return make_small(0)
instead of calling get_hrvtime().