hipe segmentation fault

Wed Apr 5 19:54:14 CEST 2006

Mikael,

Here's a bunch of info you requested.  Let's continue this discussion 
outside of the mailing list's scope, and just post the resolution 
when/if it's available.

Regards,

Serge

(gdb) where
#0  0x08aae424 in ?? ()
#1  0x080f0f63 in x86_call_to_native () at hipe/hipe_x86_glue.S:42
#2  0x00000000 in ?? ()

Program counter:
(gdb) p/x $pc
$2 = 0x8aae424

Instruction at the program counter:
(gdb) x/i $pc
0x8aae424:      lea    0xffffff78(%esp),%ebx

(gdb) list
42              NSP_CALL(*P_NCALLEE(P))
43      /*
44       * We export this return address so that hipe_mode_switch() can discover
45       * when native code tailcalls emulated code.
46       *
47       * This is where native code returns to emulated code.
48       */
49      nbif_return:
50              movl    %eax, P_ARG0(P)                 # save retval
51              movl    $HIPE_MODE_SWITCH_RES_RETURN, %eax

(gdb) disas 0x08aae424
No function contains specified address.

(gdb) disas 0x080f0f63
Dump of assembler code for function nbif_return:
0x080f0f63 <nbif_return+0>:     mov    %eax,0x4c(%ebp)
0x080f0f66 <nbif_return+3>:     mov    $0x5,%eax
End of assembler dump.

(gdb) info registers
eax            0x4f     79
ecx            0xb7d568e9       -1210750743
edx            0x578b   22411
ebx            0xb7eca674       -1209227660
esp            0x8ab059c        0x8ab059c
ebp            0xb7eca674       0xb7eca674
esi            0xb7d52980       -1210766976
edi            0x18     24
eip            0x8aae424        0x8aae424
eflags         0x10286  66182
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51

(gdb) print *(Process*)$ebp
$1 = {htop = 0xb7d52980, stop = 0xb7d52a30, heap = 0xb7d52108, hend = 0xb7d52a90, heap_sz = 610, min_heap_size = 233, hipe = {
     nsp = 0x8ab05a0, nstack = 0x8ab03a0, nstend = 0x8ab05a0, ncallee = 0x8aae424, closure = 11, nstgraylim = 0x0,
     nstblacklim = 0x0, ngra = 0, ncsp = 0xbffff46c, narity = 0}, arity = 0, arg_reg = 0xb7eca6c0, max_arg_reg = 6, def_arg_reg = {
     79, 22411, 3084216553, 135076450, 0, 1000}, cp = 0x81b18b8, i = 0x0, catches = 0, fcalls = 600, status = 3, rstatus = 0,
   rcount = 0, id = 499, prio = 2, skipped = 0, reds = 968, error_handler = 6859, tracer_proc = 4294967291, group_leader = 387,
   flags = 0, fvalue = 4294967291, freason = 848, ftrace = 4294967291, dist_entry = 0x0, tm = {next = 0x0, slot = 0, count = 0,
     active = 0, timeout = 0, cancel = 0, arg = 0x0}, next = 0x0, reg = 0x0, nlinks = 0xb7ecacdc, monitors = 0x0, msg = {
     first = 0x0, last = 0xb7eca750, save = 0xb7eca750, len = 0}, bif_timers = 0x0, dictionary = 0x0, debug_dictionary = 0x0,
   ct = 0x0, seq_trace_clock = 0, seq_trace_lastcnt = 0, seq_trace_token = 4294967291, initial = {6731, 22411, 2}, current = 0x0,
   parent = 403, started = 1144245940, high_water = 0xb7d52604, old_hend = 0xb7d513c8, old_htop = 0xb7d50aa0,
   old_heap = 0xb7d50a40, gen_gcs = 4, max_gen_gcs = 65535, off_heap = {mso = 0x0, funs = 0xb7d52944, externals = 0x0,
     overhead = 0}, mbuf = 0x0, mbuf_sz = 0, arith_heap = 0x0, arith_avail = 0}
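
A few of the numbers in the dumps above can be cross-checked against each other. The following is a minimal Python sketch of that arithmetic, not part of the gdb session. The helper names (to_signed32, cross_check) are hypothetical, and the only inputs are values copied from the register and Process dumps. It makes no claim about why the fault occurs.

```python
# Values copied verbatim from the gdb output above.
REGS = {"eip": 0x08AAE424, "esp": 0x08AB059C}
HIPE = {"nsp": 0x08AB05A0, "nstack": 0x08AB03A0,
        "nstend": 0x08AB05A0, "ncallee": 0x08AAE424}
LEA_DISP_RAW = 0xFFFFFF78  # displacement as gdb printed it in x/i $pc


def to_signed32(value):
    """Interpret a 32-bit unsigned value as two's-complement signed."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def cross_check(regs, hipe, disp_raw):
    """Hypothetical helper: derive a few facts from the dumped values."""
    disp = to_signed32(disp_raw)
    lea_target = (regs["esp"] + disp) & 0xFFFFFFFF
    return {
        # Is the faulting pc the address stored in P->hipe.ncallee?
        "pc_equals_ncallee": regs["eip"] == hipe["ncallee"],
        # How far below the saved nsp is the current esp?
        "bytes_pushed": hipe["nsp"] - regs["esp"],
        # Size of the native stack area [nstack, nstend).
        "native_stack_size": hipe["nstend"] - hipe["nstack"],
        # The lea displacement read as a signed offset.
        "lea_displacement": disp,
        # The address the lea would place in ebx.
        "lea_target": lea_target,
        "lea_target_in_native_stack":
            hipe["nstack"] <= lea_target < hipe["nstend"],
    }


if __name__ == "__main__":
    for name, value in cross_check(REGS, HIPE, LEA_DISP_RAW).items():
        print(name, hex(value) if name == "lea_target" else value)
```

Under these inputs the pc equals ncallee, esp sits 4 bytes below nsp (consistent with a single return address having been pushed by the call), the native stack area is 512 bytes, and the lea computes esp - 136 = 0x8ab0514, which lies inside that area.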

Mikael Pettersson wrote:
> Date: Mon, 03 Apr 2006 10:30:15 -0400, Serge Aleynikov wrote:
> 
>>Oops...  Sorry, this test case was taken from Rickard Green's post on 
>>profiling P11B SMP scheduling support.  I wanted to run it on a multi-CPU 
>>host, and installed the R10B-10 and P11B releases with and without HiPE.
> 
> ...
> 
>>Erlang (BEAM) emulator version 5.4.13 [source] [hipe] [threads:0]
>>
>>Eshell V5.4.13  (abort with ^G)
>>1> c(big, [native]).
>>{ok,big}
>>2> big:bang(4).
>>Segmentation fault (core dumped)
> 
> ...
> 
>>(gdb) bt
>>#0  0x08aae41c in ?? ()
>>#1  0x080f0f63 in x86_call_to_native () at hipe/hipe_x86_glue.S:42
>>#2  0x00000000 in ?? ()
>>(gdb)
>>
>>...
>>
>>Looking at hipe_x86_glue.S:42:
>>
>>x86_call_to_native:
>>     ENTER_FROM_C
>>     /* get argument registers */
>>     LOAD_ARG_REGS
>>     /* call the target */
>>     NSP_CALL(*P_NCALLEE(P))  <-- Failing here
>>
>>I'm not sure what this call does, but maybe Mikael can give a clue.
> 
> 
> This is the entry point for BEAM calling a native-compiled function.
> NSP_CALL() currently expands to a plain "call" instruction; it's a macro so
> we can experiment with and measure other ways of performing calls and returns.
> 
> I'm unable to reproduce your problem here. The closest machine we
> have to yours is a dual HT P4 Xeon of the older 32-bit only type,
> running FC4 user-space on a custom 2.6.9-34 RHEL4 kernel, and things
> just work. Your test case also works on an Athlon64 running the same
> FC4/RHEL4 combo in pure 64-bit mode.
> 
> It would help if you could run beam from gdb (easiest is to attach to
> it, otherwise you have to set up several environment variables), and
> print the exact location of the program counter at the crash, list the
> surrounding assembler code, print the registers, and also print the
> contents of "P" (print *p in a C frame; print *(Process*)$ebp ought to
> do the same in assembler mode).
> 
> /Mikael
> 

-- 
Serge Aleynikov
R&D Telecom, IDT Corp.
Tel: (973) 438-3436
Fax: (973) 438-1464
serge@REDACTED