[erlang-bugs] SEGV in process_main() line 3163 [r15B03]

Mikael Pettersson mikpelinux@REDACTED
Mon Dec 15 13:16:46 CET 2014


[2nd attempt to send this; my apologies if you see this twice]

We've had two segfaults now in r15's process_main(), line 3163, which is
the register flushing loop just before the current process is swapped out:

==snip==
     argp = c_p->arg_reg;
     for (i = c_p->arity - 1; i > 0; i--) {
=>       argp[i] = reg[i];
     }
     c_p->arg_reg[0] = r(0);
     SWAPOUT;
==snip==
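The loop above copies X registers 1..arity-1 into the process's argument save area, highest index first, and stores x(0) separately. Here is a minimal stand-alone sketch of that logic. It assumes a hypothetical, heavily simplified `SketchProcess` in place of the real `Process` struct and one-word terms. It is meant as an illustration, not as the emulator's code:

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t Eterm;  /* hypothetical: a term is one machine word */

/* Hypothetical, simplified stand-in for the Process fields the loop uses. */
typedef struct {
    unsigned arity;     /* number of live X registers to save */
    Eterm   *arg_reg;   /* save area for the argument registers */
} SketchProcess;

/* Copies reg[1..arity-1] into p->arg_reg[], walking downwards, then
 * stores x(0) separately, mirroring the quoted loop. */
static void save_arg_regs(SketchProcess *p, const Eterm *reg, Eterm x0)
{
    Eterm *argp = p->arg_reg;
    int i;

    for (i = (int)p->arity - 1; i > 0; i--) {
        argp[i] = reg[i];   /* no bound check: a bogus arity reads past reg[] */
    }
    p->arg_reg[0] = x0;     /* corresponds to r(0) in beam_emu.c */
}
```

Nothing in the loop bounds i by the size of reg[] or of the save area, so a clobbered arity turns directly into out-of-range reads and writes.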

The core file is unfortunately truncated: I can see the registers at the
point of the SEGV, but not inspect any memory.  The registers and
disassembly are:

==snip==
Program terminated with signal 11, Segmentation fault.
#0  process_main () at beam/beam_emu.c:3163
3163    beam/beam_emu.c: No such file or directory.
(gdb) info reg
rax            0x7e7d77fff3f8   139077349274616
rbx            0x7f243b82feb8   139793593990840
rcx            0x0      0
rdx            0x53ba78 5487224
rsi            0x7e7d75622030   139077305376816
rdi            0x0      0
rbp            0x1414400        0x1414400
rsp            0x7f2467432cf0   0x7f2467432cf0
r8             0x0      0
r9             0x0      0
r10            0x0      0
r11            0x246    582
r12            0x7f2471b407c8   139794503174088
r13            0x7e7f4309cae0   139085050661600
r14            0x7e7f42e57168   139085048279400
r15            0xc63f   50751
rip            0x5425e4 0x5425e4 <process_main+29892>
eflags         0x10202  [ IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
(gdb) disassemble 0x5425a6,0x542610
Dump of assembler code from 0x5425a6 to 0x542610:
   0x00000000005425a6 <process_main+29830>:     mov    0x90(%rbp),%rdx
   0x00000000005425ad <process_main+29837>:     mov    %rax,0x98(%rbp)
   0x00000000005425b4 <process_main+29844>:     mov    %edx,0xa0(%rbp)
   0x00000000005425ba <process_main+29850>:     mov    0xd0(%rbp),%rcx
   0x00000000005425c1 <process_main+29857>:     lea    -0x1(%rdx),%eax
   0x00000000005425c4 <process_main+29860>:     mov    0x98(%rbp),%rsi
   0x00000000005425cb <process_main+29867>:     test   %eax,%eax
   0x00000000005425cd <process_main+29869>:     mov    %rcx,0x48(%rsp)
   0x00000000005425d2 <process_main+29874>:     jle    0x5425fd <process_main+29917>
   0x00000000005425d4 <process_main+29876>:     cltq
   0x00000000005425d6 <process_main+29878>:     sub    $0x2,%edx
   0x00000000005425d9 <process_main+29881>:     shl    $0x3,%rax
   0x00000000005425dd <process_main+29885>:     add    %rax,%r12
   0x00000000005425e0 <process_main+29888>:     lea    (%rsi,%rax,1),%rax
=> 0x00000000005425e4 <process_main+29892>:     mov    (%r12),%rcx
   0x00000000005425e8 <process_main+29896>:     sub    $0x1,%edx
   0x00000000005425eb <process_main+29899>:     sub    $0x8,%r12
   0x00000000005425ef <process_main+29903>:     mov    %rcx,(%rax)
   0x00000000005425f2 <process_main+29906>:     lea    0x1(%rdx),%ecx
   0x00000000005425f5 <process_main+29909>:     sub    $0x8,%rax
   0x00000000005425f9 <process_main+29913>:     test   %ecx,%ecx
   0x00000000005425fb <process_main+29915>:     jg     0x5425e4 <process_main+29892>
   0x00000000005425fd <process_main+29917>:     mov    %r15,(%rsi)
   0x0000000000542600 <process_main+29920>:     mov    %r14,0x0(%rbp)
   0x0000000000542604 <process_main+29924>:     mov    $0x8,%esi
   0x0000000000542609 <process_main+29929>:     mov    %r13,0x8(%rbp)
   0x000000000054260d <process_main+29933>:     mov    %rbx,0xe0(%rbp)
End of assembler dump.
==snip==
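To make the register roles explicit, here is the loop from 0x5425c1 to 0x5425fb transliterated back into C, one statement per instruction. It is a hypothetical reading aid, not emulator code. It assumes %r12 held reg and %rsi held argp on entry. Note that %edx is decremented one step ahead of the index, so at the loop head it always holds i - 1:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical transliteration of 0x5425c1..0x5425fb; variables are named
 * after the x86-64 registers. Assumes reg/argp were in %r12/%rsi on entry. */
static void compiled_flush_loop(uint32_t arity, const uint64_t *reg,
                                uint64_t *argp)
{
    int32_t eax = (int32_t)arity - 1;   /* lea -0x1(%rdx),%eax: i = arity - 1 */
    if (eax <= 0)                       /* test %eax,%eax; jle 0x5425fd       */
        return;
    int32_t edx = (int32_t)arity - 2;   /* sub $0x2,%edx                      */
    const uint64_t *r12 = reg + eax;    /* shl $0x3,%rax; add %rax,%r12       */
    uint64_t *rax = argp + eax;         /* lea (%rsi,%rax,1),%rax             */
    int32_t i = eax;                    /* C-level index, tracked for checking */
    int32_t ecx;

    do {
        assert(edx == i - 1);           /* at the loop head %edx trails i by 1 */
        uint64_t rcx = *r12;            /* mov (%r12),%rcx: the faulting load */
        edx -= 1;                       /* sub $0x1,%edx                      */
        r12 -= 1;                       /* sub $0x8,%r12                      */
        *rax = rcx;                     /* mov %rcx,(%rax): argp[i] = reg[i]  */
        ecx = edx + 1;                  /* lea 0x1(%rdx),%ecx: next i         */
        rax -= 1;                       /* sub $0x8,%rax                      */
        i -= 1;
    } while (ecx > 0);                  /* test %ecx,%ecx; jg 0x5425e4        */
}
```

From the second iteration on, %ecx holds the current (positive) index, so %rcx == 0 at the fault can only mean the first iteration.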

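I interpret this as follows:
1. c_p == %rbp == 0x1414400
2. argp == c_p->arg_reg == %rsi == 0x7e7d75622030, and
   &argp[i] == %rax == 0x7e7d77fff3f8
   since argp is nowhere near *c_p, I deduce that c_p->arg_reg !=
   c_p->def_arg_reg, i.e. it points to a dynamically allocated area
   separate from *c_p
3. i == c_p->arity - 1 == (%rax - %rsi) / 8 == 0x53ba79
   (%rdx == 0x53ba78 is the compiled loop counter, which is kept one step
   behind i; %rcx == 0 means we faulted in the first iteration, since from
   the second iteration on %ecx holds the current, positive, i)
   this is clearly bonkers, and is what causes the references into unmapped
   memory
4. &reg[i] == %r12 == 0x7f2471b407c8
   the faulting instruction is the load of reg[i] through %r12; this is
   consistent with indexing reg[], based at %r12 - 8*i == 0x7f246f163400,
   with that bogus i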
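The pointer arithmetic behind these deductions can be rechecked mechanically from the "info reg" values. This is a small sketch. It assumes 8-byte Eterms (x86-64). It also assumes that %rsi still holds argp at the fault, which holds because it is loaded before the loop and never modified inside it:

```c
#include <assert.h>
#include <stdint.h>

/* Register values copied from the gdb "info reg" output above. */
#define GDB_RAX 0x7e7d77fff3f8ULL  /* &argp[i] */
#define GDB_RSI 0x7e7d75622030ULL  /* argp */
#define GDB_RDX 0x53ba78ULL        /* loop counter, trails i by one */
#define GDB_R12 0x7f2471b407c8ULL  /* &reg[i] */

/* Index implied by the two argp pointers, assuming 8-byte Eterms. */
static uint64_t implied_index(void)
{
    return (GDB_RAX - GDB_RSI) / 8;
}

/* Base address of reg[] implied by &reg[i] and that index. */
static uint64_t implied_reg_base(void)
{
    return GDB_R12 - implied_index() * 8;
}
```

With these values the implied index is 0x53ba79, exactly %rdx + 1. The implied c_p->arity is 0x53ba7a (5487226), and reg[] would be based at 0x7f246f163400.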

Basically, my conclusion is that c_p->arity has been clobbered, causing
out-of-range accesses in this loop.
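Until the source of the clobber is found, one option is a cheap sanity check on arity just before the flush loop. With such a check, a bad value is reported explicitly rather than as a SEGV partway through the copy. This is a minimal sketch. The 1024 bound and all names here are hypothetical placeholders, not the emulator's actual X register limit or API:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bound; the real X register array size is defined elsewhere
 * in ERTS and is not claimed to be this value. */
#define SKETCH_X_REG_LIMIT 1024UL

/* Returns 1 if arity cannot overrun an X register array of the given size. */
static int arity_in_range(unsigned long arity)
{
    return arity <= SKETCH_X_REG_LIMIT;
}

/* Hypothetical debug guard: report the bad value and abort() (which still
 * leaves a core) instead of faulting inside the copy loop. */
static void check_arity_or_abort(void *proc, unsigned long arity)
{
    if (!arity_in_range(arity)) {
        fprintf(stderr, "process %p: implausible arity %lu (0x%lx)\n",
                proc, arity, arity);
        abort();
    }
}
```

The guard only detects the clobber at swap-out time; it does not reveal who wrote the bad value.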

We've had this exact crash twice now, in August and last Thursday (Dec 11).

I realize the lack of a complete core dump makes this impossible to debug.
What I'm hoping for is that someone might recall a post-R15 change
or fix related to unexpected clobbering of process structs.

/Mikael


