[erlang-bugs] SEGV in process_main() line 3163 [r15B03]
Mikael Pettersson
mikpelinux@REDACTED
Mon Dec 15 13:16:46 CET 2014
[2nd attempt to send this, my apologies if you see this twice]
We've had two segfaults now in r15's process_main(), line 3163, which is
the register flushing loop just before the current process is swapped out:
==snip==
    argp = c_p->arg_reg;
    for (i = c_p->arity - 1; i > 0; i--) {
=>      argp[i] = reg[i];
    }
    c_p->arg_reg[0] = r(0);
    SWAPOUT;
==snip==
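For readers without the R15 source at hand, the shape of that loop can be modelled in isolation. This is only a minimal sketch, not the beam_emu.c code: `model_proc`, `flush_regs`, and the `x0` parameter (standing in for r(0)) are hypothetical names, and the signed `int` loop counter is an assumption based on the `cltq` and `jle` in the disassembly.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two process fields the loop touches. */
typedef struct {
    uint64_t *arg_reg;  /* save area for the argument registers */
    unsigned arity;     /* number of live argument registers */
} model_proc;

/* Copy reg[arity-1] .. reg[1] into the save area, then store x0
 * (the r(0) value) in slot 0. Same shape as the quoted loop. */
static void flush_regs(model_proc *p, const uint64_t *reg, uint64_t x0)
{
    uint64_t *argp = p->arg_reg;
    int i;

    /* Nothing here bounds-checks arity: a clobbered value walks
     * straight off the end of both arrays. */
    for (i = (int)p->arity - 1; i > 0; i--) {
        argp[i] = reg[i];
    }
    p->arg_reg[0] = x0;
}
```

With a sane arity of 3, this copies reg[2] and reg[1] and then writes x0 into slot 0. With an arity in the millions, the very first access is already far outside either array.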
The core file is unfortunately truncated: I can see the registers at the
point of the SEGV, but not inspect any memory. The registers and
disassembly are:
==snip==
Program terminated with signal 11, Segmentation fault.
#0 process_main () at beam/beam_emu.c:3163
3163 beam/beam_emu.c: No such file or directory.
(gdb) info reg
rax 0x7e7d77fff3f8 139077349274616
rbx 0x7f243b82feb8 139793593990840
rcx 0x0 0
rdx 0x53ba78 5487224
rsi 0x7e7d75622030 139077305376816
rdi 0x0 0
rbp 0x1414400 0x1414400
rsp 0x7f2467432cf0 0x7f2467432cf0
r8 0x0 0
r9 0x0 0
r10 0x0 0
r11 0x246 582
r12 0x7f2471b407c8 139794503174088
r13 0x7e7f4309cae0 139085050661600
r14 0x7e7f42e57168 139085048279400
r15 0xc63f 50751
rip 0x5425e4 0x5425e4 <process_main+29892>
eflags 0x10202 [ IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
(gdb) disassemble 0x5425a6,0x542610
Dump of assembler code from 0x5425a6 to 0x542610:
0x00000000005425a6 <process_main+29830>: mov 0x90(%rbp),%rdx
0x00000000005425ad <process_main+29837>: mov %rax,0x98(%rbp)
0x00000000005425b4 <process_main+29844>: mov %edx,0xa0(%rbp)
0x00000000005425ba <process_main+29850>: mov 0xd0(%rbp),%rcx
0x00000000005425c1 <process_main+29857>: lea -0x1(%rdx),%eax
0x00000000005425c4 <process_main+29860>: mov 0x98(%rbp),%rsi
0x00000000005425cb <process_main+29867>: test %eax,%eax
0x00000000005425cd <process_main+29869>: mov %rcx,0x48(%rsp)
0x00000000005425d2 <process_main+29874>: jle 0x5425fd <process_main+29917>
0x00000000005425d4 <process_main+29876>: cltq
0x00000000005425d6 <process_main+29878>: sub $0x2,%edx
0x00000000005425d9 <process_main+29881>: shl $0x3,%rax
0x00000000005425dd <process_main+29885>: add %rax,%r12
0x00000000005425e0 <process_main+29888>: lea (%rsi,%rax,1),%rax
=> 0x00000000005425e4 <process_main+29892>: mov (%r12),%rcx
0x00000000005425e8 <process_main+29896>: sub $0x1,%edx
0x00000000005425eb <process_main+29899>: sub $0x8,%r12
0x00000000005425ef <process_main+29903>: mov %rcx,(%rax)
0x00000000005425f2 <process_main+29906>: lea 0x1(%rdx),%ecx
0x00000000005425f5 <process_main+29909>: sub $0x8,%rax
0x00000000005425f9 <process_main+29913>: test %ecx,%ecx
0x00000000005425fb <process_main+29915>: jg 0x5425e4 <process_main+29892>
0x00000000005425fd <process_main+29917>: mov %r15,(%rsi)
0x0000000000542600 <process_main+29920>: mov %r14,0x0(%rbp)
0x0000000000542604 <process_main+29924>: mov $0x8,%esi
0x0000000000542609 <process_main+29929>: mov %r13,0x8(%rbp)
0x000000000054260d <process_main+29933>: mov %rbx,0xe0(%rbp)
End of assembler dump.
==snip==
I interpret this as follows:
1. c_p == %rbp == 0x1414400
2. &argp[i] == %rax == 0x7e7d77fff3f8
from this I deduce that c_p->arg_reg != c_p->def_arg_reg, so it points
to a dynamically allocated area separate from *c_p
3. i == c_p->arity - 1 == %rdx == 0x53ba78
this is clearly bonkers, and what's causing references into unmapped
memory
4. &reg[i] == %r12 == 0x7f2471b407c8
this is consistent with indexing a frame-local array at 0x53ba78
Basically, my conclusion is that c_p->arity has been clobbered, causing
out-of-range accesses in this loop.
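The arithmetic behind this reading can be cross-checked from the dumped registers alone. The sketch below assumes 8-byte words (the `shl $0x3` in the disassembly) and that %rsi still holds c_p->arg_reg at the faulting instruction, which appears to be the case since it is loaded at +29860 and not written again before +29924. The helper names `word_index` and `array_base` are made up for illustration.

```c
#include <stdint.h>

/* Register values copied from the gdb dump above. */
static const uint64_t RSI = 0x7e7d75622030ULL; /* c_p->arg_reg */
static const uint64_t RAX = 0x7e7d77fff3f8ULL; /* &argp[i] */
static const uint64_t R12 = 0x7f2471b407c8ULL; /* &reg[i] */
static const uint64_t RDX = 0x53ba78ULL;       /* compiled loop counter */

/* Element index of addr within an array of 8-byte words at base. */
static uint64_t word_index(uint64_t base, uint64_t addr)
{
    return (addr - base) / 8;
}

/* Base address of an array of 8-byte words, given &array[index]. */
static uint64_t array_base(uint64_t elem_addr, uint64_t index)
{
    return elem_addr - index * 8;
}
```

Under those assumptions, `word_index(RSI, RAX)` comes out as 0x53ba79, which is `RDX + 1`: the compiled loop seems to keep its counter one behind i (note the `sub $0x2,%edx` before the loop), so the index is off by one from %rdx but just as implausible. That is roughly 42 MiB past the start of arg_reg. Applying the same index to %r12 via `array_base(R12, 0x53ba79)` gives 0x7f246f163400 as the implied start of reg.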
We've had this exact crash twice now, in August and last Thursday (Dec 11).
I realize the lack of a complete core dump makes this impossible to debug.
What I'm hoping for is that someone might recollect some post-R15 change
or fix that might have something to do with unexpected clobbers of process
structs.
/Mikael