[erlang-bugs] SEGV in process_main() line 3163 [r15B03]
Anthony Ramine
n.oxyde@REDACTED
Wed Dec 17 12:26:35 CET 2014
On 15 Dec 2014, at 13:16, Mikael Pettersson <mikpelinux@REDACTED> wrote:
> [2nd attempt to send this, my apologies if you see this twice]
>
> We've had two segfaults now in r15's process_main(), line 3163, which is
> the register flushing loop just before the current process is swapped out:
>
> ==snip==
> argp = c_p->arg_reg;
> for (i = c_p->arity - 1; i > 0; i--) {
> => argp[i] = reg[i];
> }
> c_p->arg_reg[0] = r(0);
> SWAPOUT;
> ==snip==
>
> The core file is unfortunately truncated: I can see the registers at the
> point of the SEGV, but not inspect any memory. The registers and
> disassembly are:
>
> ==snip==
> Program terminated with signal 11, Segmentation fault.
> #0 process_main () at beam/beam_emu.c:3163
> 3163 beam/beam_emu.c: No such file or directory.
> (gdb) info reg
> rax 0x7e7d77fff3f8 139077349274616
> rbx 0x7f243b82feb8 139793593990840
> rcx 0x0 0
> rdx 0x53ba78 5487224
> rsi 0x7e7d75622030 139077305376816
> rdi 0x0 0
> rbp 0x1414400 0x1414400
> rsp 0x7f2467432cf0 0x7f2467432cf0
> r8 0x0 0
> r9 0x0 0
> r10 0x0 0
> r11 0x246 582
> r12 0x7f2471b407c8 139794503174088
> r13 0x7e7f4309cae0 139085050661600
> r14 0x7e7f42e57168 139085048279400
> r15 0xc63f 50751
> rip 0x5425e4 0x5425e4 <process_main+29892>
> eflags 0x10202 [ IF RF ]
> cs 0x33 51
> ss 0x2b 43
> ds 0x0 0
> es 0x0 0
> fs 0x0 0
> gs 0x0 0
> (gdb) disassemble 0x5425a6,0x542610
> Dump of assembler code from 0x5425a6 to 0x542610:
> 0x00000000005425a6 <process_main+29830>: mov 0x90(%rbp),%rdx
> 0x00000000005425ad <process_main+29837>: mov %rax,0x98(%rbp)
> 0x00000000005425b4 <process_main+29844>: mov %edx,0xa0(%rbp)
> 0x00000000005425ba <process_main+29850>: mov 0xd0(%rbp),%rcx
> 0x00000000005425c1 <process_main+29857>: lea -0x1(%rdx),%eax
> 0x00000000005425c4 <process_main+29860>: mov 0x98(%rbp),%rsi
> 0x00000000005425cb <process_main+29867>: test %eax,%eax
> 0x00000000005425cd <process_main+29869>: mov %rcx,0x48(%rsp)
> 0x00000000005425d2 <process_main+29874>: jle 0x5425fd <process_main+29917>
> 0x00000000005425d4 <process_main+29876>: cltq
> 0x00000000005425d6 <process_main+29878>: sub $0x2,%edx
> 0x00000000005425d9 <process_main+29881>: shl $0x3,%rax
> 0x00000000005425dd <process_main+29885>: add %rax,%r12
> 0x00000000005425e0 <process_main+29888>: lea (%rsi,%rax,1),%rax
> => 0x00000000005425e4 <process_main+29892>: mov (%r12),%rcx
> 0x00000000005425e8 <process_main+29896>: sub $0x1,%edx
> 0x00000000005425eb <process_main+29899>: sub $0x8,%r12
> 0x00000000005425ef <process_main+29903>: mov %rcx,(%rax)
> 0x00000000005425f2 <process_main+29906>: lea 0x1(%rdx),%ecx
> 0x00000000005425f5 <process_main+29909>: sub $0x8,%rax
> 0x00000000005425f9 <process_main+29913>: test %ecx,%ecx
> 0x00000000005425fb <process_main+29915>: jg 0x5425e4 <process_main+29892>
> 0x00000000005425fd <process_main+29917>: mov %r15,(%rsi)
> 0x0000000000542600 <process_main+29920>: mov %r14,0x0(%rbp)
> 0x0000000000542604 <process_main+29924>: mov $0x8,%esi
> 0x0000000000542609 <process_main+29929>: mov %r13,0x8(%rbp)
> 0x000000000054260d <process_main+29933>: mov %rbx,0xe0(%rbp)
> End of assembler dump.
> ==snip==
>
> I interpret this as follows:
> 1. c_p == %rbp == 0x1414400
> 2. &argp[i] == %rax == 0x7e7d77fff3f8
> from this I deduce that c_p->arg_reg != c_p->def_arg_reg, so it points
> to a dynamically allocated area separate from *c_p
> 3. i == c_p->arity - 1 == %rdx == 0x53ba78
> this is clearly bonkers, and what's causing references into unmapped
> memory
> 4. &reg[i] == %r12 == 0x7f2471b407c8
> this is consistent with indexing a frame-local array at 0x53ba78
>
> Basically, my conclusion is that c_p->arity has been clobbered, causing
> out-of-range accesses in this loop.
>
> We've had this exact crash twice now, in August and last Thursday (Dec 11).
>
> I realize the lack of a complete core dump makes this impossible to debug.
> What I'm hoping for is that someone might recollect some post-R15 change
> or fix that might have something to do with unexpected clobbers of process
> structs.
>
> /Mikael
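
The register interpretation in the quoted report can be cross-checked with a little pointer arithmetic. The sketch below is not ERTS code. The helper names element_index() and array_base() are made up for illustration, and it assumes an Eterm is 8 bytes wide on this x86-64 build, which matches the "shl $0x3" and "sub $0x8" in the disassembly. The addresses in the comments are copied from the gdb dump.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one Eterm occupies 8 bytes on this x86-64 build. */
#define ETERM_SIZE UINT64_C(8)

/*
 * Index of the array slot at elem_addr, for an array of 8-byte words
 * starting at base. Hypothetical helper, not part of ERTS.
 */
uint64_t element_index(uint64_t elem_addr, uint64_t base)
{
    assert(elem_addr >= base);
    return (elem_addr - base) / ETERM_SIZE;
}

/*
 * Start address of an array of 8-byte words, given the address of
 * slot number `index`. Hypothetical helper, not part of ERTS.
 */
uint64_t array_base(uint64_t elem_addr, uint64_t index)
{
    return elem_addr - index * ETERM_SIZE;
}

/*
 * Values from the dump:
 *   %rax = 0x7e7d77fff3f8   (&argp[i])
 *   %rsi = 0x7e7d75622030   (argp, i.e. c_p->arg_reg)
 *   %r12 = 0x7f2471b407c8   (&reg[i])
 *   %rdx = 0x53ba78
 */
```

With the dump values, element_index(%rax, %rsi) comes out as 0x53ba79, one more than %rdx. That fits the compiled loop, which appears to keep i - 1 in %edx (it does "sub $0x2,%edx" before the loop where the C code has arity - 1). Applying array_base() to %r12 with the same index would put reg[] at 0x7f246f163400. Either way the index is in the millions, which supports the conclusion that c_p->arity held garbage.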
How do you know it's not a NIF doing strange things or whatnot? Did you manage to reproduce it afterwards? Did you try with a debug build?
Regards.