[erlang-bugs] SEGV in process_main() line 3163 [r15B03]

Anthony Ramine n.oxyde@REDACTED
Wed Dec 17 12:26:35 CET 2014


On 15 Dec 2014 at 13:16, Mikael Pettersson <mikpelinux@REDACTED> wrote:

> [2nd attempt to send this; my apologies if you see this twice]
> 
> We've had two segfaults now in r15's process_main(), line 3163, which is
> the register flushing loop just before the current process is swapped out:
> 
> ==snip==
>     argp = c_p->arg_reg;
>     for (i = c_p->arity - 1; i > 0; i--) {
> =>       argp[i] = reg[i];
>     }
>     c_p->arg_reg[0] = r(0);
>     SWAPOUT;
> ==snip==
> 
> The core file is unfortunately truncated: I can see the registers at the
> point of the SEGV, but not inspect any memory.  The registers and
> disassembly are:
> 
> ==snip==
> Program terminated with signal 11, Segmentation fault.
> #0  process_main () at beam/beam_emu.c:3163
> 3163    beam/beam_emu.c: No such file or directory.
> (gdb) info reg
> rax            0x7e7d77fff3f8   139077349274616
> rbx            0x7f243b82feb8   139793593990840
> rcx            0x0      0
> rdx            0x53ba78 5487224
> rsi            0x7e7d75622030   139077305376816
> rdi            0x0      0
> rbp            0x1414400        0x1414400
> rsp            0x7f2467432cf0   0x7f2467432cf0
> r8             0x0      0
> r9             0x0      0
> r10            0x0      0
> r11            0x246    582
> r12            0x7f2471b407c8   139794503174088
> r13            0x7e7f4309cae0   139085050661600
> r14            0x7e7f42e57168   139085048279400
> r15            0xc63f   50751
> rip            0x5425e4 0x5425e4 <process_main+29892>
> eflags         0x10202  [ IF RF ]
> cs             0x33     51
> ss             0x2b     43
> ds             0x0      0
> es             0x0      0
> fs             0x0      0
> gs             0x0      0
> (gdb) disassemble 0x5425a6,0x542610
> Dump of assembler code from 0x5425a6 to 0x542610:
>   0x00000000005425a6 <process_main+29830>:     mov    0x90(%rbp),%rdx
>   0x00000000005425ad <process_main+29837>:     mov    %rax,0x98(%rbp)
>   0x00000000005425b4 <process_main+29844>:     mov    %edx,0xa0(%rbp)
>   0x00000000005425ba <process_main+29850>:     mov    0xd0(%rbp),%rcx
>   0x00000000005425c1 <process_main+29857>:     lea    -0x1(%rdx),%eax
>   0x00000000005425c4 <process_main+29860>:     mov    0x98(%rbp),%rsi
>   0x00000000005425cb <process_main+29867>:     test   %eax,%eax
>   0x00000000005425cd <process_main+29869>:     mov    %rcx,0x48(%rsp)
>   0x00000000005425d2 <process_main+29874>:     jle    0x5425fd <process_main+29917>
>   0x00000000005425d4 <process_main+29876>:     cltq
>   0x00000000005425d6 <process_main+29878>:     sub    $0x2,%edx
>   0x00000000005425d9 <process_main+29881>:     shl    $0x3,%rax
>   0x00000000005425dd <process_main+29885>:     add    %rax,%r12
>   0x00000000005425e0 <process_main+29888>:     lea    (%rsi,%rax,1),%rax
> => 0x00000000005425e4 <process_main+29892>:     mov    (%r12),%rcx
>   0x00000000005425e8 <process_main+29896>:     sub    $0x1,%edx
>   0x00000000005425eb <process_main+29899>:     sub    $0x8,%r12
>   0x00000000005425ef <process_main+29903>:     mov    %rcx,(%rax)
>   0x00000000005425f2 <process_main+29906>:     lea    0x1(%rdx),%ecx
>   0x00000000005425f5 <process_main+29909>:     sub    $0x8,%rax
>   0x00000000005425f9 <process_main+29913>:     test   %ecx,%ecx
>   0x00000000005425fb <process_main+29915>:     jg     0x5425e4 <process_main+29892>
>   0x00000000005425fd <process_main+29917>:     mov    %r15,(%rsi)
>   0x0000000000542600 <process_main+29920>:     mov    %r14,0x0(%rbp)
>   0x0000000000542604 <process_main+29924>:     mov    $0x8,%esi
>   0x0000000000542609 <process_main+29929>:     mov    %r13,0x8(%rbp)
>   0x000000000054260d <process_main+29933>:     mov    %rbx,0xe0(%rbp)
> End of assembler dump.
> ==snip==
> 
> I interpret this as follows:
> 1. c_p == %rbp == 0x1414400
> 2. &argp[i] == %rax == 0x7e7d77fff3f8
>   from this I deduce that c_p->arg_reg != c_p->def_arg_reg, so it points
>   to a dynamically allocated area separate from *c_p
> 3. i == c_p->arity - 1 == %rdx == 0x53ba78
>   this is clearly bonkers, and what's causing references into unmapped
>   memory
> 4. &reg[i] == %r12 == 0x7f2471b407c8
>   this is consistent with indexing a frame-local array at index 0x53ba78
> 
> Basically, my conclusion is that c_p->arity has been clobbered, causing
> out-of-range accesses in this loop.
> 
> We've had this exact crash twice now, in August and last Thursday (Dec 11).
> 
> I realize the lack of a complete core dump makes this impossible to debug.
> What I'm hoping for is that someone might recollect some post-R15 change
> or fix that might have something to do with unexpected clobbers of process
> structs.
> 
> /Mikael

How do you know it's not a NIF doing strange things or whatnot? Did you manage to reproduce it afterwards? Did you try with a debug build?

Regards.



