[erlang-bugs] r15b03-1 SEGV in erts_port_task_schedule()
Mikael Pettersson
mikpelinux@REDACTED
Mon Jul 28 16:27:23 CEST 2014
This is a follow-up to my previous report in
<http://erlang.org/pipermail/erlang-bugs/2014-June/004451.html>,
but it concerns a different function in erl_port_task.c.
We've gotten a new SEGV with r15b03-1. This time we managed to
capture a truncated core dump (just the thread list and registers,
no thread stacks or heap memory):
Program terminated with signal 11, Segmentation fault.
#0 enqueue_task (ptp=<optimized out>,
ptqp=<error reading variable: Cannot access memory at address 0x7f8f02a95d08>)
at beam/erl_port_task.c:327
327 ptp->prev = ptqp->last;
(gdb) bt
#0 enqueue_task (ptp=<optimized out>,
ptqp=<error reading variable: Cannot access memory at address 0x7f8f02a95d08>)
at beam/erl_port_task.c:327
#1 erts_port_task_schedule (id=<optimized out>,
id@entry=<error reading variable: Cannot access memory at address 0x7f8efdeb8318>,
pthp=<error reading variable: Cannot access memory at address 0x7f8efdeb82c0>,
type=<error reading variable: Cannot access memory at address 0x7f8efdeb82cc>,
event=<error reading variable: Cannot access memory at address 0x7f8efdeb82d0>,
event_data=<error reading variable: Cannot access memory at address 0x7f8efdeb82d8>)
at beam/erl_port_task.c:615
(gdb)
The code that faulted is
0x00000000004b8203 <+419>: mov 0x10(%r15),%rax
0x00000000004b8207 <+423>: mov 0x10(%rsp),%rbx
0x00000000004b820c <+428>: movq $0x0,0x8(%rbx)
=> 0x00000000004b8214 <+436>: mov 0x8(%rax),%rcx
0x00000000004b8218 <+440>: mov %rax,0x10(%rbx)
0x00000000004b821c <+444>: mov %rcx,(%rbx)
which is enqueue_task() [line 327] as inlined in erts_port_task_schedule()
[line 615]. The marked instruction loads ptqp->last from 0x8(%rax), and
%rax is zero according to gdb's register dump, so ptqp is NULL at this point.
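To make the mapping between the disassembly and the C source explicit, here is a minimal sketch. The names Task, TaskQueue, enqueue_head_sketch, check_layout and faulting_address are hypothetical, and the field order is an assumption inferred only from the offsets in the disassembly above on a target with 8-byte pointers; it is not copied from erl_port_task.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; field order is inferred from the disassembly. */
typedef struct Task Task;
typedef struct TaskQueue TaskQueue;

struct Task {
    Task *prev;        /* 0x00: written by  mov %rcx,(%rbx)      */
    Task *next;        /* 0x08: written by  movq $0x0,0x8(%rbx)  */
    TaskQueue *queue;  /* 0x10: written by  mov %rax,0x10(%rbx)  */
};

struct TaskQueue {
    Task *first;       /* 0x00 */
    Task *last;        /* 0x08: read by     mov 0x8(%rax),%rcx   */
};

/* Head of the enqueue operation: one load from the queue and
   three stores to the task, matching the instructions shown. */
static void enqueue_head_sketch(TaskQueue *q, Task *t)
{
    t->next = NULL;
    t->prev = q->last;   /* this load faults when q == NULL */
    t->queue = q;
}

/* The assumed offsets only make sense with 8-byte pointers. */
static void check_layout(void)
{
    if (sizeof(void *) == 8) {
        assert(offsetof(Task, prev) == 0x00);
        assert(offsetof(Task, next) == 0x08);
        assert(offsetof(Task, queue) == 0x10);
        assert(offsetof(TaskQueue, last) == 0x08);
    }
}

/* Address the faulting load would touch for a given queue pointer. */
static size_t faulting_address(size_t queue_ptr)
{
    return queue_ptr + offsetof(TaskQueue, last);
}
```

Under these assumptions a NULL queue pointer makes the load touch the address of the last field relative to zero, which is consistent with a fault on exactly that instruction.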
The relevant part of erts_port_task_schedule() is:
==snip==
    if (!pp->sched.taskq)
        pp->sched.taskq = port_taskq_init(port_taskq_alloc(), pp);
    ASSERT(ptp);
    ptp->type = type;
    ptp->event = event;
    ptp->event_data = event_data;
    set_handle(ptp, pthp);
    switch (type) {
    case ERTS_PORT_TASK_FREE:
        erl_exit(ERTS_ABORT_EXIT,
                 "erts_port_task_schedule(): Cannot schedule free task\n");
        break;
    case ERTS_PORT_TASK_INPUT:
    case ERTS_PORT_TASK_OUTPUT:
    case ERTS_PORT_TASK_EVENT:
        erts_smp_atomic_inc_relb(&erts_port_task_outstanding_io_tasks);
        /* Fall through... */
    default:
        enqueue_task(pp->sched.taskq, ptp);
        break;
    }
==snip==
The SEGV implies that pp->sched.taskq is NULL at the call to enqueue_task().
The erts_smp_atomic_inc_relb() and set_handle() calls do not affect *pp,
and I don't see any aliasing between *ptp and *pp, so the assignments to
*ptp do not affect *pp either.
So for pp->sched.taskq to be NULL at the bottom it would have to be NULL
after the call to port_taskq_init(), which implies that port_taskq_alloc()
returned NULL.
port_taskq_alloc() is generated via ERTS_SCHED_PREF_QUICK_ALLOC_IMPL;
if one expands that it becomes:
void erts_alloc_n_enomem(ErtsAlcType_t,Uint)
    __attribute__((noreturn));
static __inline__
void *erts_alloc(ErtsAlcType_t type, Uint size)
{
    void *res;
    res = (*erts_allctrs[(((type) >> (0)) & (15))].alloc)(
        (((type) >> (7)) & (255)),
        erts_allctrs[(((type) >> (0)) & (15))].extra,
        size);
    if (!res)
        erts_alloc_n_enomem((((type) >> (7)) & (255)), size);
    return res;
}
static __inline__ ErtsPortTaskQueue * port_taskq_alloc(void)
{
    ErtsPortTaskQueue *res = port_taskq_pre_alloc();
    if (!res)
        res = erts_alloc((4564), sizeof(ErtsPortTaskQueue));
    return res;
}
But given this code, I don't see how erts_alloc() or port_taskq_alloc()
could ever return NULL.
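The argument can be reduced to a small standalone sketch of the same fallback pattern. Everything here is hypothetical: Queue, the two-slot pool, queue_pre_alloc(), alloc_or_abort() and queue_alloc() are stand-ins for the ERTS pieces, with malloc() and abort() assumed in place of the real allocator and erts_alloc_n_enomem().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for ErtsPortTaskQueue. */
typedef struct { void *first; void *last; } Queue;

/* Stand-in for the pre-allocated pool: returns NULL once exhausted. */
static Queue pool[2];
static size_t pool_used = 0;

static Queue *queue_pre_alloc(void)
{
    return pool_used < 2 ? &pool[pool_used++] : NULL;
}

/* Stand-in for erts_alloc(): never returns NULL, aborts instead. */
static void *alloc_or_abort(size_t size)
{
    void *res = malloc(size);
    if (!res) {
        fprintf(stderr, "out of memory (%zu bytes)\n", size);
        abort();                 /* plays the role of the noreturn call */
    }
    return res;
}

/* Same shape as port_taskq_alloc(): pool first, heap as fallback. */
static Queue *queue_alloc(void)
{
    Queue *res = queue_pre_alloc();
    if (!res)
        res = alloc_or_abort(sizeof(Queue));
    assert(res != NULL);         /* holds on every path that returns */
    return res;
}
```

With the pool exhausted, the fallback path still either yields a usable pointer or never returns, so a NULL result would have to come from somewhere else.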
That leads me to suspect a concurrency bug that is
clobbering *pp behind our backs.
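One possible way to probe that hypothesis is a diagnostic that snapshots the queue pointer right after the lazy initialization and compares it again just before the enqueue. The sketch below is single-threaded and only illustrates the check: PortSched, Queue, schedule_with_check(), the interleave_fn hook and clobber() are all hypothetical, and the hook merely simulates a write by another thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the port's scheduling data and its queue. */
typedef struct { void *first; void *last; } Queue;
typedef struct { Queue *taskq; } PortSched;

/* Stands in for whatever another thread might do in the meantime. */
typedef void (*interleave_fn)(PortSched *);

/* Returns 0 if the queue pointer stayed stable, -1 if it changed. */
static int schedule_with_check(PortSched *ps, Queue *fresh, interleave_fn other)
{
    if (!ps->taskq)
        ps->taskq = fresh;            /* mirrors the lazy initialization */

    Queue *snapshot = ps->taskq;      /* value right after the init */
    assert(snapshot != NULL);

    if (other)
        other(ps);                    /* simulated interleaving */

    if (ps->taskq != snapshot) {
        fprintf(stderr, "taskq changed: %p -> %p\n",
                (void *)snapshot, (void *)ps->taskq);
        return -1;
    }
    /* the real code would call enqueue_task() here */
    return 0;
}

/* Simulated concurrent writer that clears the queue pointer. */
static void clobber(PortSched *ps)
{
    ps->taskq = NULL;
}
```

In a real SMP build the second read would need to be a volatile or atomic load, since the compiler is otherwise free to reuse the first value, and a check like this can only catch a race, not prove its absence.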
Ideas?
/Mikael