[erlang-bugs] r15b03-1 SEGV in erts_port_task_schedule()

Rickard Green rickard@REDACTED
Fri Aug 8 00:44:37 CEST 2014


On Tue, Jul 29, 2014 at 4:30 PM, Mikael Pettersson <mikpelinux@REDACTED> wrote:
> Mikael Pettersson writes:
>  > This is a followup to my previous report in
>  > <http://erlang.org/pipermail/erlang-bugs/2014-June/004451.html>,
>  > but it's for a different function in erl_port_task.c.
>  >
>  > We've gotten a new SEGV with r15b03-1.  This time we managed to
>  > capture a truncated core dump (just threads list and registers,
>  > no thread stacks or heap memory):
>  >
>  > Program terminated with signal 11, Segmentation fault.
>  > #0  enqueue_task (ptp=<optimized out>,
>  >     ptqp=<error reading variable: Cannot access memory at address 0x7f8f02a95d08>)
>  >     at beam/erl_port_task.c:327
>  > 327         ptp->prev = ptqp->last;
>  > (gdb) bt
>  > #0  enqueue_task (ptp=<optimized out>,
>  >     ptqp=<error reading variable: Cannot access memory at address 0x7f8f02a95d08>)
>  >     at beam/erl_port_task.c:327
>  > #1  erts_port_task_schedule (id=<optimized out>,
>  >     id@REDACTED=<error reading variable: Cannot access memory at address 0x7f8efdeb8318>,
>  >     pthp=<error reading variable: Cannot access memory at address 0x7f8efdeb82c0>,
>  >     type=<error reading variable: Cannot access memory at address 0x7f8efdeb82cc>,
>  >     event=<error reading variable: Cannot access memory at address 0x7f8efdeb82d0>,
>  >     event_data=<error reading variable: Cannot access memory at address 0x7f8efdeb82d8>)
>  >     at beam/erl_port_task.c:615
>  > (gdb)
>  >
>  > The code that faulted is
>  >
>  >    0x00000000004b8203 <+419>:   mov    0x10(%r15),%rax
>  >    0x00000000004b8207 <+423>:   mov    0x10(%rsp),%rbx
>  >    0x00000000004b820c <+428>:   movq   $0x0,0x8(%rbx)
>  > => 0x00000000004b8214 <+436>:   mov    0x8(%rax),%rcx
>  >    0x00000000004b8218 <+440>:   mov    %rax,0x10(%rbx)
>  >    0x00000000004b821c <+444>:   mov    %rcx,(%rbx)
>  >
>  > which is enqueue_task() [line 327] as inlined in erts_port_task_schedule()
>  > [line 615].  At this point, %rax is zero according to gdb's registers dump.
>  >
>  > The relevant part of erts_port_task_schedule() is:
>  >
>  > ==snip==
>  >     if (!pp->sched.taskq)
>  >      pp->sched.taskq = port_taskq_init(port_taskq_alloc(), pp);
>  >
>  >     ASSERT(ptp);
>  >
>  >     ptp->type = type;
>  >     ptp->event = event;
>  >     ptp->event_data = event_data;
>  >
>  >     set_handle(ptp, pthp);
>  >
>  >     switch (type) {
>  >     case ERTS_PORT_TASK_FREE:
>  >      erl_exit(ERTS_ABORT_EXIT,
>  >               "erts_port_task_schedule(): Cannot schedule free task\n");
>  >      break;
>  >     case ERTS_PORT_TASK_INPUT:
>  >     case ERTS_PORT_TASK_OUTPUT:
>  >     case ERTS_PORT_TASK_EVENT:
>  >      erts_smp_atomic_inc_relb(&erts_port_task_outstanding_io_tasks);
>  >      /* Fall through... */
>  >     default:
>  >      enqueue_task(pp->sched.taskq, ptp);
>  >      break;
>  >     }
>  > ==snip==
>  >
>  > The SEGV implies that pp->sched.taskq is NULL at the call to enqueue_task().
>  >
>  > The erts_smp_atomic_inc_relb() and set_handle() calls do not affect *pp,
>  > and I don't see any aliasing between *ptp and *pp, so the assignments to
>  > *ptp do not affect *pp either.
>  >
>  > So for pp->sched.taskq to be NULL at the bottom it would have to be NULL
>  > after the call to port_taskq_init(), which implies that port_taskq_alloc()
>  > returned NULL.
>  >
>  > port_taskq_alloc() is generated via ERTS_SCHED_PREF_QUICK_ALLOC_IMPL;
>  > if one expands that it becomes:
>  >
>  > void erts_alloc_n_enomem(ErtsAlcType_t,Uint)
>  >      __attribute__((noreturn));
>  >
>  > static __inline__
>  > void *erts_alloc(ErtsAlcType_t type, Uint size)
>  > {
>  >     void *res;
>  >     res = (*erts_allctrs[(((type) >> (0)) & (15))].alloc)(
>  >      (((type) >> (7)) & (255)),
>  >      erts_allctrs[(((type) >> (0)) & (15))].extra,
>  >      size);
>  >     if (!res)
>  >      erts_alloc_n_enomem((((type) >> (7)) & (255)), size);
>  >     return res;
>  > }
>  >
>  > static __inline__ ErtsPortTaskQueue * port_taskq_alloc(void)
>  > {
>  >     ErtsPortTaskQueue *res = port_taskq_pre_alloc();
>  >     if (!res)
>  >      res = erts_alloc((4564), sizeof(ErtsPortTaskQueue));
>  >     return res;
>  > }
>  >
>  > But given this code, I don't see how erts_alloc() or port_taskq_alloc()
>  > could ever return NULL.
>  >
>  > Which leads me to suspect that there's a concurrency bug that's
>  > causing *pp to be clobbered behind our backs.
>  >
>  > Ideas?
>

Thanks for the excellent bug report! I've found a concurrency bug (as
you suspected) that is likely to have caused the crash you got.

The fix can be found in the rickard/port-emigrate-bug/OTP-12084 branch
in my github repo
<https://github.com/rickard-green/otp/tree/rickard/port-emigrate-bug/OTP-12084>.
The fix is based on the OTP_R15B03-1 tag. I've only briefly tested the
fix, but will test it more thoroughly. If further changes are needed
I'll post here again.

During the call to erts_check_emigration_need() from
erts_port_task_schedule() we may release the lock on the current
run-queue, and while it is released nothing prevents another thread
from migrating the port. The bug is that we did not check whether a
migration had happened after the call, but instead assumed that the
port either stayed on the current run-queue or was about to be
emigrated by us. As a result, the other thread could modify the
pp->sched.taskq field under another run-queue lock at the same time
as the thread calling erts_port_task_schedule() was modifying it.
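
In outline, the fix is to revalidate the port's assigned run-queue
after erts_check_emigration_need() returns. A minimal sketch of that
idea, using the R15 field and function names (this is not the literal
patch from the branch above):

==snip==
    xrunq = erts_check_emigration_need(runq, ERTS_PORT_PRIO_LEVEL);

    /* erts_check_emigration_need() may have released the lock on
       'runq'; while it was released another thread may have migrated
       the port, so revalidate before touching pp->sched... */
    if (runq != (ErtsRunQueue *) erts_smp_atomic_read_nob(&pp->run_queue)) {
        /* The port was migrated behind our back: forget xrunq, drop
           the stale run-queue lock, and continue on the run-queue the
           port is assigned to now. */
        erts_smp_runq_unlock(runq);
        runq = erts_port_runq(pp);
    }
==snip==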

This functionality was rewritten in erts-5.10 (R16), and the rewrite
fixed this bug as a side effect, so R16 and later are not affected by
it.

The answers to your questions below apply to R15. Due to the rewrite
of this functionality, some of the details differ as of R16.

> I've studied erl_port_task.c and some related code, and think I
> understand how the locking is _supposed_ to work: Before you can
> access port->sched you have to erts_smp_runq_lock() the runq
> stored (as an atomic integer) in port->run_queue.
>
> runq = erts_port_runq(port);
>
> does this automatically, but you then end up with two variables
> that must stay coherent until you unlock.
>

Yes, and this is exactly what failed in erts_port_task_schedule().
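
For reference, the lock-and-revalidate loop behind erts_port_runq()
looks roughly like this (a sketch of the R15 code, not a verbatim
copy):

==snip==
static ErtsRunQueue *
port_runq_sketch(Port *pp)
{
    ErtsRunQueue *runq;
    while (1) {
        /* Read the currently assigned run-queue... */
        runq = (ErtsRunQueue *) erts_smp_atomic_read_nob(&pp->run_queue);
        erts_smp_runq_lock(runq);
        /* ...and revalidate under the lock; the port may have been
           migrated between the read and the lock. */
        if (runq == (ErtsRunQueue *) erts_smp_atomic_read_nob(&pp->run_queue))
            return runq; /* still coherent */
        erts_smp_runq_unlock(runq); /* raced with a migration; retry */
    }
}
==snip==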

> Q1. I see a number of places in erl_port_task.c where the code
> temporarily releases the runq lock, calls out somewhere, and then
> just locks the runq while still holding on to the original port.
> What, if anything, guarantees that these two items still belong
> together.  Can't the port migrate to another runq while the runq
> was unlocked?

This depends on the current state of the port.

A port can only migrate from one run-queue to another in the following
situations:

1. When the port (not a port task) is being scheduled, i.e., when the
port is enqueued into a run-queue. (emigrate path)
2. When the port is being rescheduled after execution of port tasks.
This is more or less the same as the above. (emigrate path)
3. When the port is stolen by another run-queue while it is queued on
a run-queue. (immigrate path and work stealing)

That is, for a port to be migrated it either needs to be in a
run-queue, or needs to be about to be put into a run-queue. While it
is executing tasks it is neither in a run-queue nor being enqueued
into one, and can therefore not be migrated.

The state changes between "waiting for tasks", "runnable" (in a
run-queue), and "executing" are made atomic by locking the run-queue
lock of the run-queue that the port is currently assigned to
(pp->run_queue).

When migrating a port, one needs to hold both involved run-queue
locks at the same time.
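
A sketch of how taking both locks deadlock-free can be done; ordering
by address here is an illustrative assumption, the actual ERTS code
may order the locks differently:

==snip==
static void
lock_both_runqs(ErtsRunQueue *rq1, ErtsRunQueue *rq2)
{
    /* Always take the two run-queue locks in a globally consistent
       order, so that two threads migrating in opposite directions
       cannot deadlock on each other. */
    if (rq1 == rq2)
        erts_smp_runq_lock(rq1);
    else if (rq1 < rq2) {
        erts_smp_runq_lock(rq1);
        erts_smp_runq_lock(rq2);
    }
    else {
        erts_smp_runq_lock(rq2);
        erts_smp_runq_lock(rq1);
    }
}
==snip==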

> If so, then it seems we have to reload runq (if the
> port is the object of interest) or revalidate and possibly bail out
> if they are now no longer connected.

Yes, unless we know that the port is in a state where it cannot be
migrated. The bug was, however, precisely such a missed revalidation.

The unlock/lock operations on the run-queue made in
erts_port_task_execute() without reloading the run-queue from
pp->run_queue are, however, safe, since the port is in the executing
state there and therefore cannot be migrated.

The scheduler thread that pops the port from its run-queue for
execution sets pp->sched.exe_taskq before unlocking the run-queue lock.
Other threads detect that the port is executing because the
pp->sched.exe_taskq field is set. Since the port is not in any
run-queue and is flagged as executing (by pp->sched.exe_taskq), it
cannot be migrated. That is, it is safe to unlock and relock the same
run-queue without reloading the assigned run-queue until
pp->sched.exe_taskq is cleared.
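
In code, the test other threads rely on amounts to something like this
(a sketch; the caller is assumed to hold the port's run-queue lock):

==snip==
static int
port_is_executing(Port *pp)
{
    /* A non-NULL exe_taskq means a scheduler popped the port from
       its run-queue for execution before releasing the run-queue
       lock; until the field is cleared again the port is neither in
       a run-queue nor about to be enqueued, so it cannot migrate. */
    return pp->sched.exe_taskq != NULL;
}
==snip==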

>
> Q2. Is erts_check_emigration_need(runq) _guaranteed_ to return a
> value that is different from the given runq?

Yes.

> Looking at the code,
> that's not obviously true.

:-)

erts_check_emigration_need() reads the emigration path set up by the
check_balance() function (in erl_process.c). If an emigration path
exists on a run-queue, it always points to another run-queue. I've
taken a look at the code (at the end of the check_balance() function)
and I'm quite confident that it is ok.
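
Given that invariant, a cheap defensive check at the call site would
be possible (a hypothetical assertion, not something present in the
source):

==snip==
    xrunq = erts_check_emigration_need(runq, ERTS_PORT_PRIO_LEVEL);
    /* Invariant set up by check_balance(): an emigration path never
       points back at the run-queue it hangs off. */
    ASSERT(!xrunq || xrunq != runq);
==snip==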

> If it ever returned the same runq it
> was given, very bad things(TM) would happen...
>

Yes.

> /Mikael

Regards,
Rickard Green, Erlang/OTP, Ericsson AB


