[erlang-bugs] Infinite loop in async_del in erl_async.c

Anders.Ramsell <>
Mon May 16 18:51:45 CEST 2011


Hi!

We have recently experienced a problem where our linked-in driver,
which uses the asynchronous thread pool, suddenly enters an
infinite loop, eventually causing the whole Erlang runtime system
to shut down (without generating an erl_crash.dump).

This problem occurred on Windows 2000/2003 Server with SMP support
disabled and with 1024 asynchronous threads (+A 1024) running
Erlang/OTP R12B-2.

After a lot of investigation the error appeared to be in the
function async_del in erl_async.c and now one of my colleagues
believes he has identified the bug.

The problem is that the code does not advance past the first
element in a non-empty queue on a thread. If the first queue
element does not have the id we are looking for, the pointer a
is never updated and the while loop spins forever, with the
queue mutex still held. The code below includes the fix
suggested by my colleague, which solves the problem.

This bug is still present in R14.


static int async_del(long id)
{
    int i;
    /* scan all queues for an entry with async_id == 'id' */

    for (i = 0; i < erts_async_max_threads; i++) {
	ErlAsync* a;
	erts_mtx_lock(&async_q[i].mtx);
	
	a = async_q[i].head;
	while(a != NULL) {
	    if (a->async_id == id) {
		if (a->prev != NULL)
		    a->prev->next = a->next;
		else
		    async_q[i].head = a->next;
		if (a->next != NULL)
		    a->next->prev = a->prev;
		else
		    async_q[i].tail = a->prev;
		async_q[i].len--;
		erts_mtx_unlock(&async_q[i].mtx);
		if (a->async_free != NULL)
		    a->async_free(a->async_data);
		async_detach(a->hndl);
		erts_free(ERTS_ALC_T_ASYNC, a);
		return 1;
	    }
	    a = a->next;         //<-- Add this line.
	}
	erts_mtx_unlock(&async_q[i].mtx);
    }
    return 0;
}
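
For anyone who wants to check the traversal logic in isolation,
here is a minimal standalone sketch of the same unlink loop with
the extra line in place. It is not the erl_async.c code: Node,
Queue, queue_push and queue_del are hypothetical names invented
for this illustration, and the mutex, async_free and
async_detach handling is left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for ErlAsync: only the fields the scan needs. */
typedef struct Node {
    struct Node *next;
    struct Node *prev;
    long id;
} Node;

/* Hypothetical stand-in for one entry of async_q[] (no mutex here). */
typedef struct {
    Node *head;
    Node *tail;
    int len;
} Queue;

/* Append n at the tail of q. */
static void queue_push(Queue *q, Node *n)
{
    n->next = NULL;
    n->prev = q->tail;
    if (q->tail != NULL)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    q->len++;
}

/* Unlink the entry with the given id; return 1 if found, 0 otherwise. */
static int queue_del(Queue *q, long id)
{
    Node *a = q->head;
    while (a != NULL) {
        if (a->id == id) {
            if (a->prev != NULL)
                a->prev->next = a->next;
            else
                q->head = a->next;
            if (a->next != NULL)
                a->next->prev = a->prev;
            else
                q->tail = a->prev;
            q->len--;
            return 1;
        }
        /* Without this step the loop never leaves a non-matching head. */
        a = a->next;
    }
    return 0;
}
```

With three entries pushed (ids 1, 2, 3), queue_del(&q, 2) should
unlink the middle entry and return 1, and queue_del(&q, 42)
should walk the whole list and return 0 instead of hanging.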


/Best regards
Anders Ramsell

