[erlang-bugs] erts_port[].drv_ptr == 0, when erts_port[].status not free
Paul Fisher
pfisher@REDACTED
Wed Jul 2 02:24:05 CEST 2008
We have a system where we run lots of linked-in driver ports that get
created/used/closed frequently and sometimes very quickly. Today when
several open_port/2, port_command/2 and port_close/1 cycles happened in
rapid succession, a SIGSEGV occurred in erl_bif_ddll.c:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1125235040 (LWP 12087)]
0x0000000000449712 in erl_ddll_try_unload_2 (p=0x2aaaab11fc90,
name_term=659339, options=46912503328425) at beam/erl_bif_ddll.c:592
The emulator was run on a Q6600 (quad-core, 2.4GHz), and started with
+A 8, and the linked-in driver executes the bulk of its work with
driver_async(). Eight driver cycles had been running continuously for
5-10 seconds before the segfault occurred.
(gdb) where
#0 0x0000000000449712 in erl_ddll_try_unload_2 (p=0x2aaaab11fc90,
name_term=659339, options=46912503328425) at beam/erl_bif_ddll.c:592
#1 0x000000000052337f in process_main () at beam/beam_emu.c:2073
#2 0x000000000049c213 in sched_thread_func (vesdp=0x2ae18cb74f98)
at beam/erl_process.c:741
#3 0x00000000005b6818 in thr_wrapper (vtwd=0x7fff1eb77de0)
at common/ethread.c:474
#4 0x00002ae18c530f1a in start_thread () from /lib/libpthread.so.0
#5 0x00002ae18c8135d2 in clone () from /lib/libc.so.6
#6 0x0000000000000000 in ?? ()
So the code at the point of the SIGSEGV @ erl_bif_ddll.c:592 says:
for (j = 0; j < erts_max_ports; j++) {
=> if (!(erts_port[j].status & FREE_PORT_FLAGS)
&& erts_port[j].drv_ptr->handle == dh) {
It appears that the code assumes that the erts_port array entry being
evaluated during the search has a valid (non-zero) drv_ptr value
whenever the entry is not marked as free. At the time of the crash,
this is clearly not the case:
(gdb) p j
$8 = 896
(gdb) p erts_port[j]
$7 = {sched = {next = 0x0, prev = 0x0, taskq = 0x0, exe_taskq = 0x0},
timeout_task = {counter = 0}, refc = {counter = 2}, lock = 0x81b3c8,
xports = 0x0, id = 14343, connected = 0, caller = 0, data = 0, bp = 0x0,
nlinks = 0x0, monitors = 0x0, bytes_in = 0, bytes_out = 0, ptimer = 0x0,
tracer_proc = 18446744073709551611, trace_flags = 0, ioq = {size = 0,
v_start = 0x0, v_end = 0x0, v_head = 0x0, v_tail = 0x0, v_small = {{
iov_base = 0x0, iov_len = 0}, {iov_base = 0x0, iov_len = 0}, {
iov_base = 0x0, iov_len = 0}, {iov_base = 0x0, iov_len = 0}, {
iov_base = 0x0, iov_len = 0}}, b_start = 0x0, b_end = 0x0,
b_head = 0x0, b_tail = 0x0, b_small = {0x0, 0x0, 0x0, 0x0, 0x0}},
dist_entry = 0x0, name = 0x0, drv_ptr = 0x0, drv_data = 0, suspended = 0x0,
linebuf = 0x0, status = 4096, control_flags = 0, reg = 0x0,
port_data_lock = 0x0}
(gdb) p erts_port[j].drv_ptr
$6 = (ErlDrvEntry *) 0x0
So the real questions are: 1) whether the assumption built into this
code is correct; and 2) if so, how we got into the position of
violating it. I'd appreciate some insight into what could be going on
here, and where I should start looking.
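
To make the failing condition concrete, here is a minimal standalone
sketch of the same search loop with a null check added before the
dereference. The types, the SKETCH_FREE_PORT_FLAGS value and the
function name count_ports_using are hypothetical stand-ins, not the
emulator's real definitions, and this is not proposed as the actual fix.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the emulator's types; the real
 * definitions in the ERTS sources have many more fields. */
typedef struct { void *handle; } DrvEntry;
typedef struct { DrvEntry *drv_ptr; unsigned status; } Port;

/* Assumed placeholder value, NOT the emulator's real FREE_PORT_FLAGS. */
#define SKETCH_FREE_PORT_FLAGS 0x1u

/* Count the ports that are not free and are bound to driver handle dh.
 * Same shape as the loop at erl_bif_ddll.c:592, plus a null guard. */
int count_ports_using(const Port *ports, int max_ports, const void *dh)
{
    int j, n = 0;
    for (j = 0; j < max_ports; j++) {
        if (!(ports[j].status & SKETCH_FREE_PORT_FLAGS)
            && ports[j].drv_ptr != NULL   /* guard added in this sketch */
            && ports[j].drv_ptr->handle == dh) {
            n++;
        }
    }
    return n;
}
```

With an entry like the one in the dump above (status = 4096, drv_ptr =
0x0), the guarded loop skips the entry instead of dereferencing a null
pointer. A guard like this would only hide the symptom; it does not
answer whether a non-free entry is ever supposed to have a null
drv_ptr.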
--
paul