[erlang-questions] Excessive futex overhead on driver_async linked-in driver
Paul Fisher
pfisher@REDACTED
Tue Apr 8 04:37:00 CEST 2008
I'm working with linked-in drivers which perform low-level text
processing on files (each ~600k in size) returning a small list of
tuples via driver_output_term(). I'm seeing what seems like an excessive
amount of futex system call overhead when async threads are started with
the emulator versus starting none. As a result, work that should scale
well when run asynchronously does not, because of the futex overhead.
The workload is basically 96 processes, each submitting a batch of 34
file-processing port commands all in one shot, and then
receiving the list of tuples sent back for each file as the processing
completes in the driver. Each file is processed as a separate port
command, which is handled via driver_async(), with only term formatting
and memory deallocation occurring in the ready_async() callback.
Only 2 x (number of schedulers) processes run at a time, each with
its own port instance.
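For a sense of scale, the workload described above can be tallied with a
short Python sketch. It uses only the figures quoted here. Reading "~600k"
as 600 KiB, and treating every running process as having its whole batch
outstanding at once, are assumptions; the function names are made up for
illustration.

```python
# Back-of-the-envelope workload size, using figures quoted in the post.
PROCESSES = 96                 # Erlang processes submitting work
FILES_PER_BATCH = 34           # port commands per process, one per file
FILE_SIZE_BYTES = 600 * 1024   # assumption: "~600k" read as 600 KiB


def total_commands(processes=PROCESSES, batch=FILES_PER_BATCH):
    """Total port commands (and so driver_async() jobs) over the whole run."""
    return processes * batch


def total_bytes(file_size=FILE_SIZE_BYTES):
    """Approximate number of bytes of file data processed over the whole run."""
    return total_commands() * file_size


def max_in_flight(schedulers, batch=FILES_PER_BATCH):
    """Upper bound on outstanding async jobs.

    Assumption: 2 x schedulers processes run at a time, and each has
    submitted its full batch before any result comes back.
    """
    return 2 * schedulers * batch


if __name__ == "__main__":
    print("port commands:", total_commands())
    print("approx. bytes processed:", total_bytes())
    for s in (2, 4):
        print("+S", s, "-> at most", max_in_flight(s), "jobs outstanding")
```

With +A 2 or +A 4 that upper bound is far larger than the number of async
threads, so under these assumptions most jobs would spend their time queued
rather than running.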
Can someone familiar with the runtime describe the synchronization
strategy used by the SMP runtime for port drivers performing work via
driver_async()? Since the futexes that typically back mutexes only
trap to a system call when there is contention, I assume that
something else is going on here; guidance would be appreciated.
Dual-core 1.8 GHz 32-bit laptop system:
$ uname -a
Linux pfisher-laptop 2.6.22-14-generic #1 SMP Tue Feb 12 07:42:25 UTC 2008 i686 GNU/Linux
$ strace -f -o /tmp/strace.s2.a0.out -c erl +S 2 -sname <...>
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 35.34    4.950806         529      9367       586 futex
 19.88    2.784180        3175       877       205 read
 19.82    2.776174      277617        10         1 select
 19.42    2.720171      680043         4           waitpid
  3.97    0.556457           3    215328           pread64
  1.45    0.202802         165      1227           poll
$ strace -f -o /tmp/strace.s2.a2.out -c erl +S 2 +A 2 -sname <...>
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 30.99    6.870370         107     64256      5354 futex
 15.06    3.339543         509      6561      3090 read
 15.01    3.328208      332821        10         1 select
 14.20    3.148197      787049         4           waitpid
 12.79    2.835251         696      4073         1 poll
 11.36    2.518787          10    256461           pread64
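
The two laptop summaries above can be compared with a little arithmetic.
The Python sketch below copies the futex rows from the tables. The
per-command figure assumes both runs processed the full 96 x 34 workload,
which the differing pread64 counts suggest may not hold exactly, so treat
it as a rough estimate rather than a measurement.

```python
# futex rows copied from the two laptop "strace -c" summaries.
NO_ASYNC = {"calls": 9367, "errors": 586, "seconds": 4.950806}    # +S 2
ASYNC_2 = {"calls": 64256, "errors": 5354, "seconds": 6.870370}   # +S 2 +A 2

# Assumption: each run issued all 96 * 34 port commands.
COMMANDS = 96 * 34


def summarize(row, commands=COMMANDS):
    """Derive per-command and per-call figures from one strace -c row."""
    return {
        "calls_per_command": row["calls"] / commands,
        "usecs_per_call": row["seconds"] * 1e6 / row["calls"],
        "error_fraction": row["errors"] / row["calls"],
    }


def call_ratio(with_async=ASYNC_2, without_async=NO_ASYNC):
    """How many times more futex calls the async run made."""
    return with_async["calls"] / without_async["calls"]


if __name__ == "__main__":
    for label, row in (("no async", NO_ASYNC), ("+A 2", ASYNC_2)):
        stats = summarize(row)
        print(label, {k: round(v, 3) for k, v in stats.items()})
    print("futex call ratio:", round(call_ratio(), 2))
```

Under that assumption, the async run makes roughly seven times as many
futex calls, going from about 3 per port command to about 20, while each
individual call is cheaper.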
Quad-core 2.4 GHz 64-bit system:
$ uname -a
Linux cluster-14 2.6.18-6-amd64 #1 SMP Sun Feb 10 17:50:19 UTC 2008 x86_64 GNU/Linux
$ strace -f -o /tmp/strace.s4.a4.out -c erl +S 4 +A 4 -sname <...>
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 39.90   23.980296         107    223247     33938 futex
 12.85    7.721196        2316      3334      1346 read
 12.84    7.720482      142972        54         1 select
 12.62    7.588478      758848        10         3 wait4
 11.38    6.841975        2515      2721         1 poll
  8.12    4.882636         436     11201      3084 open
  1.88    1.129441           2    510336           pread
--
paul