[erlang-questions] Excessive futex overhead on driver_async linked-in driver

Paul Fisher pfisher@REDACTED
Tue Apr 8 04:37:00 CEST 2008


I'm working with linked-in drivers which perform low-level text
processing on files (each ~600k in size), returning a small list of
tuples via driver_output_term(). I'm seeing what seems like an excessive
amount of futex system call overhead when the emulator is started with
async threads versus with none.  As a result, work that should scale
well when run asynchronously does not, because of the futex overhead.


The workload is basically 96 processes, each submitting a batch of 34
file-processing port commands all in one shot, and then receiving the
list of tuples sent back for each file as the processing completes in
the driver.  Each file is processed as a separate port command, which
is handled via driver_async(), with only term formatting and memory
deallocation occurring in the ready_async() callback.  Only
2 x Schedulers processes run at a time, each with its own port
instance.
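
For a sense of scale, the workload described above can be totalled
with a few lines of Python.  This is only a rough sketch: the variable
names are made up for illustration, and it assumes "~600k" means
about 600,000 bytes per file.

```python
# Workload figures taken from the description above.
PROCESSES = 96               # Erlang processes submitting work
FILES_PER_BATCH = 34         # port commands submitted per process
FILE_SIZE_BYTES = 600_000    # assumption: "~600k" read as 600,000 bytes

# Each file is one port command, so this is also the number of
# driver_async() jobs over the whole run.
total_commands = PROCESSES * FILES_PER_BATCH

# Approximate amount of file data processed across all jobs.
total_bytes = total_commands * FILE_SIZE_BYTES

print(total_commands)                  # 3264
print(round(total_bytes / 1e9, 2))     # 1.96 (GB, approximate)
```

So the run amounts to 3264 async jobs over roughly 2 GB of input.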

Can someone familiar with the runtime describe the synchronization
strategy used by the SMP runtime for port drivers that perform work
via driver_async()?  Since futex-backed mutexes typically only trap to
a system call when there is contention, I am assuming that something
else is going on here, and guidance would be appreciated.


Dual core 1.8 GHz 32-bit laptop system:
$ uname -a
Linux pfisher-laptop 2.6.22-14-generic #1 SMP Tue Feb 12 07:42:25 UTC 2008 i686 GNU/Linux

$ strace -f -o /tmp/strace.s2.a0.out -c erl +S 2 -sname <...>

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 35.34    4.950806         529      9367       586 futex
 19.88    2.784180        3175       877       205 read
 19.82    2.776174      277617        10         1 select
 19.42    2.720171      680043         4           waitpid
  3.97    0.556457           3    215328           pread64
  1.45    0.202802         165      1227           poll

$ strace -f -o /tmp/strace.s2.a2.out -c erl +S 2 +A 2 -sname <...>

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 30.99    6.870370         107     64256      5354 futex
 15.06    3.339543         509      6561      3090 read
 15.01    3.328208      332821        10         1 select
 14.20    3.148197      787049         4           waitpid
 12.79    2.835251         696      4073         1 poll
 11.36    2.518787          10    256461           pread64


Quad core 2.4 GHz 64-bit system:
$ uname -a
Linux cluster-14 2.6.18-6-amd64 #1 SMP Sun Feb 10 17:50:19 UTC 2008 x86_64 GNU/Linux

$ strace -f -o /tmp/strace.s4.a4.out -c erl +S 4 +A 4 -sname <...>

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 39.90   23.980296         107    223247     33938 futex
 12.85    7.721196        2316      3334      1346 read
 12.84    7.720482      142972        54         1 select
 12.62    7.588478      758848        10         3 wait4
 11.38    6.841975        2515      2721         1 poll
  8.12    4.882636         436     11201      3084 open
  1.88    1.129441           2    510336           pread
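
To put the futex rows of the three runs side by side, here is a small
Python sketch.  The calls, errors and seconds figures are copied from
the strace summaries above; the run labels and the summarize() helper
are made-up names for illustration, and nothing here says where the
futex calls originate inside the emulator.

```python
# (calls, errors, seconds) from the futex row of each strace -c summary.
futex_rows = {
    "laptop +S 2":      (9367, 586, 4.950806),
    "laptop +S 2 +A 2": (64256, 5354, 6.870370),
    "quad +S 4 +A 4":   (223247, 33938, 23.980296),
}


def summarize(calls, errors, seconds):
    """Derive per-call cost and error fraction for one futex row."""
    return {
        "usecs_per_call": seconds / calls * 1e6,
        "error_rate": errors / calls,
    }


summaries = {name: summarize(*row) for name, row in futex_rows.items()}

# How many times more futex calls the laptop makes once +A 2 is added.
laptop_call_ratio = (
    futex_rows["laptop +S 2 +A 2"][0] / futex_rows["laptop +S 2"][0]
)

for name, s in summaries.items():
    print(f"{name}: {s['usecs_per_call']:.0f} usecs/call, "
          f"{s['error_rate']:.1%} errors")
print(f"laptop futex call ratio (+A 2 vs none): {laptop_call_ratio:.2f}")
```

On the laptop, adding two async threads multiplies the number of futex
calls by roughly 6.9, even though each individual call is cheaper.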


--
paul




