[erlang-questions] Performance analysis advice
ANTHONY MOLINARO
anthonym@REDACTED
Tue Feb 11 01:45:56 CET 2014
Hi,
I've been recently trying to find and eliminate bottlenecks in a web service. I had a few cases I knew were bottlenecks which I eliminated, and I've reached the point where I'm not sure what to look into next.
The server is basically
- webmachine
- 3 backend services fronted by dispcount
- each request is an erlang process which calls either 2 or 3 of the backend services in sequence, when a resource is unavailable the process will erlang:yield/0 and try again next time it is scheduled.
- very little is done per request (or at least fprof doesn't show any hotspots).
The current issue seems to a performance knee of some sort. As I add traffic to my test node I see CPU utilization spike at around 200 concurrent requests. It's a pretty dramatic knee, here's a sample of the data points (columns are timestamp, number of established connections 'netstat -n' and %cpu 'top -b').
2014/02/11-00:34:02 0 8.0
2014/02/11-00:34:04 0 12.0
2014/02/11-00:34:06 14 95.8
2014/02/11-00:34:08 30 153.8
2014/02/11-00:34:10 34 169.7
2014/02/11-00:34:12 64 269.6
2014/02/11-00:34:14 58 311.5
2014/02/11-00:34:16 81 385.4
2014/02/11-00:34:18 94 463.3
2014/02/11-00:34:21 114 523.4
2014/02/11-00:34:23 114 595.0
2014/02/11-00:34:25 153 670.9
2014/02/11-00:34:27 149 777.2
2014/02/11-00:34:29 165 1359.5
2014/02/11-00:34:31 198 1529.3
2014/02/11-00:34:33 350 1517.4
2014/02/11-00:34:36 369 1523.2
2014/02/11-00:34:38 355 1525.3
It's a 16 core, so 1500% is pretty much maxed out. But notice the jump between 149 connections and 198 connections, it's almost double the CPU.
I tried running the lock counting profiler during running and it looks like this
lock id #tries #collisions collisions [%] time [us] duration [%]
----- --- ------- ------------ --------------- ---------- -------------
pollset 1 2369355 71578 3.0210 553249 0.7896
proc_main 154649 6315238 13325 0.2110 442908 0.6322
drv_ev_state 16 2311870 14609 0.6319 140019 0.1998
run_queue 16 9167097 55212 0.6023 117993 0.1684
proc_link 154649 2173804 3693 0.1699 111909 0.1597
proc_status 154649 5816882 3037 0.0522 106664 0.1522
proc_msgq 154649 3972636 3305 0.0832 56049 0.0800
process_table 1 444931 4711 1.0588 56042 0.0800
timeofday 1 1323351 6874 0.5194 22078 0.0315
atom_tab 1 1902423 22 0.0012 2187 0.0031
timer_wheel 1 582235 1186 0.2037 1322 0.0019
pollset_rm_list 1 1042209 2035 0.1953 942 0.0013
make_ref 1 373475 592 0.1585 458 0.0007
db_hash_slot 576 646402 203 0.0314 244 0.0003
alcu_allocator 9 18806 16 0.0851 212 0.0003
However, I'm not entirely sure how to interpret it, poll set seems have to do with scheduling, and I'm sure my use of yield is not recommended, but I don't think that would explain the knee I seem to be seeing.
Anyway, I'm looking for any advice from others who might have some experience optimizing who might be able to point out what I'm doing wrong or point to libraries, talks, tools, documentation, or other things I might have missed.
Thanks,
-Anthony
More information about the erlang-questions
mailing list