[erlang-questions] Performance analysis advice

ANTHONY MOLINARO anthonym@REDACTED
Tue Feb 11 02:34:09 CET 2014


Hi Fred,

I'd actually been meaning to try dispcount out for some time, but only got to it recently when I was asked to look into optimizing this sub-system.  Since all our backends are Thrift, I ended up generalizing the cassanderl Thrift layer and adding my retry logic on top.

Currently the pool sizes are 500, 500, and 200 (the pool of 200 is only hit by a small fraction of requests, so even 200 is probably overkill).  I tried upping the first two to 800 to see if I could get more throughput, but it didn't seem to have any effect.
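
For reference, the pool size here is just the 'resources' count handed to dispcount at startup.  Roughly what one of our dispatchers looks like (names and option values are illustrative, following dispcount's README rather than our exact code):

    ok = dispcount:start_dispatch(
           backend_a_dispatcher,        %% illustrative dispatcher name
           {backend_a_dispatch, []},    %% callback module implementing the dispcount behaviour
           [{restart, permanent}, {shutdown, 4000},
            {maxr, 10}, {maxt, 60},
            {resources, 500}]).         %% the pool size, e.g. the 500 mentioned above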

I noticed the use of now(), and just replaced it with os:timestamp/0.  I also did the same thing in a handful of places in webmachine.  When I first did the lock counting, the timeofday lock was 3-4x what it is now.
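
Concretely, the swap in dispcount_watcher looks something like this (a sketch of my local change, not an exact patch):

    %% os:timestamp/0 drops the strict uniqueness/monotonicity guarantee of
    %% erlang:now/0 and, as far as I can tell, avoids the timeofday lock.
    dispatch_id(Num) ->
        erlang:phash2(os:timestamp(), Num) + 1.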

I'll try out your function tomorrow and compare it to the os:timestamp() version.  I have a feeling it won't make much difference, as os:timestamp() is pretty fast (and as far as I know doesn't take the timeofday lock), but we'll see.

The pollset lock really seems to be the contended one, but I'd have to dig into the emulator a bit to understand whether there's anything I can do there.

I also feel I might be able to use some of the scheduler flags to good effect, but haven't quite figured out how yet :)
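
For what it's worth, the flags I've been eyeing are along these lines (vm.args style; values are illustrative, not tested recommendations):

    # kernel poll (epoll) for the emulator pollset
    +K true
    # bind scheduler threads to cores
    +sbt db
    # lower the scheduler wakeup threshold
    +swt low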


On Feb 10, 2014, at 5:04 PM, Fred Hebert <mononcqc@REDACTED> wrote:

> Hi Anthony,
> 
> Glad to see someone using dispcount!
> 
> Can I ask what your pool sizes are? Does the time increase correlate
> with when you start getting busy or retrying a lot?
> 
> What you could be seeing is locking in dispcount itself. To allow fair
> redistribution of retries, dispcount makes use of erlang:now/0, which
> could easily be the problem you're hitting there.
> 
> See https://github.com/ferd/dispcount/blob/master/src/dispcount_watcher.erl#L148-149
> 
> I had never seen it be a problem before, so I never optimized it away,
> but you could try substituting it with:
> 
>    dispatch_id(Num) ->
>        erlang:phash2(local_unique(), Num) + 1.
> 
>    local_unique() ->
>        Ct = case get('$dispcount_ct') of
>            undefined -> put('$dispcount_ct', 1), 1;
>            N -> put('$dispcount_ct', N+1), N+1
>        end,
>        {Ct,self()}.
> 
> That might be enough to keep giving an even distribution without
> needing external data (now()); the pid and counter should provide
> enough uniqueness overall to work fine.
> 
> If it goes well, let me know and I'll patch dispcount for this, and ask
> people I know who use it if they feel like giving it a try.
> 
> Regards,
> Fred.
> 
> On 02/10, ANTHONY MOLINARO wrote:
>> Hi,
>> 
>> I've recently been trying to find and eliminate bottlenecks in a web service.  I eliminated a few cases I knew were bottlenecks, and I've reached the point where I'm not sure what to look into next.
>> 
>> The server is basically
>> 
>> - webmachine
>> - 3 backend services fronted by dispcount
>> - each request is an Erlang process which calls either 2 or 3 of the backend services in sequence; when a resource is unavailable the process calls erlang:yield/0 and tries again the next time it is scheduled (a rough sketch of this loop follows the list below).
>> - very little is done per request (or at least fprof doesn't show any hotspots).
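>> 
>> For concreteness, each backend call is roughly the following (a sketch, not the actual code; call_backend/2 and thrift_call/2 are made-up names for illustration, and the checkout/checkin calls are per dispcount's README as I understand it):
>> 
>>     call_backend(DispatcherInfo, Request) ->
>>         case dispcount:checkout(DispatcherInfo) of
>>             {ok, CheckinRef, Conn} ->
>>                 try
>>                     thrift_call(Conn, Request)
>>                 after
>>                     %% always return the connection to the pool
>>                     dispcount:checkin(DispatcherInfo, CheckinRef, Conn)
>>                 end;
>>             {error, busy} ->
>>                 %% no resource free: yield and retry next time we're scheduled
>>                 erlang:yield(),
>>                 call_backend(DispatcherInfo, Request)
>>         end.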
>> 
>> The current issue seems to be a performance knee of some sort.  As I add traffic to my test node, I see CPU utilization spike at around 200 concurrent requests.  It's a pretty dramatic knee; here's a sample of the data points (columns are timestamp, number of established connections from 'netstat -n', and %CPU from 'top -b').
>> 
>> 2014/02/11-00:34:02 0 8.0
>> 2014/02/11-00:34:04 0 12.0
>> 2014/02/11-00:34:06 14 95.8
>> 2014/02/11-00:34:08 30 153.8
>> 2014/02/11-00:34:10 34 169.7
>> 2014/02/11-00:34:12 64 269.6
>> 2014/02/11-00:34:14 58 311.5
>> 2014/02/11-00:34:16 81 385.4
>> 2014/02/11-00:34:18 94 463.3
>> 2014/02/11-00:34:21 114 523.4
>> 2014/02/11-00:34:23 114 595.0
>> 2014/02/11-00:34:25 153 670.9
>> 2014/02/11-00:34:27 149 777.2
>> 2014/02/11-00:34:29 165 1359.5
>> 2014/02/11-00:34:31 198 1529.3
>> 2014/02/11-00:34:33 350 1517.4
>> 2014/02/11-00:34:36 369 1523.2
>> 2014/02/11-00:34:38 355 1525.3
>> 
>> It's a 16-core machine, so 1500% is pretty much maxed out.  But notice the jump between 149 connections and 198 connections: the CPU usage almost doubles.
>> 
>> I tried running the lock counting profiler while the test was running, and the output looks like this
>> 
>>            lock     id  #tries  #collisions  collisions [%]  time [us]  duration [%]
>>           -----    --- ------- ------------ --------------- ---------- -------------
>>         pollset      1 2369355        71578          3.0210     553249        0.7896
>>       proc_main 154649 6315238        13325          0.2110     442908        0.6322
>>    drv_ev_state     16 2311870        14609          0.6319     140019        0.1998
>>       run_queue     16 9167097        55212          0.6023     117993        0.1684
>>       proc_link 154649 2173804         3693          0.1699     111909        0.1597
>>     proc_status 154649 5816882         3037          0.0522     106664        0.1522
>>       proc_msgq 154649 3972636         3305          0.0832      56049        0.0800
>>   process_table      1  444931         4711          1.0588      56042        0.0800
>>       timeofday      1 1323351         6874          0.5194      22078        0.0315
>>        atom_tab      1 1902423           22          0.0012       2187        0.0031
>>     timer_wheel      1  582235         1186          0.2037       1322        0.0019
>> pollset_rm_list      1 1042209         2035          0.1953        942        0.0013
>>        make_ref      1  373475          592          0.1585        458        0.0007
>>    db_hash_slot    576  646402          203          0.0314        244        0.0003
>>  alcu_allocator      9   18806           16          0.0851        212        0.0003
>> 
>> However, I'm not entirely sure how to interpret it.  The pollset seems to have to do with scheduling, and I'm sure my use of yield is not recommended, but I don't think that would explain the knee I seem to be seeing.
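>> 
>> (For reference, the numbers above came from an lcnt session along these lines; a sketch from memory, assuming an emulator built with lock counting enabled:)
>> 
>>     lcnt:start(),
>>     lcnt:clear(),       %% reset counters before applying load
>>     %% ... let the load test run for a while ...
>>     lcnt:collect(),     %% snapshot the counters from the runtime
>>     lcnt:conflicts().   %% print locks sorted by contention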
>> 
>> Anyway, I'm looking for advice from anyone with optimization experience who might be able to point out what I'm doing wrong, or point me to libraries, talks, tools, documentation, or other things I might have missed.
>> 
>> Thanks,
>> 
>> -Anthony
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions



