distributed performance test

Serge Aleynikov serge@REDACTED
Fri Apr 22 21:43:14 CEST 2005


I managed to reduce this problem to a very plain implementation 
illustrated in the attached test_server and test_client modules.

I configured three nodes, and conducted the following tests. Note: 
performance is measured in number of calls per second _of each client_.

CLIENT          SERVER          THREADS PERFORMANCE (Calls/sec)
======          ======          ======= =======================
n2@REDACTED	n2@REDACTED	1	238095
n2@REDACTED	n2@REDACTED	5	43478

n1@REDACTED	n2@REDACTED 1	2364
n1@REDACTED	n2@REDACTED 5	71        !!!  Why?

n2@REDACTED	n1@REDACTED  1	4201
n2@REDACTED	n1@REDACTED  5	55        !!!  Why?

n1@REDACTED	n3@REDACTED	1	164       !!!  Why?
n1@REDACTED	n3@REDACTED	5	3095

n3@REDACTED	n1@REDACTED	1	4629
n3@REDACTED	n1@REDACTED	5	2808

Ping shows that there's no packet loss between nodes, and the roundtrip 
times on all servers are around 0.150 ms.

I am truly stumbled by this measurements.  Can anyone give an advice on 
how to determine the cause of this performance drop?

Thanks.

Serge

Serge Aleynikov wrote:
> Gentlemen,
> 
> I'd like to ask for your advice on how to troubleshoot a problem related 
> to a severe performance drop of a gen_server:call() between a client and 
> a server running on different hosts.
> 
> The server is implementing a gen_server behavior, and a client's code is 
> shown below.
> 
> I've set up two nodes on different servers.   I use a test function
> drp_client:test_async(ConcurrentClients, TimesToRunPerClient) for stress 
> testing.  The function uses spawn/3 to create concurrent processes, 
> where each one calls the drp_client:do_request/2 shown below.
> 
> When I call the server from the same node, the performance is:
> - for a single client: 9434 calls/s
> - for two clients....: 8771 calls/s (~ 4400 call/s per client)
> 
> (drp@REDACTED)6> drp_client:test_async(1, 1000).
> Client  1 done. AvgTime=0.000 (9433.962 c/s)
> (drp@REDACTED)7> drp_client:test_async(2, 1000).
> Client  2 done. AvgTime=0.000 (4405.286 c/s)
> Client  1 done. AvgTime=0.000 (4366.812 c/s)
> 
> ----
> 
> When I call the server from a different host/node, the performance is:
> - for a single client: 2538 calls/s
> - for two clients....: 30   calls/s (~ 15 call/s per client)  <--- !!!
> 
> (n1@REDACTED)9> drp_client:test_async(1, 1000).
> Client  1 done. AvgTime=0.000 (2538.071 c/s)
> (n1@REDACTED)10> drp_client:test_async(2, 100).
> Client  2 done. AvgTime=0.067 (15.020 c/s)
> Client  1 done. AvgTime=0.067 (15.017 c/s)
> 
> For some reason we observe a significant performance drop in case of 
> multiple clients issuing calls from a distributed node.
> 
> I tried to profile clients running on drp@REDACTED (local) and 
> n1@REDACTED (remote) nodes, but results are very similar.
> 
> Now - the exciting part.  When I try to use the coverage tool on the 
> server, and just compile a few modules using cover:compile_module/1, the 
> invocation of clients on the remote node yields expected performance:
> 
> (drp@REDACTED)2> cover:start().
> {ok,<0.115.0>}
> (drp@REDACTED)3> lists:foreach(fun(M) -> cover:compile_module(M) 
> end, [drp_proto, drp_router, drp_server]).
> ok
> 
> (n1@REDACTED)24> drp_client:test_async(2, 500).
> Client  2 done. AvgTime=0.001 (1483.680 c/s)
> Client  1 done. AvgTime=0.001 (1457.726 c/s)
> 
> So, the two questions are:
> 
> 1. What can cause such a significant performance drop in case of 
> multiple concurrent clients accessing the server?
> 
> 2. Why would conver:compile_module/1 eliminate this performance drop.
> 
> Your advice would be highly appreciated.
> 
> Regards,
> 
> Serge
> 
> 
> 
> drp_client:
> ===========
> do_request(Args, Timeout) ->
>     case get_server_pid() of
>     {ok, Pid} ->
>         case catch gen_server:call(Pid, Args, Timeout) of
>         {ok, From, Response} ->
>             {ok, From, Response};
>         {error, Reason} ->
>             {error, Reason};
>         {'EXIT', {noproc, _Reason}} ->
>             {error, server_not_running};
>         {'EXIT', Reason} ->
>             {error, Reason}
>         end;
>     {error, Reason} ->
>         {error, Reason}
>     end.
> 
> get_server_pid() ->
>     case pg2:get_closest_pid(?MODULE) of
>     Pid when is_pid(Pid) ->
>         {ok, Pid};
>     {error, {no_process, _}} ->
>         {error, no_process}
>     end.
> 
> 
> 
> ------------------------------------------------------------------------
> 
> %% Analysis results:
> {  analysis_options,
>  [{callers, false},
>   {sort, acc},
>   {totals, false},
>   {details, true}]}.
> 
> %                                               CNT       ACC       OWN        
> [{ totals,                                       49,    0.403,    0.395}].  %%%
> 
> 
> %                                               CNT       ACC       OWN        
> [{ "<0.52.0>",                                   11,undefined,    0.176}].   %%
> 
> {  {fprof,apply_start_stop,4},                    0,    0.403,    0.035}.      
> {  suspend,                                       1,    0.227,    0.000}.      
> {  {drp_client,test_async,2},                     1,    0.141,    0.021}.      
> {  {drp_client,test_async,3},                     3,    0.071,    0.025}.      
> {  {erlang,spawn_link,3},                         2,    0.046,    0.046}.      
> {  garbage_collect,                               1,    0.025,    0.025}.      
> {  {pg2,start,0},                                 1,    0.024,    0.007}.      
> {  {pg2,ensure_started,0},                        1,    0.017,    0.010}.      
> {  {erlang,whereis,1},                            1,    0.007,    0.007}.      
> {  undefined,                                     0,    0.000,    0.000}.      
> 
> %                                               CNT       ACC       OWN        
> [{ "<0.55.0>",                                   19,undefined,    0.118},    %%
>  { spawned_by, "<0.52.0>"},
>  { spawned_as, {drp_client,do_test,
>                            [2,
>                             500,
>                             {"12345","LNX-US","01170952831122",[],"123"}]}},
>  { initial_calls, [{drp_client,do_test,3},{drp_client,test,2}]}].
> 
> {  {drp_client,do_test,3},                        1,    0.284,    0.010}.      
> {  {drp_client,test,2},                           1,    0.213,    0.014}.      
> {  {drp_client,test2,3},                          1,    0.166,    0.006}.      
> {  suspend,                                       2,    0.166,    0.000}.      
> {  {drp_client,'-test/2-fun-0-',5},               1,    0.160,    0.005}.      
> {  {drp_client,get_route,5},                      1,    0.155,    0.005}.      
> {  {drp_client,measure_time,2},                   1,    0.150,    0.008}.      
> {  {error_handler,undefined_function,3},          1,    0.142,    0.006}.      
> {  {error_handler,ensure_loaded,1},               1,    0.136,    0.010}.      
> {  {code,ensure_loaded,1},                        1,    0.122,    0.005}.      
> {  {code,call,1},                                 1,    0.117,    0.005}.      
> {  {code_server,call,3},                          1,    0.112,    0.007}.      
> {  {pg2,start,0},                                 1,    0.017,    0.004}.      
> {  {dict,new,0},                                  1,    0.016,    0.010}.      
> {  {pg2,ensure_started,0},                        1,    0.013,    0.009}.      
> {  {erlang,whereis,1},                            2,    0.008,    0.008}.      
> {  {dict,mk_seg,1},                               1,    0.006,    0.006}.      
> {  undefined,                                     0,    0.000,    0.000}.      
> 
> %                                               CNT       ACC       OWN        
> [{ "<0.56.0>",                                   19,undefined,    0.101},    %%
>  { spawned_by, "<0.52.0>"},
>  { spawned_as, {drp_client,do_test,
>                            [1,
>                             500,
>                             {"12345","LNX-US","01170952831122",[],"123"}]}},
>  { initial_calls, [{drp_client,do_test,3},{drp_client,test,2}]}].
> 
> {  {drp_client,do_test,3},                        1,    0.251,    0.008}.      
> {  suspend,                                       2,    0.150,    0.000}.      
> {  {drp_client,test,2},                           1,    0.093,    0.013}.      
> {  {drp_client,test2,3},                          1,    0.050,    0.004}.      
> {  {drp_client,'-test/2-fun-0-',5},               1,    0.046,    0.005}.      
> {  {drp_client,get_route,5},                      1,    0.041,    0.005}.      
> {  {drp_client,measure_time,2},                   1,    0.036,    0.004}.      
> {  {error_handler,undefined_function,3},          1,    0.032,    0.005}.      
> {  {error_handler,ensure_loaded,1},               1,    0.027,    0.008}.      
> {  {pg2,start,0},                                 1,    0.017,    0.004}.      
> {  {code,ensure_loaded,1},                        1,    0.014,    0.004}.      
> {  {pg2,ensure_started,0},                        1,    0.013,    0.009}.      
> {  {dict,new,0},                                  1,    0.013,    0.009}.      
> {  {code,call,1},                                 1,    0.010,    0.004}.      
> {  {erlang,whereis,1},                            2,    0.009,    0.009}.      
> {  {code_server,call,3},                          1,    0.006,    0.006}.      
> {  {dict,mk_seg,1},                               1,    0.004,    0.004}.      
> {  undefined,                                     0,    0.000,    0.000}.      
> 
> 
> 
> ------------------------------------------------------------------------
> 
> %% Analysis results:
> {  analysis_options,
>  [{callers, false},
>   {sort, acc},
>   {totals, false},
>   {details, true}]}.
> 
> %                                               CNT       ACC       OWN        
> [{ totals,                                       56,    1.748,    0.417}].  %%%
> 
> 
> %                                               CNT       ACC       OWN        
> [{ "<0.93.0>",                                   18,undefined,    0.214}].   %%
> 
> {  {fprof,apply_start_stop,4},                    0,    1.748,    0.035}.      
> {  suspend,                                       2,    1.534,    0.000}.      
> {  {error_handler,undefined_function,3},          1,    1.503,    0.023}.      
> {  {error_handler,ensure_loaded,1},               1,    1.377,    0.011}.      
> {  {code,ensure_loaded,1},                        1,    1.358,    0.005}.      
> {  {code,call,1},                                 1,    1.353,    0.005}.      
> {  {code_server,call,3},                          1,    1.348,    0.024}.      
> {  {drp_client,test_async,2},                     1,    0.097,    0.012}.      
> {  {drp_client,test_async,3},                     3,    0.065,    0.023}.      
> {  {erlang,spawn_link,3},                         2,    0.042,    0.042}.      
> {  {pg2,start,0},                                 1,    0.020,    0.005}.      
> {  {pg2,ensure_started,0},                        1,    0.015,    0.009}.      
> {  {erlang,whereis,1},                            2,    0.014,    0.014}.      
> {  {erlang,function_exported,3},                  1,    0.006,    0.006}.      
> {  undefined,                                     0,    0.000,    0.000}.      
> 
> %                                               CNT       ACC       OWN        
> [{ "<0.107.0>",                                  19,undefined,    0.107},    %%
>  { spawned_by, "<0.93.0>"},
>  { spawned_as, {drp_client,do_test,
>                            [2,
>                             500,
>                             {"12345","LNX-US","01170952831122",[],"123"}]}},
>  { initial_calls, [{drp_client,do_test,3},{drp_client,test,2}]}].
> 
> {  {drp_client,do_test,3},                        1,    0.262,    0.009}.      
> {  {drp_client,test,2},                           1,    0.197,    0.014}.      
> {  suspend,                                       2,    0.155,    0.000}.      
> {  {drp_client,test2,3},                          1,    0.153,    0.006}.      
> {  {drp_client,'-test/2-fun-0-',5},               1,    0.147,    0.004}.      
> {  {drp_client,get_route,5},                      1,    0.143,    0.005}.      
> {  {drp_client,measure_time,2},                   1,    0.138,    0.005}.      
> {  {error_handler,undefined_function,3},          1,    0.133,    0.005}.      
> {  {error_handler,ensure_loaded,1},               1,    0.128,    0.009}.      
> {  {code,ensure_loaded,1},                        1,    0.114,    0.005}.      
> {  {code,call,1},                                 1,    0.109,    0.004}.      
> {  {code_server,call,3},                          1,    0.105,    0.006}.      
> {  {pg2,start,0},                                 1,    0.016,    0.004}.      
> {  {dict,new,0},                                  1,    0.014,    0.009}.      
> {  {pg2,ensure_started,0},                        1,    0.012,    0.007}.      
> {  {erlang,whereis,1},                            2,    0.010,    0.010}.      
> {  {dict,mk_seg,1},                               1,    0.005,    0.005}.      
> {  undefined,                                     0,    0.000,    0.000}.      
> 
> %                                               CNT       ACC       OWN        
> [{ "<0.108.0>",                                  19,undefined,    0.096},    %%
>  { spawned_by, "<0.93.0>"},
>  { spawned_as, {drp_client,do_test,
>                            [1,
>                             500,
>                             {"12345","LNX-US","01170952831122",[],"123"}]}},
>  { initial_calls, [{drp_client,do_test,3},{drp_client,test,2}]}].
> 
> {  {drp_client,do_test,3},                        1,    0.232,    0.009}.      
> {  suspend,                                       2,    0.136,    0.000}.      
> {  {drp_client,test,2},                           1,    0.087,    0.011}.      
> {  {drp_client,test2,3},                          1,    0.047,    0.005}.      
> {  {drp_client,'-test/2-fun-0-',5},               1,    0.042,    0.004}.      
> {  {drp_client,get_route,5},                      1,    0.038,    0.004}.      
> {  {drp_client,measure_time,2},                   1,    0.034,    0.005}.      
> {  {error_handler,undefined_function,3},          1,    0.029,    0.004}.      
> {  {error_handler,ensure_loaded,1},               1,    0.025,    0.007}.      
> {  {pg2,start,0},                                 1,    0.016,    0.004}.      
> {  {dict,new,0},                                  1,    0.013,    0.009}.      
> {  {code,ensure_loaded,1},                        1,    0.013,    0.005}.      
> {  {pg2,ensure_started,0},                        1,    0.012,    0.008}.      
> {  {erlang,whereis,1},                            2,    0.009,    0.009}.      
> {  {code,call,1},                                 1,    0.008,    0.004}.      
> {  {dict,mk_seg,1},                               1,    0.004,    0.004}.      
> {  {code_server,call,3},                          1,    0.004,    0.004}.      
> {  undefined,                                     0,    0.000,    0.000}.      
> 

-- 
================================================================
| Serge Aleynikov                          Tel: (973) 438-3436
| MIS Telecom                              Fax: (973) 438-1457
| IDT Corp.                                   serge@REDACTED
================================================================
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test_server.erl
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20050422/0996998e/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test_client.erl
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20050422/0996998e/attachment-0001.ksh>


More information about the erlang-questions mailing list