distributed performance test
Serge Aleynikov
serge@REDACTED
Fri Apr 22 21:43:14 CEST 2005
I managed to reduce this problem to a very plain implementation
illustrated in the attached test_server and test_client modules.
I configured three nodes, and conducted the following tests. Note:
performance is measured in number of calls per second _of each client_.
CLIENT SERVER THREADS PERFORMANCE (Calls/sec)
====== ====== ======= =======================
n2@REDACTED n2@REDACTED 1 238095
n2@REDACTED n2@REDACTED 5 43478
n1@REDACTED n2@REDACTED 1 2364
n1@REDACTED n2@REDACTED 5 71 !!! Why?
n2@REDACTED n1@REDACTED 1 4201
n2@REDACTED n1@REDACTED 5 55 !!! Why?
n1@REDACTED n3@REDACTED 1 164 !!! Why?
n1@REDACTED n3@REDACTED 5 3095
n3@REDACTED n1@REDACTED 1 4629
n3@REDACTED n1@REDACTED 5 2808
Ping shows that there's no packet loss between nodes, and the roundtrip
times on all servers are around 0.150 ms.
I am truly stumbled by this measurements. Can anyone give an advice on
how to determine the cause of this performance drop?
Thanks.
Serge
Serge Aleynikov wrote:
> Gentlemen,
>
> I'd like to ask for your advice on how to troubleshoot a problem related
> to a severe performance drop of a gen_server:call() between a client and
> a server running on different hosts.
>
> The server is implementing a gen_server behavior, and a client's code is
> shown below.
>
> I've set up two nodes on different servers. I use a test function
> drp_client:test_async(ConcurrentClients, TimesToRunPerClient) for stress
> testing. The function uses spawn/3 to create concurrent processes,
> where each one calls the drp_client:do_request/2 shown below.
>
> When I call the server from the same node, the performance is:
> - for a single client: 9434 calls/s
> - for two clients....: 8771 calls/s (~ 4400 call/s per client)
>
> (drp@REDACTED)6> drp_client:test_async(1, 1000).
> Client 1 done. AvgTime=0.000 (9433.962 c/s)
> (drp@REDACTED)7> drp_client:test_async(2, 1000).
> Client 2 done. AvgTime=0.000 (4405.286 c/s)
> Client 1 done. AvgTime=0.000 (4366.812 c/s)
>
> ----
>
> When I call the server from a different host/node, the performance is:
> - for a single client: 2538 calls/s
> - for two clients....: 30 calls/s (~ 15 call/s per client) <--- !!!
>
> (n1@REDACTED)9> drp_client:test_async(1, 1000).
> Client 1 done. AvgTime=0.000 (2538.071 c/s)
> (n1@REDACTED)10> drp_client:test_async(2, 100).
> Client 2 done. AvgTime=0.067 (15.020 c/s)
> Client 1 done. AvgTime=0.067 (15.017 c/s)
>
> For some reason we observe a significant performance drop in case of
> multiple clients issuing calls from a distributed node.
>
> I tried to profile clients running on drp@REDACTED (local) and
> n1@REDACTED (remote) nodes, but results are very similar.
>
> Now - the exciting part. When I try to use the coverage tool on the
> server, and just compile a few modules using cover:compile_module/1, the
> invocation of clients on the remote node yields expected performance:
>
> (drp@REDACTED)2> cover:start().
> {ok,<0.115.0>}
> (drp@REDACTED)3> lists:foreach(fun(M) -> cover:compile_module(M)
> end, [drp_proto, drp_router, drp_server]).
> ok
>
> (n1@REDACTED)24> drp_client:test_async(2, 500).
> Client 2 done. AvgTime=0.001 (1483.680 c/s)
> Client 1 done. AvgTime=0.001 (1457.726 c/s)
>
> So, the two questions are:
>
> 1. What can cause such a significant performance drop in case of
> multiple concurrent clients accessing the server?
>
> 2. Why would conver:compile_module/1 eliminate this performance drop.
>
> Your advice would be highly appreciated.
>
> Regards,
>
> Serge
>
>
>
> drp_client:
> ===========
> do_request(Args, Timeout) ->
> case get_server_pid() of
> {ok, Pid} ->
> case catch gen_server:call(Pid, Args, Timeout) of
> {ok, From, Response} ->
> {ok, From, Response};
> {error, Reason} ->
> {error, Reason};
> {'EXIT', {noproc, _Reason}} ->
> {error, server_not_running};
> {'EXIT', Reason} ->
> {error, Reason}
> end;
> {error, Reason} ->
> {error, Reason}
> end.
>
> get_server_pid() ->
> case pg2:get_closest_pid(?MODULE) of
> Pid when is_pid(Pid) ->
> {ok, Pid};
> {error, {no_process, _}} ->
> {error, no_process}
> end.
>
>
>
> ------------------------------------------------------------------------
>
> %% Analysis results:
> { analysis_options,
> [{callers, false},
> {sort, acc},
> {totals, false},
> {details, true}]}.
>
> % CNT ACC OWN
> [{ totals, 49, 0.403, 0.395}]. %%%
>
>
> % CNT ACC OWN
> [{ "<0.52.0>", 11,undefined, 0.176}]. %%
>
> { {fprof,apply_start_stop,4}, 0, 0.403, 0.035}.
> { suspend, 1, 0.227, 0.000}.
> { {drp_client,test_async,2}, 1, 0.141, 0.021}.
> { {drp_client,test_async,3}, 3, 0.071, 0.025}.
> { {erlang,spawn_link,3}, 2, 0.046, 0.046}.
> { garbage_collect, 1, 0.025, 0.025}.
> { {pg2,start,0}, 1, 0.024, 0.007}.
> { {pg2,ensure_started,0}, 1, 0.017, 0.010}.
> { {erlang,whereis,1}, 1, 0.007, 0.007}.
> { undefined, 0, 0.000, 0.000}.
>
> % CNT ACC OWN
> [{ "<0.55.0>", 19,undefined, 0.118}, %%
> { spawned_by, "<0.52.0>"},
> { spawned_as, {drp_client,do_test,
> [2,
> 500,
> {"12345","LNX-US","01170952831122",[],"123"}]}},
> { initial_calls, [{drp_client,do_test,3},{drp_client,test,2}]}].
>
> { {drp_client,do_test,3}, 1, 0.284, 0.010}.
> { {drp_client,test,2}, 1, 0.213, 0.014}.
> { {drp_client,test2,3}, 1, 0.166, 0.006}.
> { suspend, 2, 0.166, 0.000}.
> { {drp_client,'-test/2-fun-0-',5}, 1, 0.160, 0.005}.
> { {drp_client,get_route,5}, 1, 0.155, 0.005}.
> { {drp_client,measure_time,2}, 1, 0.150, 0.008}.
> { {error_handler,undefined_function,3}, 1, 0.142, 0.006}.
> { {error_handler,ensure_loaded,1}, 1, 0.136, 0.010}.
> { {code,ensure_loaded,1}, 1, 0.122, 0.005}.
> { {code,call,1}, 1, 0.117, 0.005}.
> { {code_server,call,3}, 1, 0.112, 0.007}.
> { {pg2,start,0}, 1, 0.017, 0.004}.
> { {dict,new,0}, 1, 0.016, 0.010}.
> { {pg2,ensure_started,0}, 1, 0.013, 0.009}.
> { {erlang,whereis,1}, 2, 0.008, 0.008}.
> { {dict,mk_seg,1}, 1, 0.006, 0.006}.
> { undefined, 0, 0.000, 0.000}.
>
> % CNT ACC OWN
> [{ "<0.56.0>", 19,undefined, 0.101}, %%
> { spawned_by, "<0.52.0>"},
> { spawned_as, {drp_client,do_test,
> [1,
> 500,
> {"12345","LNX-US","01170952831122",[],"123"}]}},
> { initial_calls, [{drp_client,do_test,3},{drp_client,test,2}]}].
>
> { {drp_client,do_test,3}, 1, 0.251, 0.008}.
> { suspend, 2, 0.150, 0.000}.
> { {drp_client,test,2}, 1, 0.093, 0.013}.
> { {drp_client,test2,3}, 1, 0.050, 0.004}.
> { {drp_client,'-test/2-fun-0-',5}, 1, 0.046, 0.005}.
> { {drp_client,get_route,5}, 1, 0.041, 0.005}.
> { {drp_client,measure_time,2}, 1, 0.036, 0.004}.
> { {error_handler,undefined_function,3}, 1, 0.032, 0.005}.
> { {error_handler,ensure_loaded,1}, 1, 0.027, 0.008}.
> { {pg2,start,0}, 1, 0.017, 0.004}.
> { {code,ensure_loaded,1}, 1, 0.014, 0.004}.
> { {pg2,ensure_started,0}, 1, 0.013, 0.009}.
> { {dict,new,0}, 1, 0.013, 0.009}.
> { {code,call,1}, 1, 0.010, 0.004}.
> { {erlang,whereis,1}, 2, 0.009, 0.009}.
> { {code_server,call,3}, 1, 0.006, 0.006}.
> { {dict,mk_seg,1}, 1, 0.004, 0.004}.
> { undefined, 0, 0.000, 0.000}.
>
>
>
> ------------------------------------------------------------------------
>
> %% Analysis results:
> { analysis_options,
> [{callers, false},
> {sort, acc},
> {totals, false},
> {details, true}]}.
>
> % CNT ACC OWN
> [{ totals, 56, 1.748, 0.417}]. %%%
>
>
> % CNT ACC OWN
> [{ "<0.93.0>", 18,undefined, 0.214}]. %%
>
> { {fprof,apply_start_stop,4}, 0, 1.748, 0.035}.
> { suspend, 2, 1.534, 0.000}.
> { {error_handler,undefined_function,3}, 1, 1.503, 0.023}.
> { {error_handler,ensure_loaded,1}, 1, 1.377, 0.011}.
> { {code,ensure_loaded,1}, 1, 1.358, 0.005}.
> { {code,call,1}, 1, 1.353, 0.005}.
> { {code_server,call,3}, 1, 1.348, 0.024}.
> { {drp_client,test_async,2}, 1, 0.097, 0.012}.
> { {drp_client,test_async,3}, 3, 0.065, 0.023}.
> { {erlang,spawn_link,3}, 2, 0.042, 0.042}.
> { {pg2,start,0}, 1, 0.020, 0.005}.
> { {pg2,ensure_started,0}, 1, 0.015, 0.009}.
> { {erlang,whereis,1}, 2, 0.014, 0.014}.
> { {erlang,function_exported,3}, 1, 0.006, 0.006}.
> { undefined, 0, 0.000, 0.000}.
>
> % CNT ACC OWN
> [{ "<0.107.0>", 19,undefined, 0.107}, %%
> { spawned_by, "<0.93.0>"},
> { spawned_as, {drp_client,do_test,
> [2,
> 500,
> {"12345","LNX-US","01170952831122",[],"123"}]}},
> { initial_calls, [{drp_client,do_test,3},{drp_client,test,2}]}].
>
> { {drp_client,do_test,3}, 1, 0.262, 0.009}.
> { {drp_client,test,2}, 1, 0.197, 0.014}.
> { suspend, 2, 0.155, 0.000}.
> { {drp_client,test2,3}, 1, 0.153, 0.006}.
> { {drp_client,'-test/2-fun-0-',5}, 1, 0.147, 0.004}.
> { {drp_client,get_route,5}, 1, 0.143, 0.005}.
> { {drp_client,measure_time,2}, 1, 0.138, 0.005}.
> { {error_handler,undefined_function,3}, 1, 0.133, 0.005}.
> { {error_handler,ensure_loaded,1}, 1, 0.128, 0.009}.
> { {code,ensure_loaded,1}, 1, 0.114, 0.005}.
> { {code,call,1}, 1, 0.109, 0.004}.
> { {code_server,call,3}, 1, 0.105, 0.006}.
> { {pg2,start,0}, 1, 0.016, 0.004}.
> { {dict,new,0}, 1, 0.014, 0.009}.
> { {pg2,ensure_started,0}, 1, 0.012, 0.007}.
> { {erlang,whereis,1}, 2, 0.010, 0.010}.
> { {dict,mk_seg,1}, 1, 0.005, 0.005}.
> { undefined, 0, 0.000, 0.000}.
>
> % CNT ACC OWN
> [{ "<0.108.0>", 19,undefined, 0.096}, %%
> { spawned_by, "<0.93.0>"},
> { spawned_as, {drp_client,do_test,
> [1,
> 500,
> {"12345","LNX-US","01170952831122",[],"123"}]}},
> { initial_calls, [{drp_client,do_test,3},{drp_client,test,2}]}].
>
> { {drp_client,do_test,3}, 1, 0.232, 0.009}.
> { suspend, 2, 0.136, 0.000}.
> { {drp_client,test,2}, 1, 0.087, 0.011}.
> { {drp_client,test2,3}, 1, 0.047, 0.005}.
> { {drp_client,'-test/2-fun-0-',5}, 1, 0.042, 0.004}.
> { {drp_client,get_route,5}, 1, 0.038, 0.004}.
> { {drp_client,measure_time,2}, 1, 0.034, 0.005}.
> { {error_handler,undefined_function,3}, 1, 0.029, 0.004}.
> { {error_handler,ensure_loaded,1}, 1, 0.025, 0.007}.
> { {pg2,start,0}, 1, 0.016, 0.004}.
> { {dict,new,0}, 1, 0.013, 0.009}.
> { {code,ensure_loaded,1}, 1, 0.013, 0.005}.
> { {pg2,ensure_started,0}, 1, 0.012, 0.008}.
> { {erlang,whereis,1}, 2, 0.009, 0.009}.
> { {code,call,1}, 1, 0.008, 0.004}.
> { {dict,mk_seg,1}, 1, 0.004, 0.004}.
> { {code_server,call,3}, 1, 0.004, 0.004}.
> { undefined, 0, 0.000, 0.000}.
>
--
================================================================
| Serge Aleynikov Tel: (973) 438-3436
| MIS Telecom Fax: (973) 438-1457
| IDT Corp. serge@REDACTED
================================================================
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test_server.erl
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20050422/0996998e/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test_client.erl
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20050422/0996998e/attachment-0001.ksh>
More information about the erlang-questions
mailing list