[erlang-questions] Time to client > time to server?! Is my testing flawed? Please help me explain my results!
Joel Reymont
joelr1@REDACTED
Mon Oct 6 09:53:52 CEST 2008
Folks,
I run Mac OSX 10.5.5 on a Mac Pro 2x2.8Ghz Xeon, 14Gb of memory.
Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [smp:8] [async-
threads:2] [kernel-poll:true]
I found that when I run everything on a single VM and launch 500 games
everything runs through just fine.
I then distributed my OpenPoker test harness over 3 VMs: one hosts
Mnesia tables and launches games, one runs games and launches bots,
one runs the bots. The last two VMs are added as extra Mnesia nodes,
i.e. have RAM copies of tables.
Here's is when I started getting player timeouts. Each player (bot
actually) is supposed to make a bet within 15 seconds but a lot of
bots fail to do so in a distributed configuration.
Unlike my previous testing, I now launch all games at the same time.
There are about 5 players per game so 500 games would make about 2500
player bots and the same amount of open sockets.
Dumping core upon a player timeout reveals that most of queue lengths
are 0, a lot of queues have just 1 message in them and two queues have
2 messages.
I decided to time packets and measure the time it takes for a packet
to travel from the server to the client and from the client to the
server. This is where I got results that I can't quite explain...
The time a packet travels from a server to the client is consistently
greater by a large margin!!!
Would someone kindly offer an explanation?
Rather than adding timestamps to each individual packet, I send a ping
packet right after any packet the server sends out. I do this from my
tcpsend macro, i.e.
-define(tcpsend(Socket, Data),
fun() ->
XXX = pp:write(Data),
case catch gen_tcp:send(Socket, XXX) of
ok ->
ok;
...
end,
YYY = pp:write(#ping{}),
case catch gen_tcp:send(Socket, YYY) of
ok ->
ok;
...
end
end()).
The ping packet looks like this:
-record(ping, {
send_time = now()
}).
Once a bot receives the ping packet, it sends a pong back:
handle_ping(R, Bot) ->
handle_cast(#pong{ orig_send_time = R#ping.send_time }, Bot),
{noreply, Bot}.
Here I copy the original send time into the pong packet and stamp it
with send time on the bot side:
-record(pong, {
orig_send_time,
send_time = now(),
recv_time
}).
Once my server receives the pong packet, it plugs receive time in:
process_pong(Client, _Socket, R) ->
R1 = R#pong{ recv_time = now() },
gen_server:cast(Client#client.server, {'PONG', R1}),
Client.
I measure the time to client by subtracting original send time from
the bot send time. The bots are dumb, they reply back as quickly as
possible and don't contain any logic so there's should be very little
overhead on the bot side.
The time to server is the difference between time of receipt on the
server side and the bot send time.
handle_cast({'PONG', R = #pong{}}, Server) ->
TC = timer:now_diff(R#pong.send_time, R#pong.orig_send_time),
TS = timer:now_diff(R#pong.recv_time, R#pong.send_time),
gen_server:cast(?STATS, {'AVG', time_to_client, TC}),
gen_server:cast(?STATS, {'AVG', time_to_server, TS}),
{noreply, Server};
I have a separate "statistics" server that keeps track of averages:
handle_cast({'AVG', Id, New}, Data) ->
Old = case gb_trees:lookup(Id, Data#stats.avg) of
{value, Val} ->
Val;
none ->
0
end,
Avg1 = gb_trees:enter(Id, (Old + New) / 2, Data#stats.avg),
{noreply, Data#stats{ avg = Avg1 }};
When dumping stats, I divide averages by 1000 to get time in
milliseconds.
So far so good?
Here are the results from testing on a single VM. Note that at some
point the time to server starts to exceed the time to client.
Timings from testing on a cluster of 3 VMs follow at the bottom.
--- single VM ---
gateway:start('1@REDACTED', 4000, 500000)
Waiting for game servers...
Starting a port server on 4000...
Simulating gameplay with 500 games...
50 games started, 262 players
100 games started, 504 players
150 games started, 743 players
200 games started, 1019 players
250 games started, 1288 players
300 games started, 1529 players
350 games started, 1766 players
400 games started, 2057 players
450 games started, 2304 players
500 games started, 2562 players
dmb: 500 games launching simultaneously
dmb: waiting for games to end...
=INFO REPORT==== 6-Oct-2008::08:14:55 ===
module: stats
elapsed: 5.000022
time_to_client: 0.3116982504004592
time_to_server: 0.26828120427789537
bytes: 110919
packets: 6202
{bytes,per_sec}: 22183
{packets,per_sec}: 1240
=INFO REPORT==== 6-Oct-2008::08:15:00 ===
module: stats
elapsed: 4.999943
time_to_client: 0.335493890834141
time_to_server: 0.2682274480913192
bytes: 35447
packets: 1984
{bytes,per_sec}: 7089
{packets,per_sec}: 396
=INFO REPORT==== 6-Oct-2008::08:15:10 ===
module: stats
elapsed: 4.999295
time_to_client: 0.8764044453192754
time_to_server: 0.5871947161189068
bytes: 600493
packets: 25147
{bytes,per_sec}: 120115
{packets,per_sec}: 5030
=INFO REPORT==== 6-Oct-2008::08:15:15 ===
module: stats
elapsed: 5.000729
time_to_client: 0.416
time_to_server: 198.9885
bytes: 25
packets: 1
{bytes,per_sec}: 4
{packets,per_sec}: 0
=INFO REPORT==== 6-Oct-2008::08:15:25 ===
module: stats
elapsed: 5.131727
time_to_client: 4.039440742060922
time_to_server: 44.38444315504353
bytes: 49034
packets: 1979
{bytes,per_sec}: 9555
{packets,per_sec}: 385
=INFO REPORT==== 6-Oct-2008::08:15:30 ===
module: stats
elapsed: 4.994361
time_to_client: 5.236234569474154
time_to_server: 15.505128105674318
bytes: 8907
packets: 358
{bytes,per_sec}: 1783
{packets,per_sec}: 71
50 games finished
=INFO REPORT==== 6-Oct-2008::08:15:35 ===
module: stats
elapsed: 4.923643
time_to_client: 7.458330178363772
time_to_server: 49.56836180504743
bytes: 14380
packets: 581
{bytes,per_sec}: 2920
{packets,per_sec}: 118
100 games finished
150 games finished
=INFO REPORT==== 6-Oct-2008::08:15:40 ===
module: stats
elapsed: 5.014354
time_to_client: 10.303327121777983
time_to_server: 64.27703119134696
bytes: 23849
packets: 960
{bytes,per_sec}: 4756
{packets,per_sec}: 191
200 games finished
250 games finished
300 games finished
=INFO REPORT==== 6-Oct-2008::08:15:45 ===
module: stats
elapsed: 4.973468
time_to_client: 13.492860654918857
time_to_server: 106.97844814674687
bytes: 28758
packets: 1162
{bytes,per_sec}: 5782
{packets,per_sec}: 233
350 games finished
400 games finished
450 games finished
=INFO REPORT==== 6-Oct-2008::08:15:50 ===
module: stats
elapsed: 4.965927
time_to_client: 20.687073726427755
time_to_server: 46.96422948544525
bytes: 76007
packets: 3071
{bytes,per_sec}: 15305
{packets,per_sec}: 618
500 games finished
dmb: exited successfully, 58.529502 seconds elapsed
ok
(1@REDACTED)5>
=INFO REPORT==== 6-Oct-2008::08:15:55 ===
module: stats
elapsed: 4.996499
time_to_client: 1.8395921187621034
time_to_server: 3.3051747276788346
bytes: 3845004
packets: 161772
{bytes,per_sec}: 769539
{packets,per_sec}: 32377
Here are the baffling results from testing on a cluster of 3 separate
VMs. Note that the time to client is consistently higher and reaches
3.5 seconds at the end, when the game VM is forced to crash upon a
player timeout.
--- Distributed ---
(1@REDACTED)2> dmb:test(500)
gateway:start('1@REDACTED', 4000, 500000)
Waiting for game servers...
Starting a port server on 4000...
Simulating gameplay with 500 games...
50 games started, 262 players
100 games started, 504 players
150 games started, 743 players
200 games started, 1019 players
250 games started, 1288 players
300 games started, 1529 players
350 games started, 1766 players
400 games started, 2057 players
450 games started, 2304 players
500 games started, 2562 players
dmb: 500 games launching simultaneously
dmb: waiting for games to end...
=INFO REPORT==== 6-Oct-2008::08:17:12 ===
module: stats
elapsed: 5.000164
time_to_client: 0.7472338659989799
time_to_server: 0.2868026026862225
bytes: 97276
packets: 5438
{bytes,per_sec}: 19454
{packets,per_sec}: 1087
=INFO REPORT==== 6-Oct-2008::08:17:17 ===
module: stats
elapsed: 4.999986
time_to_client: 0.9655462464556185
time_to_server: 0.30163815339577854
bytes: 49090
packets: 2748
{bytes,per_sec}: 9818
{packets,per_sec}: 549
=INFO REPORT==== 6-Oct-2008::08:17:22 ===
module: stats
elapsed: 4.999992
=INFO REPORT==== 6-Oct-2008::08:17:27 ===
module: stats
elapsed: 5.000022
time_to_client: 903.3217931168908
time_to_server: 8.179055363354317
bytes: 346499
packets: 14736
{bytes,per_sec}: 69299
{packets,per_sec}: 2947
=INFO REPORT==== 6-Oct-2008::08:17:32 ===
module: stats
elapsed: 4.999981
time_to_client: 301.7944063249315
time_to_server: 0.13660450764194335
bytes: 254019
packets: 10412
{bytes,per_sec}: 50803
{packets,per_sec}: 2082
=INFO REPORT==== 6-Oct-2008::08:17:37 ===
module: stats
elapsed: 5.000355
time_to_client: 185.66359294523173
time_to_server: 2.651532941434563
bytes: 63537
packets: 2565
{bytes,per_sec}: 12706
{packets,per_sec}: 512
=INFO REPORT==== 6-Oct-2008::08:17:42 ===
module: stats
elapsed: 4.999885
time_to_client: 1501.014993336486
time_to_server: 35.811546393119464
bytes: 405683
packets: 16375
{bytes,per_sec}: 81138
{packets,per_sec}: 3275
vvvvv the game serving VM crash dumps here, as requested
** exception error: no case clause matching noconnection
in function dmb:wait_for_games/1
(1@REDACTED)3>
=INFO REPORT==== 6-Oct-2008::08:17:47 ===
module: stats
elapsed: 4.999809
time_to_client: 3496.1493304273376
time_to_server: 0.09871293629933679
bytes: 286105
packets: 11573
{bytes,per_sec}: 57223
{packets,per_sec}: 2314
--
wagerlabs.com
More information about the erlang-questions
mailing list