[erlang-questions] Time to client > time to server?! Is my testing flawed? Please help me explain my results!

Joel Reymont joelr1@REDACTED
Mon Oct 6 09:53:52 CEST 2008


Folks,

I run Mac OSX 10.5.5 on a Mac Pro 2x2.8Ghz Xeon, 14Gb of memory.

Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [smp:8] [async- 
threads:2] [kernel-poll:true]

I found that when I run everything on a single VM and launch 500 games  
everything runs through just fine.

I then distributed my OpenPoker test harness over 3 VMs: one hosts  
Mnesia tables and launches games, one runs games and launches bots,  
one runs the bots. The last two VMs are added as extra Mnesia nodes,  
i.e. have RAM copies of tables.

Here's is when I started getting player timeouts. Each player (bot  
actually) is supposed to make a bet within 15 seconds but a lot of  
bots fail to do so in a distributed configuration.

Unlike my previous testing, I now launch all games at the same time.  
There are about 5 players per game so 500 games would make about 2500  
player bots and the same amount of open sockets.

Dumping core upon a player timeout reveals that most of queue lengths  
are 0, a lot of queues have just 1 message in them and two queues have  
2 messages.

I decided to time packets and measure the time it takes for a packet  
to travel from the server to the client and from the client to the  
server. This is where I got results that I can't quite explain...

The time a packet travels from a server to the client is consistently  
greater by a large margin!!!

Would someone kindly offer an explanation?

Rather than adding timestamps to each individual packet, I send a ping  
packet right after any packet the server sends out. I do this from my  
tcpsend macro, i.e.

-define(tcpsend(Socket, Data),
	fun() ->
		XXX = pp:write(Data),
		case catch gen_tcp:send(Socket, XXX) of
		    ok ->
			ok;
                 ...
		end,
		YYY = pp:write(#ping{}),
		case catch gen_tcp:send(Socket, YYY) of
		    ok ->
			ok;
                 ...
		end
	end()).

The ping packet looks like this:

-record(ping, {
           send_time = now()
          }).

Once a bot receives the ping packet, it sends a pong back:

handle_ping(R, Bot) ->
     handle_cast(#pong{ orig_send_time = R#ping.send_time }, Bot),
     {noreply, Bot}.

Here I copy the original send time into the pong packet and stamp it  
with send time on the bot side:

-record(pong, {
           orig_send_time,
           send_time = now(),
           recv_time
          }).

Once my server receives the pong packet, it plugs receive time in:

process_pong(Client, _Socket, R) ->
     R1 = R#pong{ recv_time = now() },
     gen_server:cast(Client#client.server, {'PONG', R1}),
     Client.

I measure the time to client by subtracting original send time from  
the bot send time. The bots are dumb, they reply back as quickly as  
possible and don't contain any logic so there's should be very little  
overhead on the bot side.

The time to server is the difference between time of receipt on the  
server side and the bot send time.

handle_cast({'PONG', R = #pong{}}, Server) ->
     TC = timer:now_diff(R#pong.send_time, R#pong.orig_send_time),
     TS = timer:now_diff(R#pong.recv_time, R#pong.send_time),
     gen_server:cast(?STATS, {'AVG', time_to_client, TC}),
     gen_server:cast(?STATS, {'AVG', time_to_server, TS}),
     {noreply, Server};

I have a separate "statistics" server that keeps track of averages:

handle_cast({'AVG', Id, New}, Data) ->
     Old = case gb_trees:lookup(Id, Data#stats.avg) of
               {value, Val} ->
                   Val;
               none ->
                   0
           end,
     Avg1 = gb_trees:enter(Id, (Old + New) / 2, Data#stats.avg),
     {noreply, Data#stats{ avg = Avg1 }};

When dumping stats, I divide averages by 1000 to get time in  
milliseconds.

So far so good?

Here are the results from testing on a single VM. Note that at some  
point the time to server starts to exceed the time to client.

Timings from testing on a cluster of 3 VMs follow at the bottom.

--- single VM ---

gateway:start('1@REDACTED', 4000, 500000)
Waiting for game servers...
Starting a port server on 4000...
Simulating gameplay with 500 games...
50 games started, 262 players
100 games started, 504 players
150 games started, 743 players
200 games started, 1019 players
250 games started, 1288 players
300 games started, 1529 players
350 games started, 1766 players
400 games started, 2057 players
450 games started, 2304 players
500 games started, 2562 players
dmb: 500 games launching simultaneously
dmb: waiting for games to end...

=INFO REPORT==== 6-Oct-2008::08:14:55 ===
     module: stats
     elapsed: 5.000022
     time_to_client: 0.3116982504004592
     time_to_server: 0.26828120427789537
     bytes: 110919
     packets: 6202
     {bytes,per_sec}: 22183
     {packets,per_sec}: 1240

=INFO REPORT==== 6-Oct-2008::08:15:00 ===
     module: stats
     elapsed: 4.999943
     time_to_client: 0.335493890834141
     time_to_server: 0.2682274480913192
     bytes: 35447
     packets: 1984
     {bytes,per_sec}: 7089
     {packets,per_sec}: 396

=INFO REPORT==== 6-Oct-2008::08:15:10 ===
     module: stats
     elapsed: 4.999295
     time_to_client: 0.8764044453192754
     time_to_server: 0.5871947161189068
     bytes: 600493
     packets: 25147
     {bytes,per_sec}: 120115
     {packets,per_sec}: 5030

=INFO REPORT==== 6-Oct-2008::08:15:15 ===
     module: stats
     elapsed: 5.000729
     time_to_client: 0.416
     time_to_server: 198.9885
     bytes: 25
     packets: 1
     {bytes,per_sec}: 4
     {packets,per_sec}: 0

=INFO REPORT==== 6-Oct-2008::08:15:25 ===
     module: stats
     elapsed: 5.131727
     time_to_client: 4.039440742060922
     time_to_server: 44.38444315504353
     bytes: 49034
     packets: 1979
     {bytes,per_sec}: 9555
     {packets,per_sec}: 385

=INFO REPORT==== 6-Oct-2008::08:15:30 ===
     module: stats
     elapsed: 4.994361
     time_to_client: 5.236234569474154
     time_to_server: 15.505128105674318
     bytes: 8907
     packets: 358
     {bytes,per_sec}: 1783
     {packets,per_sec}: 71
50 games finished

=INFO REPORT==== 6-Oct-2008::08:15:35 ===
     module: stats
     elapsed: 4.923643
     time_to_client: 7.458330178363772
     time_to_server: 49.56836180504743
     bytes: 14380
     packets: 581
     {bytes,per_sec}: 2920
     {packets,per_sec}: 118
100 games finished
150 games finished

=INFO REPORT==== 6-Oct-2008::08:15:40 ===
     module: stats
     elapsed: 5.014354
     time_to_client: 10.303327121777983
     time_to_server: 64.27703119134696
     bytes: 23849
     packets: 960
     {bytes,per_sec}: 4756
     {packets,per_sec}: 191
200 games finished
250 games finished

300 games finished

=INFO REPORT==== 6-Oct-2008::08:15:45 ===
     module: stats
     elapsed: 4.973468
     time_to_client: 13.492860654918857
     time_to_server: 106.97844814674687
     bytes: 28758
     packets: 1162
     {bytes,per_sec}: 5782
     {packets,per_sec}: 233
350 games finished

400 games finished
450 games finished

=INFO REPORT==== 6-Oct-2008::08:15:50 ===
     module: stats
     elapsed: 4.965927
     time_to_client: 20.687073726427755
     time_to_server: 46.96422948544525
     bytes: 76007
     packets: 3071
     {bytes,per_sec}: 15305
     {packets,per_sec}: 618
500 games finished
dmb: exited successfully, 58.529502 seconds elapsed
ok
(1@REDACTED)5>

=INFO REPORT==== 6-Oct-2008::08:15:55 ===
     module: stats
     elapsed: 4.996499
     time_to_client: 1.8395921187621034
     time_to_server: 3.3051747276788346
     bytes: 3845004
     packets: 161772
     {bytes,per_sec}: 769539
     {packets,per_sec}: 32377

Here are the baffling results from testing on a cluster of 3 separate  
VMs. Note that the time to client is consistently higher and reaches  
3.5 seconds at the end, when the game VM is forced to crash upon a  
player timeout.

--- Distributed ---

(1@REDACTED)2> dmb:test(500)

gateway:start('1@REDACTED', 4000, 500000)
Waiting for game servers...
Starting a port server on 4000...
Simulating gameplay with 500 games...
50 games started, 262 players
100 games started, 504 players
150 games started, 743 players
200 games started, 1019 players
250 games started, 1288 players
300 games started, 1529 players
350 games started, 1766 players
400 games started, 2057 players
450 games started, 2304 players
500 games started, 2562 players
dmb: 500 games launching simultaneously
dmb: waiting for games to end...

=INFO REPORT==== 6-Oct-2008::08:17:12 ===
     module: stats
     elapsed: 5.000164
     time_to_client: 0.7472338659989799
     time_to_server: 0.2868026026862225
     bytes: 97276
     packets: 5438
     {bytes,per_sec}: 19454
     {packets,per_sec}: 1087

=INFO REPORT==== 6-Oct-2008::08:17:17 ===
     module: stats
     elapsed: 4.999986
     time_to_client: 0.9655462464556185
     time_to_server: 0.30163815339577854
     bytes: 49090
     packets: 2748
     {bytes,per_sec}: 9818
     {packets,per_sec}: 549

=INFO REPORT==== 6-Oct-2008::08:17:22 ===
     module: stats
     elapsed: 4.999992

=INFO REPORT==== 6-Oct-2008::08:17:27 ===
     module: stats
     elapsed: 5.000022
     time_to_client: 903.3217931168908
     time_to_server: 8.179055363354317
     bytes: 346499
     packets: 14736
     {bytes,per_sec}: 69299
     {packets,per_sec}: 2947

=INFO REPORT==== 6-Oct-2008::08:17:32 ===
     module: stats
     elapsed: 4.999981
     time_to_client: 301.7944063249315
     time_to_server: 0.13660450764194335
     bytes: 254019
     packets: 10412
     {bytes,per_sec}: 50803
     {packets,per_sec}: 2082

=INFO REPORT==== 6-Oct-2008::08:17:37 ===
     module: stats
     elapsed: 5.000355
     time_to_client: 185.66359294523173
     time_to_server: 2.651532941434563
     bytes: 63537
     packets: 2565
     {bytes,per_sec}: 12706
     {packets,per_sec}: 512

=INFO REPORT==== 6-Oct-2008::08:17:42 ===
     module: stats
     elapsed: 4.999885
     time_to_client: 1501.014993336486
     time_to_server: 35.811546393119464
     bytes: 405683
     packets: 16375
     {bytes,per_sec}: 81138
     {packets,per_sec}: 3275

vvvvv the game serving VM crash dumps here, as requested
** exception error: no case clause matching noconnection
      in function  dmb:wait_for_games/1
(1@REDACTED)3>
=INFO REPORT==== 6-Oct-2008::08:17:47 ===
     module: stats
     elapsed: 4.999809
     time_to_client: 3496.1493304273376
     time_to_server: 0.09871293629933679
     bytes: 286105
     packets: 11573
     {bytes,per_sec}: 57223
     {packets,per_sec}: 2314


--
wagerlabs.com








More information about the erlang-questions mailing list