Troubleshooting a high-load scenario
Joel Reymont
joelr1@REDACTED
Tue Jan 17 12:33:32 CET 2006
Folks,
I have a test harness that launches poker bots against a poker
server. The harness is written in Erlang but the poker server is C++
on Windows. The poker server uses completion ports and async IO.
I'm running into trouble with just 500 bots playing on the server,
launched from the same VM. It appears that the bots get their
commands up to 1 minute late. I'm trying to troubleshoot this and I'm
looking for ideas. I would like to believe that it's not Erlang
running out of steam but the C++ server :-).
I read the packets manually since the length prefix is little-endian
and counts the whole packet, including the 4-byte header itself. I
enabled {nodelay, true} on the socket since I always write complete
packets to the socket.
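For the curious, the manual framing looks roughly like this (a sketch; the module and function names are mine, and it assumes a binary-mode, {active, false} socket). I can't use {packet, 4} because Erlang's built-in framing expects a big-endian length that excludes the header:

```erlang
-module(read_sketch).
-export([read_packet/1, decode/1]).

%% Read one packet from a binary-mode, {active, false} socket.
%% The 4-byte little-endian length counts the whole packet,
%% header included, so the payload is Len - 4 bytes.
read_packet(Sock) ->
    {ok, <<Len:32/little>>} = gen_tcp:recv(Sock, 4),
    gen_tcp:recv(Sock, Len - 4).

%% The same framing as a pure function over an already-received
%% buffer: returns the first complete packet plus leftover bytes,
%% or {more, Buffer} if a full packet hasn't arrived yet.
decode(<<Len:32/little, Rest/binary>>)
  when Len >= 4, byte_size(Rest) >= Len - 4 ->
    PayloadLen = Len - 4,
    <<Payload:PayloadLen/binary, Tail/binary>> = Rest,
    {ok, Payload, Tail};
decode(Buffer) ->
    {more, Buffer}.
```

For example, decode(<<8:32/little, "abcd", "xy">>) returns {ok, <<"abcd">>, <<"xy">>} since the length 8 covers the header plus a 4-byte payload.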
I use selective receive and used to have no flow control in my socket
reader. It would just read the packet length, read the packet and
send the whole thing to its parent. Message queues were filling up
when I was doing that, so now I only read the next network message
once the current one has been processed.
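The one-packet-in-flight flow control above can be sketched like this (names are mine; here the "network" is just a list of packets, where in the real harness each one would come from gen_tcp:recv/2):

```erlang
-module(flow_sketch).
-export([reader/2]).

%% Reader with one-packet-in-flight flow control: after handing a
%% packet to the parent it blocks until the parent acknowledges,
%% so the parent's mailbox never piles up with unread packets.
reader([], Parent) ->
    Parent ! {done, self()};
reader([Packet | Rest], Parent) ->
    Parent ! {packet, self(), Packet},
    receive
        {ack, Parent} -> reader(Rest, Parent)
    end.
```

The parent handles each {packet, Pid, P} and replies Pid ! {ack, self()} only when it has finished processing, which is what throttles the reader.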
I'm using the default socket buffer size for sending and receiving.
I'm not sure what the default buffer size is, as it's not stated in
'man inet'. I do not have the source code for the poker server, and
I'm not sure what IOCP does when I delay reading from the socket on
my end. I'm being told by the client's techs that I could
be getting the command 1 minute late because I'm reading it from the
socket 1 minute late and the command sits in the network buffers all
the while.
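'man inet' doesn't list the defaults, but a socket can be asked directly (a quick shell sketch; the numbers are OS-dependent):

```erlang
%% Open a throwaway listen socket and query its buffer options.
{ok, Sock} = gen_tcp:listen(0, []),
{ok, Opts} = inet:getopts(Sock, [sndbuf, recbuf, buffer]),
io:format("~p~n", [Opts]),
gen_tcp:close(Sock).
```

Note that sndbuf/recbuf are the kernel socket buffers, while buffer is the Erlang driver's user-space buffer.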
How do I troubleshoot this scenario? The bots don't do much
processing themselves, basically make a decision and shoot a command
back. They don't even react to all commands. The server spits out
packets all the time, though, since all bots in the game get game
notifications and table state updates from the lobby.
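One way to narrow it down from the Erlang side (a sketch, names mine): if bot mailboxes are backing up, the minute is being lost inside the VM; if the mailboxes are empty while packets still arrive stale, the data was sitting in the socket or network buffers. The longest mailboxes on the node can be listed like this:

```erlang
-module(diag_sketch).
-export([worst_mailboxes/1]).

%% Return up to N processes with the longest message queues,
%% as {QueueLength, Pid} pairs, longest first. Processes that
%% die mid-scan (process_info -> undefined) are skipped by the
%% comprehension filter.
worst_mailboxes(N) ->
    Sizes = [{Len, Pid}
             || Pid <- processes(),
                {message_queue_len, Len} <-
                    [process_info(Pid, message_queue_len)]],
    lists:sublist(lists:reverse(lists:sort(Sizes)), N).
```

Running diag_sketch:worst_mailboxes(10) from the shell while the 500 bots are playing would show immediately whether any process is sitting on a backlog.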
Thanks, Joel
--
http://wagerlabs.com/