Troubleshooting a high-load scenario

Tue Jan 17 15:34:17 CET 2006

Joel's test case #1
 > I have established that 500 bots from one VM run fine.

Joel's test case #2
 > I have established that 1000 bots do not run fine on one VM. Running  
 > two VMs with 500 bots each fails also.

Joel's test case #3
 > We ran that and it appears that the bottleneck could be on the  
 > server. One machine running 500 bots is fine. Two machines running  
 > 500 bots is not.

Joel's model of the problem:
 > Every bot gets notifications of other bots. So whenever 1 bot acts  
 > everyone else gets notification. 2 bots would generate 2 messages for  
 > every action, 10 bots would generate 10 messages, etc.

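That's not linear. If each of N bots acts at a fixed rate and every action
produces N messages, the total message rate grows with the square of N:
doubling the number of bots quadruples the traffic. Yet you've chosen bot
numbers as though the system were linear, which makes your results mostly
useless.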
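
To make the quadratic growth concrete, here is a minimal sketch of the model
above. It assumes every bot acts at the same fixed rate and that each action
produces one message per bot (N messages per action, as in your numbers).
The function name and the rate of one action per bot per second are
hypothetical, chosen only for illustration.

```python
# Sketch of the message-rate model (assumption: every bot acts at the same
# fixed, hypothetical rate, and each action produces one notification per
# connected bot, i.e. N messages per action).

def total_messages_per_second(n_bots: int, actions_per_bot_per_second: float) -> float:
    """Total notifications per second the server sends for n_bots bots."""
    actions_per_second = n_bots * actions_per_bot_per_second
    messages_per_action = n_bots
    return actions_per_second * messages_per_action


if __name__ == "__main__":
    rate = 1.0  # hypothetical: one action per bot per second
    for n in (500, 1000):
        print(n, total_messages_per_second(n, rate))
```

At the assumed rate this gives 250,000 messages per second for 500 bots and
1,000,000 for 1000 bots: twice the bots, four times the traffic.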

Here's a page which introduces the difference between linear and
quadratic relations:

  http://chatterbeeshomework.homestead.com/chatterbeesmath3.html

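In test case #3, each of your bots has to deal with twice as many incoming
messages per second as in test case #1, because it receives notifications
from 1000 bots instead of 500. So each client machine is also under twice
the load, and with the numbers you have chosen your results don't allow
you to conclude that the server is the problem.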
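
The same comparison can be written down per client machine. This is a
sketch under the same assumptions as before (a fixed, hypothetical per-bot
action rate, and every action delivered to every bot on the server); the
function name and the test-case table are illustrative only.

```python
# Per-machine incoming notification rate for the test cases, under the same
# assumptions: a fixed, hypothetical per-bot action rate, and every action is
# delivered to every bot connected to the server.

def incoming_per_machine(bots_on_machine: int, bots_on_server: int, rate: float) -> float:
    """Notifications per second that one client machine must process."""
    # Each local bot receives one notification per action by any bot on the server.
    per_bot = bots_on_server * rate
    return bots_on_machine * per_bot


TEST_CASES = {
    "#1: one VM, 500 bots": (500, 500),
    "#2: one VM, 1000 bots": (1000, 1000),
    "#3: two machines, 500 bots each": (500, 1000),
}

if __name__ == "__main__":
    baseline = incoming_per_machine(500, 500, 1.0)
    for name, (local, total) in TEST_CASES.items():
        load = incoming_per_machine(local, total, 1.0)
        print(f"{name}: {load / baseline:.0f}x the load of test case #1")
```

Under this model a client machine in test case #3 carries twice the load of
test case #1 (and test case #2 four times), so a client-side bottleneck
would explain the failure just as well as a server-side one.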

Or is there some throttling mechanism you're not telling us about?
I.e. is there something which makes the bots' message generation rate
decrease as you increase the number of bots?

Matthias