large scale deployments and netsplits

Joel Reymont joelr1@REDACTED
Mon Sep 14 15:10:15 CEST 2009

Has anyone tried deployments with 30-50 nodes?

I see large numbers of netsplits on EC2,
perhaps because I'm using 3-4 global processes
and that generates large amounts of traffic.

The splits manifest themselves in nodes reporting
lost connections to other nodes.

I'm also starting 10k bots per node and these
establish connections to another cluster of 30+ nodes.
On that cluster I'm occasionally seeing a node or two
peel off but nothing more than that. Certainly nothing
on the scale of splits in the bot cluster, where many
nodes lose connections to others.

