[erlang-questions] large scale deployments and netsplits
Joel Reymont
joelr1@REDACTED
Wed Sep 16 20:02:14 CEST 2009
This is with a default net tick time.
=INFO REPORT==== 16-Sep-2009::17:44:55 ===
module: stats
elapsed: 20.014404
{flash_errors,1}: 24350
{flash_total_connected,1}: 479678
{flash_total_fails,1}: 493159
{flash_total_started,1}: 468819
{flash_total_sub_ack,1}: 444984
{flash_total_sub_req,1}: 448042
{flash_total_tcp_errors,1}: 802
{flash_connected,1}: 290320
{flash_started,1}: 253302
{flash_sub_ack,1}: 270553
{flash_sub_req,1}: 270999
{flash_tcp_errors,1}: 280
{"flash_connected/sec",1}: 14505
{"flash_started/sec",1}: 12655
{"flash_sub_ack/sec",1}: 13517
{"flash_sub_req/sec",1}: 13540
{"flash_tcp_errors/sec",1}: 13
I have 468,819 bots started on 100 small EC2 instances,
1 VM per instance. 479,678 bots connected. The number
is higher than started because bots can connect multiple
times, e.g. when there's an error. 290,320 bots connected
in just the last 20s.
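
The "/sec" lines in the report look like the 20s counters divided by
the elapsed time and truncated to an integer. Here is a minimal sketch
of that arithmetic; the module and function names are made up for
illustration and are not taken from the stats module itself.

```erlang
-module(stats_rate).
-export([per_sec/2]).

%% Events per second over a sampling window, truncated toward zero.
%% Count is the number of events seen in the window and ElapsedSecs
%% is the window length in seconds (the "elapsed" field in the report).
per_sec(Count, ElapsedSecs) when ElapsedSecs > 0 ->
    trunc(Count / ElapsedSecs).
```

With the numbers above, stats_rate:per_sec(290320, 20.014404) gives
14505 and stats_rate:per_sec(280, 20.014404) gives 13, matching the
flash_connected/sec and flash_tcp_errors/sec lines.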
=ERROR REPORT==== 16-Sep-2009::17:45:15 ===
** Node 'janus@REDACTED' not responding **
** Removing (timedout) connection **
** at node janus@REDACTED **
=INFO REPORT==== 16-Sep-2009::17:45:15 ===
netsplit: down: 'janus@REDACTED',
latency: 3.0330ms
** at node janus@REDACTED **
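
For reference, the "not responding" report above is, as far as I know,
what the distribution layer logs when a peer misses its tick deadline.
The kernel documentation says a peer is ticked every net_ticktime / 4
seconds and is declared down after somewhere between net_ticktime minus
25% and net_ticktime plus 25% of silence, with a default of 60 seconds.
A small sketch of that window follows; the module name is hypothetical.

```erlang
-module(tick_window).
-export([window/1, current/0]).

%% {Min, Max} in seconds: how long a peer can stay silent before it
%% is declared down, for a given net_ticktime in seconds.
window(TickTime) when is_integer(TickTime), TickTime > 0 ->
    Quarter = TickTime / 4,
    {TickTime - Quarter, TickTime + Quarter}.

%% The window for the running node. net_kernel:get_net_ticktime/0
%% can also return ignored (node not distributed) or
%% {ongoing_change_to, T}; those are passed through unchanged.
current() ->
    case net_kernel:get_net_ticktime() of
        T when is_integer(T) -> window(T);
        Other -> Other
    end.
```

With the default, tick_window:window(60) returns {45.0,75.0}, so a
15s ping can succeed several times inside the window in which ticks
are going missing.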
I added a piece of code that tracks the nodes that are up,
pings each one every 15s and saves the round-trip latency of
each pong. When a node goes down, I print its saved latency.
What I see is a latency of just 3ms (within EC2, of course)
at the time the node splits. What could be causing this?
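
A minimal sketch of a monitor along the lines described above, for
anyone who wants to reproduce this. It is not the code that produced
the report: the module name, the orddict state and the injectable ping
fun are assumptions made for illustration. It relies only on
net_kernel:monitor_nodes/1, net_adm:ping/1 and timer:tc/2.

```erlang
-module(ping_monitor).
-export([start/0, init/0, record_ping/3]).

-define(INTERVAL, 15000).  % ping period in ms (15s, as in the post)

%% Spawn the monitor process (hypothetical entry point).
start() ->
    spawn(?MODULE, init, []).

init() ->
    %% Ask net_kernel to send us {nodeup, Node} and {nodedown, Node}.
    ok = net_kernel:monitor_nodes(true),
    erlang:send_after(?INTERVAL, self(), ping),
    loop(orddict:new()).

loop(Latencies) ->
    receive
        ping ->
            Ping = fun net_adm:ping/1,
            New = lists:foldl(
                    fun(Node, Acc) -> record_ping(Node, Ping, Acc) end,
                    Latencies, nodes()),
            erlang:send_after(?INTERVAL, self(), ping),
            loop(New);
        {nodedown, Node} ->
            %% Report the last saved round trip for the lost node.
            case orddict:find(Node, Latencies) of
                {ok, Micros} ->
                    error_logger:info_msg(
                      "netsplit: down: ~p, latency: ~.4fms~n",
                      [Node, Micros / 1000]);
                error ->
                    error_logger:info_msg(
                      "netsplit: down: ~p, no latency saved~n", [Node])
            end,
            loop(orddict:erase(Node, Latencies));
        {nodeup, _Node} ->
            loop(Latencies)
    end.

%% Time one ping and save the round trip in microseconds.
%% A pang leaves the previously saved value untouched.
record_ping(Node, PingFun, Latencies) ->
    {Micros, Result} = timer:tc(PingFun, [Node]),
    case Result of
        pong -> orddict:store(Node, Micros, Latencies);
        pang -> Latencies
    end.
```

Note that in a design like this the printed figure is the last
successful round trip, which can be up to 15s old when the nodedown
message arrives. The pings are also synchronous, so a slow
net_adm:ping/1 call delays the handling of nodedown messages.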
Thanks, Joel
---
fastest mac firefox!
http://wagerlabs.com