[erlang-questions] can nodes fail/recover too fast to be seen?

Tim Watson watson.timothy@REDACTED
Fri Jul 5 19:35:43 CEST 2013


On 5 Jul 2013, at 18:32, Mike Oxford wrote:

> net_ticktime defines "how quickly" the nodes ping each other and, thus, notice that a node is down.
> 

Sure, but you've got to be careful with that one. Setting net_ticktime too short can lead to false positives, i.e., thinking nodes are down when they're simply busy, perhaps responding to some other party or struggling due to a busy_dist_port. Setting it too long has the opposite problem: a node that really is down (or unreachable) goes unnoticed for longer.
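
For reference, here is a minimal sketch of where the setting lives, assuming a node that reads a sys.config file. The value 60 is just the documented default, not a recommendation, and the right number depends on your load and network.

```erlang
%% sys.config fragment (sketch; 60 is the kernel default, in seconds).
%% Per the kernel docs, connected nodes are ticked every T/4 seconds and
%% an unresponsive node is detected within T - T/4 .. T + T/4 seconds,
%% so T = 60 gives a window of roughly 45 to 75 seconds.
[
 {kernel, [
   {net_ticktime, 60}
 ]}
].
```

The same parameter can be given on the command line as `erl -kernel net_ticktime 60`, and it can be changed on a running node with net_kernel:set_net_ticktime/1. Either way, all nodes in the cluster should agree on the value.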

Another one worth a read in that space is "On Failure Detection Algorithms in Overlay Networks" by Shelley Q. Zhuang, Dennis Geels, Ion Stoica and Randy H. Katz.

Cheers,
Tim

