[erlang-questions] can nodes fail/recover too fast to be seen?
Tim Watson
watson.timothy@REDACTED
Fri Jul 5 19:35:43 CEST 2013
On 5 Jul 2013, at 18:32, Mike Oxford wrote:
> net_ticktime defines "how quickly" the nodes ping each other and, thus, notice that a node is down.
>
Sure, but you've got to be careful with that one. Setting net_ticktime too low can lead to false positives, i.e., thinking nodes are down when they're simply busy, perhaps responding to some other party or struggling due to a busy_dist_port. Setting it too high has the opposite effect: a node that really has gone away takes longer to be noticed.
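For reference, here is a minimal sketch of where that knob lives, assuming a node that is started with a sys.config file. The value 60 is just the documented default, not a recommendation. Per the kernel application docs, connected nodes are ticked every net_ticktime/4 seconds, and a silent node is considered down after somewhere between 0.75 and 1.25 times net_ticktime.

```erlang
%% sys.config (illustrative fragment; 60 is the documented default)
[
 {kernel, [
   %% Tick time in seconds. A node that stops responding is declared
   %% down after T seconds, where 45 < T < 75 for a value of 60
   %% (net_ticktime minus or plus a quarter of itself).
   %% Lowering it to 20 would narrow that window to 15 < T < 25,
   %% at the cost of more false positives on busy nodes.
   {net_ticktime, 60}
 ]}
].
```

The same setting can be passed on the command line as `erl -kernel net_ticktime 60`. The docs advise using the same value on all nodes that communicate with each other.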
Another one worth a read in that space is "On Failure Detection Algorithms in Overlay Networks" by Shelley Q. Zhuang, Dennis Geels, Ion Stoica and Randy H. Katz.
Cheers,
Tim