Determine when a node goes down - radumarias/rfs GitHub Wiki

CAP theorem

The CAP theorem states that a distributed system can guarantee at most two of Consistency, Availability, and Partition tolerance. We primarily target Consistency and Availability, as far as we can. Availability is also affected by network partitions, so when a partition occurs we will choose Consistency.
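One common way to favor Consistency during a partition is to accept writes only on the side that can still reach a strict majority of the cluster. The sketch below illustrates that idea only. The function name `has_write_quorum` and the majority rule itself are assumptions made for illustration, not a decision recorded for rfs.

```rust
/// Hypothetical helper: returns true when this node (counted in
/// `reachable_nodes`) can still see a strict majority of the cluster.
/// Two disjoint sides of a partition can never both hold a strict majority,
/// so at most one side keeps accepting writes.
pub fn has_write_quorum(reachable_nodes: usize, cluster_size: usize) -> bool {
    cluster_size > 0 && reachable_nodes > cluster_size / 2
}
```

With five nodes split 3/2, only the three-node side keeps accepting writes. With four nodes split 2/2, neither side does: availability is given up to keep the data consistent.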

We need some kind of distributed health check to determine when nodes go down and to trigger follow-up actions, such as copying shards to other nodes to maintain the replica count.
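As a rough sketch of what such a check could look like, the code below tracks the last heartbeat seen from each node, reports nodes that have been silent longer than a fixed timeout, and lists the shards that have lost replicas as a result. Every name here (`HealthTracker`, `under_replicated`, `NodeId`, `ShardId`) is hypothetical, and the fixed timeout is a simplifying assumption. A real deployment would more likely spread this information through gossip than keep it in one place.

```rust
use std::collections::{HashMap, HashSet};
use std::time::{Duration, Instant};

pub type NodeId = u32;
pub type ShardId = u32;

/// Remembers when each node was last heard from.
pub struct HealthTracker {
    timeout: Duration,
    last_seen: HashMap<NodeId, Instant>,
}

impl HealthTracker {
    pub fn new(timeout: Duration) -> Self {
        Self { timeout, last_seen: HashMap::new() }
    }

    /// Record a heartbeat received from `node` at time `at`.
    pub fn record_heartbeat(&mut self, node: NodeId, at: Instant) {
        self.last_seen.insert(node, at);
    }

    /// Nodes whose last heartbeat is older than the timeout.
    /// Nodes that never sent a heartbeat are unknown and not reported.
    pub fn down_nodes(&self, now: Instant) -> HashSet<NodeId> {
        self.last_seen
            .iter()
            .filter(|(_, seen)| now.saturating_duration_since(**seen) > self.timeout)
            .map(|(id, _)| *id)
            .collect()
    }
}

/// Shards with fewer live replicas than `replica_count`, paired with the
/// number of extra copies needed. The result is sorted by shard id.
pub fn under_replicated(
    placement: &HashMap<ShardId, Vec<NodeId>>,
    down: &HashSet<NodeId>,
    replica_count: usize,
) -> Vec<(ShardId, usize)> {
    let mut missing: Vec<(ShardId, usize)> = placement
        .iter()
        .filter_map(|(&shard, nodes)| {
            let live = nodes.iter().filter(|n| !down.contains(*n)).count();
            (live < replica_count).then(|| (shard, replica_count - live))
        })
        .collect();
    missing.sort();
    missing
}
```

The output of `under_replicated` is the work list for re-replication: each entry names a shard and how many new copies must be placed on healthy nodes.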

https://crates.io/crates/gossip
https://highscalability.com/using-gossip-protocols-for-failure-detection-monitoring-mess/
https://www.youtube.com/watch?v=MnYlwjuGcg8&list=PLOE1GTZ5ouRPbpTnrZ3Wqjamfwn_Q5Y9A&index=7
https://wiki.polkadot.network/docs/maintain-guides-how-to-monitor-your-node

Phi Accrual failure detector: https://edward-huang.com/distributed-system/2022/03/17/how-to-detect-a-dead-node-in-a-distributed-system/
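Instead of a binary up/down answer, a phi accrual detector outputs a suspicion level, phi, that grows the longer a heartbeat is overdue relative to the intervals observed so far. The sketch below is a simplified variant that assumes heartbeat intervals are exponentially distributed, so the probability that a heartbeat arrives later than `t` is `exp(-t / mean)` and `phi = -log10(exp(-t / mean)) = t / (mean * ln 10)`. The original formulation models intervals with a normal distribution instead. The type name `PhiAccrualDetector`, the window size, and the use of plain `f64` seconds for timestamps are assumptions made for illustration.

```rust
use std::collections::VecDeque;

/// Simplified phi accrual detector for a single monitored node.
pub struct PhiAccrualDetector {
    window: usize,               // how many recent intervals to keep
    intervals: VecDeque<f64>,    // seconds between consecutive heartbeats
    last_heartbeat: Option<f64>, // timestamp of the latest heartbeat, in seconds
}

impl PhiAccrualDetector {
    pub fn new(window: usize) -> Self {
        Self {
            window: window.max(1),
            intervals: VecDeque::new(),
            last_heartbeat: None,
        }
    }

    /// Record a heartbeat that arrived at time `now` (seconds).
    pub fn heartbeat(&mut self, now: f64) {
        if let Some(prev) = self.last_heartbeat {
            if self.intervals.len() == self.window {
                self.intervals.pop_front(); // drop the oldest sample
            }
            self.intervals.push_back(now - prev);
        }
        self.last_heartbeat = Some(now);
    }

    /// Suspicion level at time `now`, or None until at least two
    /// heartbeats have been seen (no interval estimate exists yet).
    pub fn phi(&self, now: f64) -> Option<f64> {
        let last = self.last_heartbeat?;
        if self.intervals.is_empty() {
            return None;
        }
        let mean = self.intervals.iter().sum::<f64>() / self.intervals.len() as f64;
        if mean <= 0.0 {
            return None;
        }
        let elapsed = (now - last).max(0.0);
        // Exponential assumption: phi = elapsed / (mean * ln 10).
        Some(elapsed / (mean * std::f64::consts::LN_10))
    }

    /// True when phi exceeds the caller-chosen threshold.
    pub fn is_suspected(&self, now: f64, threshold: f64) -> bool {
        self.phi(now).map_or(false, |phi| phi > threshold)
    }
}
```

Under this model, phi = 1 means roughly a 10% chance that the node is alive and the heartbeat is merely late, phi = 2 roughly 1%, and so on. The threshold lets each caller trade detection speed against false positives.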

Data corruption on disk

https://github.com/radumarias/rfs/wiki/Error%E2%80%90correcting-codes