Data Nodes cluster - radumarias/rfs GitHub Wiki
Master cluster
We will have a cluster of coordinator nodes running Raft.
Its main function is to track which nodes are up, so that we can send that list to clients for client-side load balancing, or use it ourselves if we need to do the load balancing. We also need that set to know which nodes to distribute shards to and which nodes we can sync, read, and write data from.
We can keep a Set of up nodes, run health checks between all nodes, and update that Set accordingly.
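As a minimal sketch of the Set of up nodes described above: the type `UpNodes` and its methods are hypothetical names, not part of the project. How a change is replicated through Raft is not shown here. The sketch only covers the local state that each health check result would update.

```rust
use std::collections::BTreeSet;

/// Hypothetical holder for the Set of nodes currently believed to be up.
#[derive(Debug, Default)]
pub struct UpNodes {
    nodes: BTreeSet<String>,
}

impl UpNodes {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record the result of one health check for `node`.
    /// Returns true if the Set actually changed, which is the case
    /// where a change would need to be proposed to the cluster.
    pub fn record_health_check(&mut self, node: &str, is_up: bool) -> bool {
        if is_up {
            self.nodes.insert(node.to_string())
        } else {
            self.nodes.remove(node)
        }
    }

    /// Sorted snapshot that could be sent to clients for load balancing
    /// or handed to the shard distribution logic.
    pub fn snapshot(&self) -> Vec<String> {
        self.nodes.iter().cloned().collect()
    }
}
```

Returning whether the Set changed keeps repeated identical health check results from producing redundant updates.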
- https://crates.io/crates/ng-storage-rocksdb
- https://docs.rs/datacake/latest/datacake/
- https://en.m.wikipedia.org/wiki/Riak
- https://inria.hal.science/hal-03278658/document
- https://www.youtube.com/watch?v=x7drE24geUw
- https://www.youtube.com/watch?v=vBU70EjwGfw
- https://www.bartoszsypytkowski.com/the-state-of-a-state-based-crdts/
- https://jakelazaroff.com/words/an-interactive-intro-to-crdts/
- https://crdt.tech/
- https://github.com/znx3p0/canary
- https://github.com/juicedata/juicefs
- https://wiki.iota.org/
Masterless cluster
If we don't want the penalty of a single master at a time, we might explore CRDTs with Redis Sets to enforce constraints like uniqueness of filenames in the same folder.
We could keep a modifiable (add, remove) Set or Map of nodes, run health checks between all nodes, update that structure on each node, and let the copies converge.
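One way an add/remove Set of nodes could converge is a last-writer-wins element set, a standard state-based CRDT. The sketch below is an assumption, not the project's design: the type `LwwNodeSet` is hypothetical, and it assumes every add or remove carries a `u64` timestamp (logical or wall clock), which these notes do not specify. Ties are resolved in favour of remove.

```rust
use std::collections::HashMap;

/// Hypothetical last-writer-wins element set of node keys.
/// For each node we keep the latest add and the latest remove timestamp.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct LwwNodeSet {
    adds: HashMap<String, u64>,
    removes: HashMap<String, u64>,
}

/// Keep the maximum timestamp seen for `node` in `map`.
fn bump(map: &mut HashMap<String, u64>, node: &str, ts: u64) {
    let entry = map.entry(node.to_string()).or_insert(ts);
    if ts > *entry {
        *entry = ts;
    }
}

impl LwwNodeSet {
    pub fn add(&mut self, node: &str, ts: u64) {
        bump(&mut self.adds, node, ts);
    }

    pub fn remove(&mut self, node: &str, ts: u64) {
        bump(&mut self.removes, node, ts);
    }

    /// A node is a member if its latest add is strictly newer than its latest remove.
    pub fn contains(&self, node: &str) -> bool {
        match (self.adds.get(node), self.removes.get(node)) {
            (Some(a), Some(r)) => a > r,
            (Some(_), None) => true,
            _ => false,
        }
    }

    /// Merge another replica's state by taking the per-key maximum timestamp.
    /// This is commutative, associative, and idempotent, so replicas that
    /// exchange state in any order end up equal.
    pub fn merge(&mut self, other: &LwwNodeSet) {
        for (node, ts) in &other.adds {
            bump(&mut self.adds, node, *ts);
        }
        for (node, ts) in &other.removes {
            bump(&mut self.removes, node, *ts);
        }
    }
}
```

Because the merge only ever takes maxima, two nodes can apply each other's state in either order and still agree on membership.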
For example, we can have a Map like:
- key: node_key: String, can be the node IP
- value: state: bool
We run health checks between all nodes and send messages like (node_key, state). We rely on the fact that after a node goes down it will take at least a few minutes to be replaced. Each node processes those messages and sets the state for each node, so the structure converges to the same state everywhere.
The states could differ in the case of temporary network issues. For that reason, when we receive state == false we check the node's real state again, to avoid cases where an initial node-down message has become obsolete because the node is already up again.
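The message handling above can be sketched as follows. `NodeStates`, `apply`, and the `probe` callback are hypothetical names: `probe` stands in for whatever fresh health check the node would run, which these notes do not define. The sketch adds no versioning of messages and relies, as the text does, on a downed node not coming back within minutes.

```rust
use std::collections::HashMap;

/// Hypothetical Map of node_key -> state (true = up).
#[derive(Debug, Default)]
pub struct NodeStates {
    states: HashMap<String, bool>,
}

impl NodeStates {
    pub fn new() -> Self {
        Self::default()
    }

    /// Apply one (node_key, state) message.
    /// An "up" message is accepted as-is. A "down" message is verified
    /// with a fresh probe first, since it may be stale by the time it arrives.
    pub fn apply<F>(&mut self, node_key: &str, state: bool, probe: F)
    where
        F: FnOnce(&str) -> bool,
    {
        let effective = if state { true } else { probe(node_key) };
        self.states.insert(node_key.to_string(), effective);
    }

    /// Nodes we have never heard about are treated as down.
    pub fn is_up(&self, node_key: &str) -> bool {
        self.states.get(node_key).copied().unwrap_or(false)
    }
}
```

Passing the probe as a closure keeps the Map logic independent of how the health check is actually performed.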
- https://github.com/lnx-search/datacake
- https://docs.rs/crdts/latest/crdts/
- https://github.com/patreu22/react-crdt
- https://github.com/automerge/automerge
- https://riak.com/index.html
Load balancer
Ideally the client would interact with a load balancer cluster (again, either master-based or masterless), which would contain the sharding distribution logic and serve client requests by communicating directly with the available nodes.
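These notes do not say which sharding distribution logic the load balancer would use. As one possible illustration, the sketch below uses rendezvous (highest-random-weight) hashing over the set of up nodes. The function names `score` and `pick_node` are hypothetical, and `DefaultHasher` is used only to keep the example dependency-free; its output is not guaranteed to be stable across Rust releases, so a real deployment would need a fixed hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Score a (node, shard_key) pair; the node with the highest score owns the shard.
fn score(node: &str, shard_key: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    node.hash(&mut hasher);
    shard_key.hash(&mut hasher);
    hasher.finish()
}

/// Pick the node responsible for `shard_key` among the nodes currently up.
/// Returns None when no node is available.
pub fn pick_node<'a>(up_nodes: &'a [String], shard_key: &str) -> Option<&'a str> {
    up_nodes
        .iter()
        .max_by_key(|node| score(node.as_str(), shard_key))
        .map(|node| node.as_str())
}
```

With this scheme, a node going down only moves the shards that node owned; every other shard keeps its current node, which limits how much data has to be synced after a membership change.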