# File sharding (radumarias/rfs GitHub Wiki)
Each file will be split into chunks (shards) of 65 MB, and those shards will be distributed across multiple nodes and replicated.
## mod-N hashing
To decide which node a chunk initially goes to, we can use something like `chunk_index % num_nodes`. The problem is that adding or removing a node changes the mapping for most chunks, so we would need to re-balance the cluster by moving many shards around.
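A minimal sketch of the problem (function and variable names are illustrative, not from the rfs codebase): changing the node count from 4 to 5 remaps half of the first 8 chunks.

```rust
// mod-N placement: a chunk's node is simply its index modulo the node count.
fn node_for_chunk(chunk_index: u64, num_nodes: u64) -> u64 {
    chunk_index % num_nodes
}

fn main() {
    // With 4 nodes, chunks 0..8 land on nodes 0,1,2,3,0,1,2,3.
    let before: Vec<u64> = (0..8).map(|i| node_for_chunk(i, 4)).collect();
    // Add a fifth node: many chunks now map to a different node,
    // so a large part of the cluster would have to be re-balanced.
    let after: Vec<u64> = (0..8).map(|i| node_for_chunk(i, 5)).collect();
    let moved = before.iter().zip(&after).filter(|(a, b)| a != b).count();
    println!("before: {:?}", before); // [0, 1, 2, 3, 0, 1, 2, 3]
    println!("after:  {:?}", after);  // [0, 1, 2, 3, 4, 0, 1, 2]
    println!("chunks that moved: {}/8", moved); // 4/8
}
```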
## Random weighted
To solve this we could use weighted random distribution, as described in https://dev.to/jacktt/understanding-the-weighted-random-algorithm-581p. We use each node's available space as its weight, build the corresponding intervals, and pick a random interval (node) to place the shard on. This handles adding or removing nodes gracefully without moving shards around: the cluster auto-rebalances over time, since new shards favor nodes with more free space. It does not, however, help with throughput.
The rand crate also provides this: https://docs.rs/rand/latest/rand/distributions/struct.WeightedIndex.html#example
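The interval-building step can be sketched as follows. This is a deterministic core only (names are illustrative, not rfs APIs): it maps a point in `[0, total)` to the node whose cumulative interval contains it; in a real implementation the point would come from an RNG, for instance via the rand crate's `WeightedIndex` linked above.

```rust
// Weighted selection by available space: build cumulative intervals from the
// weights and return the index of the interval containing `point`.
fn pick_node(free_space: &[u64], point: u64) -> usize {
    let total: u64 = free_space.iter().sum();
    assert!(point < total, "point must fall inside [0, total)");
    let mut cumulative = 0;
    for (i, w) in free_space.iter().enumerate() {
        cumulative += w;
        if point < cumulative {
            return i;
        }
    }
    unreachable!("point < total guarantees we return inside the loop")
}

fn main() {
    // Three nodes with 100, 300 and 600 GB free
    // -> intervals [0, 100), [100, 400), [400, 1000).
    let free = [100, 300, 600];
    assert_eq!(pick_node(&free, 50), 0);
    assert_eq!(pick_node(&free, 150), 1);
    assert_eq!(pick_node(&free, 999), 2);
    println!("ok");
}
```

With a uniformly random point, each node is chosen with probability proportional to its free space, which is exactly the auto-rebalancing behavior described above.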
## Consistent hashing
- https://dgryski.medium.com/consistent-hashing-algorithmic-tradeoffs-ef6b8e2fcae8
- https://www.youtube.com/watch?v=UF9Iqmg94tk
- https://en.wikipedia.org/wiki/Consistent_hashing
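The idea behind the links above: nodes and shard keys are hashed onto the same ring, and a shard goes to the first node clockwise from its position, so adding or removing a node only moves the shards in that node's slice of the ring. A minimal sketch (node and key names are illustrative; a production ring would use a stable hash rather than std's `DefaultHasher`, whose output can differ across Rust releases):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

fn hash_of<T: Hash>(t: &T) -> u64 {
    let mut h = DefaultHasher::new();
    t.hash(&mut h);
    h.finish()
}

// Minimal consistent-hash ring: ring position -> node name.
struct Ring {
    ring: BTreeMap<u64, String>,
}

impl Ring {
    fn new(nodes: &[&str], vnodes: usize) -> Self {
        let mut ring = BTreeMap::new();
        for node in nodes {
            // Several virtual points per node smooth out the distribution.
            for v in 0..vnodes {
                ring.insert(hash_of(&format!("{node}-{v}")), node.to_string());
            }
        }
        Ring { ring }
    }

    // Walk clockwise from the shard's position to the first node, wrapping
    // around to the start of the ring if necessary.
    fn node_for(&self, shard_key: &str) -> &str {
        let h = hash_of(&shard_key);
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, n)| n.as_str())
            .expect("ring must not be empty")
    }
}

fn main() {
    let ring = Ring::new(&["node-a", "node-b", "node-c"], 16);
    // Unlike mod-N, removing a node only reassigns the keys that fell in
    // that node's intervals; everything else keeps its placement.
    println!("file1/chunk0 -> {}", ring.node_for("file1/chunk0"));
}
```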
## Shard keys
MongoDB's documentation on choosing shard keys: https://www.mongodb.com/docs/manual/core/sharding-shard-key