File sharding - radumarias/rfs GitHub Wiki

Each file will be split into chunks (shards) of 65 MB. Those shards will be distributed across multiple nodes and also replicated.
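As a sketch of the chunking math (the constant and function names are illustrative, not from the rfs codebase):

```rust
/// Shard size as stated above: 65 MB (illustrative constant).
const SHARD_SIZE: u64 = 65 * 1024 * 1024;

/// Number of shards a file of `file_size` bytes splits into (ceiling division).
fn shard_count(file_size: u64) -> u64 {
    (file_size + SHARD_SIZE - 1) / SHARD_SIZE
}

/// Byte range `(start, end)` (end exclusive) covered by shard `index`;
/// the last shard may be shorter than SHARD_SIZE.
fn shard_range(file_size: u64, index: u64) -> (u64, u64) {
    let start = index * SHARD_SIZE;
    let end = (start + SHARD_SIZE).min(file_size);
    (start, end)
}
```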

mod-N hashing

To decide which node a chunk initially goes to, we could use something like `chunk_index % num_nodes`, but this creates a problem when we add or remove nodes: we would need to rebalance the cluster by moving many shards around.
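A quick sketch of why mod-N placement is fragile (names are illustrative). Growing the cluster from 4 to 5 nodes changes the placement of most shards:

```rust
/// mod-N placement: shard `i` lives on node `i % num_nodes`.
fn node_for(shard_index: u64, num_nodes: u64) -> u64 {
    shard_index % num_nodes
}

/// Count how many of `total` shards land on a different node
/// after the cluster grows from `n` to `m` nodes.
fn moved_shards(total: u64, n: u64, m: u64) -> u64 {
    (0..total)
        .filter(|&i| node_for(i, n) != node_for(i, m))
        .count() as u64
}
```

For 1000 shards, going from 4 to 5 nodes moves 800 of them (only shards with `i % 20 < 4` keep their node), which is exactly the rebalancing cost described above.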

Random weighted

To solve this we could use weighted random distribution, as described in https://dev.to/jacktt/understanding-the-weighted-random-algorithm-581p. We would use each node's available space as its weight, build intervals from those weights, and pick a random interval (node) to place the shard on. This handles adding and removing nodes gracefully: no shards need to move immediately, and the cluster rebalances itself over time as new shards are written. It does not, however, help with throughput.

The rand crate also provides an implementation of this: https://docs.rs/rand/latest/rand/distributions/struct.WeightedIndex.html#example
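A hand-rolled sketch of the interval construction (each node's free space is its weight; a deterministic `pick` value stands in for the RNG so the example has no dependency on the rand crate):

```rust
/// Given node weights (e.g. free bytes per node) and a pick value in
/// [0, total_weight), return the index of the selected node by walking
/// the cumulative-weight intervals.
fn pick_weighted(weights: &[u64], pick: u64) -> usize {
    let mut cumulative = 0u64;
    for (i, &w) in weights.iter().enumerate() {
        cumulative += w;
        if pick < cumulative {
            return i;
        }
    }
    weights.len() - 1 // pick was out of range; clamp to the last node
}
```

In production `pick` would come from an RNG uniform over `[0, total_weight)`; rand's `WeightedIndex` linked above performs both the interval construction and the sampling.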

Consistent hashing
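Consistent hashing places nodes (usually as several virtual nodes each) and shards on the same hash ring; a shard goes to the first node clockwise from its hash, so adding or removing one node only remaps roughly 1/N of the shards rather than most of them. A minimal sketch using std's `DefaultHasher` (in rfs the hash function could instead be BLAKE3, discussed below; all names here are illustrative):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

fn hash64<T: Hash>(t: &T) -> u64 {
    let mut h = DefaultHasher::new();
    t.hash(&mut h);
    h.finish()
}

struct Ring {
    /// Maps a position on the ring to the node owning it.
    vnodes: BTreeMap<u64, String>,
}

impl Ring {
    fn new() -> Self {
        Ring { vnodes: BTreeMap::new() }
    }

    /// Insert `vnode_count` virtual nodes for `node`, spreading it around the ring.
    fn add_node(&mut self, node: &str, vnode_count: u32) {
        for i in 0..vnode_count {
            self.vnodes.insert(hash64(&(node, i)), node.to_string());
        }
    }

    /// The owner of a shard is the first virtual node at or after its hash,
    /// wrapping around to the start of the ring if needed.
    fn node_for(&self, shard_key: &str) -> &str {
        let h = hash64(&shard_key);
        self.vnodes
            .range(h..)
            .next()
            .or_else(|| self.vnodes.iter().next())
            .map(|(_, n)| n.as_str())
            .expect("ring has no nodes")
    }
}
```

With, say, 100 virtual nodes per physical node, adding a fourth node to a three-node ring remaps only about a quarter of the shard keys.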

Shard keys

MongoDB's documentation on choosing shard keys: https://www.mongodb.com/docs/manual/core/sharding-shard-key

A powerful hash function

BLAKE3 is a good candidate: a cryptographic hash that is much faster than SHA-2 and SHA-3 and parallelizes well over large inputs, which suits hashing shard keys and verifying shard integrity.

Alternatives