Concepts - radumarias/rfs GitHub Wiki
Concepts
- https://blog.bytebytego.com/p/design-a-s3-like-storage-system
- https://www.cncf.io/blog/2019/11/04/building-a-large-scale-distributed-storage-system-based-on-raft/
- https://github.com/CodisLabs/codis
- https://en.wikipedia.org/wiki/CAP_theorem
- https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction
- https://www.datomic.com/
- https://suman-cshil.medium.com/paxos-introduction-74d074166fc4
- https://suman-cshil.medium.com/paxos-continued-basic-paxos-ce723cf0f970
- https://suman-cshil.medium.com/observability-is-an-afterthought-it-should-not-be-45d2e5824250
- https://medium.com/@patrickkoss/dont-build-a-distributed-system-if-you-don-t-understand-these-issues-2ae5d60bdad7
- https://github.com/spacedriveapp/spacedrive#what-is-a-vdfs
- QUIC runs over UDP and provides its own flow control
- DCCP (Datagram Congestion Control Protocol) over UDP: RFC 6773
- https://crates.io/crates/dccp
- checksum for each chunk
- https://github.com/tikv/raft-rs
- https://en.m.wikipedia.org/wiki/Clustered_file_system#Distributed_file_systems
- https://github.com/cholcombe973/rusix/
- https://github.com/datenlord/async-rdma
- https://crates.io/crates/rdma
- https://www.reddit.com/r/rust/s/grX2q1A9Ri
- https://man7.org/linux/man-pages/man7/inotify.7.html
- https://github.com/pedrocr/syncer
- https://capnproto.org/
- https://github.com/capnproto/capnproto-rust
- https://gitlab.com/rawler/stored-merkle-tree
- https://en.m.wikipedia.org/wiki/Conflict-free_replicated_data_type
- https://en.m.wikipedia.org/wiki/Operational_transformation
- https://www.baeldung.com/cs/distributed-systems-guide
- https://medium.com/@soulaimaneyh/exploring-the-fundamental-principles-of-distributed-systems-970c285a77b5
- https://books.google.ro/books?hl=en&lr=&id=dhaMDwAAQBAJ&oi=fnd&pg=PA25&dq=two-thirds+agreement+protocol+distributed+systems&ots=QV_mj7Jspv&sig=ybA0JFA7vFUpnN-Jxf9pSL8mCIw&redir_esc=y#v=onepage&q=two-thirds%20agreement%20protocol%20distributed%20systems&f=false
- https://github.com/lnx-search/datacake
- https://www.figma.com/blog/how-figmas-multiplayer-technology-works/
- https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321
- https://www.youtube.com/watch?v=5Pc18ge9ohI
Chapter 9 of Volume 2 of the book System Design Interview
We need to organize these by topic and write a summary for each of them, like:
RAFT
https://www.cncf.io/blog/2019/11/04/building-a-large-scale-distributed-storage-system-based-on-raft/
https://github.com/tikv/raft-rs
File sharding
Each file is split into chunks (shards). We will distribute those chunks across multiple nodes and also replicate them.
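As a minimal sketch of the splitting step, combined with the "checksum for each chunk" note from the list above: the `Chunk` struct, the `split_into_chunks` and `is_intact` helpers, and the choice of CRC-32 are all assumptions made for illustration, not something the project has settled on. A real implementation would probably use a checksum crate rather than this bitwise loop.

```rust
/// One fixed-size piece of a file plus the checksum used to verify it.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    index: usize,
    data: Vec<u8>,
    checksum: u32,
}

/// Bitwise CRC-32 (IEEE polynomial, reflected), written without a lookup table.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the lowest bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Splits `file` into chunks of at most `chunk_size` bytes and checksums each one.
fn split_into_chunks(file: &[u8], chunk_size: usize) -> Vec<Chunk> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    file.chunks(chunk_size)
        .enumerate()
        .map(|(index, data)| Chunk {
            index,
            data: data.to_vec(),
            checksum: crc32(data),
        })
        .collect()
}

/// Recomputes the checksum, e.g. after reading a chunk back from a node.
fn is_intact(chunk: &Chunk) -> bool {
    crc32(&chunk.data) == chunk.checksum
}
```

A node receiving or serving a chunk could call `is_intact` before accepting or returning it, and fall back to a replica when the check fails.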
To decide which node a chunk initially goes to, I was thinking of using something like chunk_index % num_nodes, but this causes a problem when we add or remove nodes: the result changes for most chunks, so we would need to rebalance the cluster by moving shards around.
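To make the rebalancing cost concrete, here is a small sketch that counts how many chunks would change nodes when the cluster size changes. The function names `node_for_chunk` and `chunks_to_move` are hypothetical helpers, not part of the project.

```rust
/// Naive placement: the node index is derived directly from the chunk index.
fn node_for_chunk(chunk_index: u64, num_nodes: u64) -> u64 {
    chunk_index % num_nodes
}

/// Counts how many of the first `num_chunks` chunks map to a different node
/// when the cluster size changes from `old_nodes` to `new_nodes`.
fn chunks_to_move(num_chunks: u64, old_nodes: u64, new_nodes: u64) -> u64 {
    (0..num_chunks)
        .filter(|&i| node_for_chunk(i, old_nodes) != node_for_chunk(i, new_nodes))
        .count() as u64
}
```

For example, `chunks_to_move(20, 4, 5)` returns 16: growing from 4 to 5 nodes relocates 16 of the first 20 chunks, because only chunks 0 to 3 keep the same remainder.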
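To solve this we could use a weighted random distribution (see https://dev.to/jacktt/understanding-the-weighted-random-algorithm-581p). We would use each node's free space as its weight (free rather than used, so that emptier nodes are picked more often), build the weight intervals, and select a random interval (node) to put the shard on. This handles adding or removing nodes gracefully without moving existing shards around, and the cluster rebalances itself over time as new shards land mostly on the emptier nodes. Since the placement is random rather than computed, the chosen node has to be recorded for each shard, for example in the metadata DB.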
The rand crate also provides something for this: https://docs.rs/rand/latest/rand/distributions/struct.WeightedIndex.html#example
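The interval selection can be sketched with the standard library alone. This is an illustration under assumptions: `pick_node` is a hypothetical helper, `weights` holds whatever per-node weight is chosen, and `roll` stands in for a uniformly random number that real code would take from a random number generator such as the rand crate.

```rust
/// Picks a node index with probability proportional to its weight.
///
/// `weights[i]` is the weight of node `i`. `roll` is assumed to be a uniformly
/// random u64; it is reduced into `[0, total)` with a modulo, which introduces
/// a tiny bias that is ignored in this sketch.
fn pick_node(weights: &[u64], roll: u64) -> Option<usize> {
    let total: u64 = weights.iter().sum();
    if total == 0 {
        return None; // no nodes, or no node with a non-zero weight
    }
    let mut target = roll % total;
    // Walk the consecutive intervals [0, w0), [w0, w0 + w1), ... until
    // `target` falls inside one of them.
    for (index, &weight) in weights.iter().enumerate() {
        if target < weight {
            return Some(index);
        }
        target -= weight;
    }
    None // not reached: target < total always lands in some interval
}
```

With weights `[100, 300, 600]`, rolls 0 to 99 select node 0, rolls 100 to 399 select node 1, and rolls 400 to 999 select node 2, so the node with the largest weight receives about 60% of new shards. A node with weight 0 is never selected.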
Alternatives
Consistent hashing
- https://en.wikipedia.org/wiki/Consistent_hashing
- https://dgryski.medium.com/consistent-hashing-algorithmic-tradeoffs-ef6b8e2fcae8
- https://www.youtube.com/watch?v=UF9Iqmg94tk
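For comparison, a minimal consistent hashing ring with virtual nodes might look like the sketch below. The `Ring` type, its methods, and the use of the standard library's `DefaultHasher` are assumptions made for illustration; a production ring would want a hash function with stable output across Rust versions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hash ring: each node owns several points ("virtual nodes") on a u64 circle.
struct Ring {
    points: BTreeMap<u64, String>,
    vnodes: u32,
}

impl Ring {
    fn new(vnodes: u32) -> Self {
        Ring { points: BTreeMap::new(), vnodes }
    }

    /// Places `vnodes` points for this node on the ring.
    fn add_node(&mut self, node: &str) {
        for replica in 0..self.vnodes {
            self.points.insert(hash_of(&(node, replica)), node.to_string());
        }
    }

    /// Removes every point that belongs to this node.
    fn remove_node(&mut self, node: &str) {
        self.points.retain(|_, owner| owner.as_str() != node);
    }

    /// The owner is the first point at or after the chunk's hash,
    /// wrapping around to the smallest point at the end of the circle.
    fn node_for(&self, chunk_id: &str) -> Option<&str> {
        let h = hash_of(&chunk_id);
        self.points
            .range(h..)
            .next()
            .or_else(|| self.points.iter().next())
            .map(|(_, node)| node.as_str())
    }
}
```

Unlike chunk_index % num_nodes, adding a node here only takes over the chunks that fall just before its new points; every other chunk keeps its current owner.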
Metadata DB
Distributed DB
- https://surrealdb.com/features (distributed mode, backed by TiKV)
- https://github.com/tikv/raft-rs
- https://ditto.live/
File sync
Decentralized BitTorrent: BitTorrent filesystem
https://www.bittorrent.com/token/btt
BTFS is a decentralized file storage system supported by millions of BitTorrent user nodes. It runs on a blockchain that uses Delegated Proof of Stake to process transactions.
https://en.m.wikipedia.org/wiki/Bencode
Bencode (pronounced like Bee-encode) is the encoding used by the peer-to-peer file sharing system BitTorrent for storing and transmitting loosely structured data.
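To show what the encoding looks like, here is a small encoder sketch covering Bencode's four value types: integers (`i42e`), byte strings (`4:spam`), lists (`l...e`), and dictionaries (`d...e`, keys in sorted order). The `Bencode` enum and `encode` function are hypothetical names for this illustration, not an existing crate's API.

```rust
use std::collections::BTreeMap;

/// The four Bencode value types.
enum Bencode {
    Int(i64),
    Bytes(Vec<u8>),
    List(Vec<Bencode>),
    /// BTreeMap iterates keys in sorted byte order, which Bencode requires.
    Dict(BTreeMap<Vec<u8>, Bencode>),
}

/// Appends the Bencode representation of `value` to `out`.
fn encode(value: &Bencode, out: &mut Vec<u8>) {
    match value {
        // Integers: "i" + decimal digits + "e"
        Bencode::Int(n) => out.extend_from_slice(format!("i{}e", n).as_bytes()),
        // Byte strings: length + ":" + raw bytes
        Bencode::Bytes(bytes) => {
            out.extend_from_slice(format!("{}:", bytes.len()).as_bytes());
            out.extend_from_slice(bytes);
        }
        // Lists: "l" + encoded items + "e"
        Bencode::List(items) => {
            out.push(b'l');
            for item in items {
                encode(item, out);
            }
            out.push(b'e');
        }
        // Dictionaries: "d" + (encoded key, encoded value) pairs + "e"
        Bencode::Dict(entries) => {
            out.push(b'd');
            for (key, item) in entries {
                out.extend_from_slice(format!("{}:", key.len()).as_bytes());
                out.extend_from_slice(key);
                encode(item, out);
            }
            out.push(b'e');
        }
    }
}
```

For instance, a list holding the string "spam" and the integer 42 encodes to `l4:spami42ee`.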
- https://www.bittorrent.org/beps/bep_0003.html#trackers
- https://chatgpt.com/share/e8b32340-1eb7-4865-8ae7-b14a1977e56c
Raft filesystem
https://github.com/deepmehtait/Distributed-file-system-server-with-RAFT