ds note - modrpc/info GitHub Wiki
- Distribution can hurt: network bandwidth and latency bottlenecks
- lots of tricks: e.g. caching, threaded servers
- Distribution can help: parallelism, pick server near client
- Idea: scalable design
- Nx servers -> Nx total performance
- Need a way to divide the load by N
- divide the state by N
- split by user
- split by file name
- "sharding" or "partitioning"
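
A minimal sketch of splitting state by user across N servers, assuming a simple hash-modulo placement. The function name `shard_for` and the sample user names are made up for illustration; real systems often use consistent hashing or a lookup table instead.

```python
import hashlib


def shard_for(key: str, num_servers: int) -> int:
    """Map a key (user name, file name, ...) to a server index in [0, num_servers)."""
    # Use a stable hash: Python's built-in hash() for str is randomized per process,
    # so two machines could disagree about where a key lives.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


# Hypothetical users, placed on 4 servers.
users = ["alice", "bob", "carol", "dave"]
placement = {user: shard_for(user, 4) for user in users}
```

Every client that computes `shard_for("alice", 4)` gets the same server, so no central directory is needed. The cost is that changing N moves most keys.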
- Rarely perfect -> only scales so far
- Global operations: e.g. search
- Load imbalance
- One very active user
- One very popular file
- one server 100%, added servers mostly idle
- Nx servers -> 1x performance
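
A rough way to see the "Nx servers -> 1x performance" effect, assuming the busiest server is the bottleneck and every request costs the same. The request counts are invented numbers, not measurements.

```python
def speedup(load_per_server):
    """Speedup over a single server, assuming the busiest server limits throughput."""
    total = sum(load_per_server)
    busiest = max(load_per_server)
    return total / busiest


# Hypothetical request counts on 4 servers.
balanced = [250, 250, 250, 250]  # load splits evenly
skewed = [970, 10, 10, 10]       # one very active user lands on server 0

balanced_speedup = speedup(balanced)
skewed_speedup = speedup(skewed)
```

With the even split the four servers give a 4x speedup. With the hot user the speedup stays close to 1x, no matter how many idle servers are added.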
- Dropbox: ~10,000 servers; some will fail
- Can I use my files if there's a failure?
- Some part of the network, some set of servers
- Maybe: replicate the data on multiple servers
- Perhaps client sends every operation to both
- Maybe only needs to wait for one reply
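
A sketch of "send every operation to both, wait for one reply", using in-memory stand-ins for the two servers. The `Replica` class, its artificial delays, and `replicated_put` are assumptions for illustration only.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


class Replica:
    """In-memory stand-in for one server; delay simulates network plus service time."""

    def __init__(self, name, delay):
        self.name = name
        self.delay = delay
        self.store = {}

    def put(self, key, value):
        time.sleep(self.delay)
        self.store[key] = value
        return self.name


def replicated_put(pool, replicas, key, value):
    """Send the write to every replica, return as soon as the first one replies."""
    futures = [pool.submit(r.put, key, value) for r in replicas]
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    return next(iter(done)).result(), pending


replicas = [Replica("fast", 0.01), Replica("slow", 0.2)]
with ThreadPoolExecutor(max_workers=2) as pool:
    first_reply, still_pending = replicated_put(pool, replicas, "grades.txt", "A")
# Leaving the with-block waits for the slower replica to finish as well.
```

The client sees the latency of the faster replica. The catch is the window where only one replica has the write, which is where the consistency problems in these notes come from.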
- Opportunity: operate from two "replicas" independently if partitioned
- Opportunity: can 2 servers yield 2x availability AND 2x performance?
- "contract with apps/users" about meaning of operations
- e.g. "read yields most recently written value"
- hard due to partial failure, replication/caching, concurrency
- Problem: keep replicas identical
- If one is down, it will miss operations
- Must be brought up to date after reboot
- If net is broken, *both* replicas may be live, and see different ops
- Delete file, still visible via other replica
- "split brain" -- usually bad
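
A toy illustration of the split-brain case, assuming two replicas that each keep accepting operations while they cannot reach each other. The `Replica` class here is a made-up in-memory model, not a real storage API.

```python
class Replica:
    """Made-up in-memory replica that tracks which file names exist."""

    def __init__(self, files):
        self.files = set(files)

    def delete(self, name):
        self.files.discard(name)


a = Replica({"grades.txt"})
b = Replica({"grades.txt"})

# Network partition: this client can reach replica a but not replica b,
# and a cannot forward the delete to b.
a.delete("grades.txt")

visible_via_a = "grades.txt" in a.files
visible_via_b = "grades.txt" in b.files
```

After the delete, the file is gone on `a` but still visible via `b`. When the partition heals, something has to decide which state wins.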
- Problem: clients may see updates in different orders
- Due to caching or replication
- I make grades.txt unreadable, then TA writes grades to it
- What if the operations run in different order on different replicas?
- Consistency often hurts performance (communication, blocking)
- Many systems cut corners -- "relaxed consistency"
- Shifts burden to applications
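
A small sketch of why order matters in the grades.txt example, assuming a toy file model (a dict with a `readable` flag and `content`). All names are invented for illustration.

```python
def make_unreadable(state):
    state["readable"] = False


def write_grades(state):
    state["content"] = "grades"


def exposed(ops):
    """Return True if, after any op, grades are written while the file is still readable."""
    state = {"readable": True, "content": ""}
    for op in ops:
        op(state)
        if state["readable"] and state["content"]:
            return True
    return False


intended_order = [make_unreadable, write_grades]   # order the users issued
reordered = [write_grades, make_unreadable]        # order another replica might apply

leak_in_intended = exposed(intended_order)
leak_in_reordered = exposed(reordered)
```

Both orders end in the same final state, yet only the reordered one passes through a state where the grades are readable. A system with relaxed consistency can allow that window, and the application then has to cope with it.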
- socket: endpoint for communication
- a pair of processes (what about a group?) communicating over a network uses a pair of sockets
- socket: IP address + port
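
A minimal sketch of two endpoints talking over TCP on localhost, using Python's standard `socket` module. It assumes both "processes" are threads on one machine, and the upper-casing echo is just a placeholder service.

```python
import socket
import threading


def echo_once(server_sock):
    """Accept one connection, read one message, reply with it upper-cased."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


# Server endpoint: IP address + port (port 0 asks the OS for any free port).
server = socket.create_server(("127.0.0.1", 0))
host, port = server.getsockname()
worker = threading.Thread(target=echo_once, args=(server,))
worker.start()

# Client endpoint: the OS assigns it its own (IP address, port) pair.
with socket.create_connection((host, port)) as client:
    client.sendall(b"foo")
    reply = client.recv(1024)
    client_endpoint = client.getsockname()

worker.join()
server.close()
```

The connection is identified by the pair of endpoints: `(host, port)` on the server side and `client_endpoint` on the client side.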
- SETUP
- WLOG, there are only two machines: M0 and M1
- one-caller, one-callee
- M0 calls FOO on M1
- one-caller, two-callee
- M0 calls FOO and BAR on M1
- two-caller, one-callee
- M0 calls FOO twice (from different threads)
- at-least-once semantics
- at-most-once semantics
- exactly-once semantics
- at-most-once semantics + unbounded retries + fault-tolerant service
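
A sketch contrasting the semantics for the one-caller, one-callee setup (M0 calls FOO on M1). It assumes requests always arrive and only replies get lost, and that FOO is a non-idempotent counter increment. `Server`, `call_with_retries`, and the request ID format are invented for illustration.

```python
import itertools


class Server:
    """Callee on M1. FOO is non-idempotent: each execution increments a counter."""

    def __init__(self, dedup):
        self.dedup = dedup
        self.counter = 0
        self.seen = {}  # request id -> cached reply (only used when dedup is on)

    def foo(self, request_id):
        if self.dedup and request_id in self.seen:
            return self.seen[request_id]  # duplicate: replay the reply, do not re-execute
        self.counter += 1
        reply = self.counter
        if self.dedup:
            self.seen[request_id] = reply
        return reply


def call_with_retries(server, request_id, lost_replies):
    """Caller on M0: resend the same request until a reply gets through."""
    for attempt in itertools.count():
        reply = server.foo(request_id)  # the request itself always arrives
        if attempt >= lost_replies:     # the first `lost_replies` replies are dropped
            return reply


at_least_once = Server(dedup=False)
reply_a = call_with_retries(at_least_once, "req-1", lost_replies=2)

at_most_once = Server(dedup=True)
reply_b = call_with_retries(at_most_once, "req-1", lost_replies=2)
```

Without duplicate detection the retries execute FOO three times. With it, FOO runs once and the retries get the cached reply. Getting closer to exactly-once would also need the `seen` table to survive a server crash, which this sketch does not attempt.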