Table of Contents Topics Performance Fault Tolerance Consistency Distributed Communication (Silberschatz Ch.15) Sockets RPC RPC semantics

Topics

Performance

Distribution can hurt: network B/W and latency bottlenecks
- lots of tricks: e.g. caching, threaded serves
Distribution can help: parallelism, pick server near client
Idea: scalable design
- Nx servers -> Nx total performance
Need a way to divide the load by N
- device the state by N
  - split by user
  - split by file name
  - "sharding" or "partitioning"
Rarely perfect -> only scales so far
- Global operations: e.g. search
- Load imbalance
  - One very active user
  - One very popular file
    - one server 100%, added servers mostly idle
    - Nx servers -> 1X performance

Dropbox: ~10,000 servers: some fail
Can I use my files if there's a failure?
- Some part of the network, some set of servers
Maybe: replicate the data on multiple servers
- Perhaps client sends every operation to both
- Maybe only needs to wait for one reply
Opportunity: operate from two "replicas" independenty if partitioned
Opportunity: can 2 servers yield 2x availability AND 2x performance?

"contract with apps/users" about meaning of operations
- e.g. "read yields most recently written value"
- hard due to partial failure, replication/caching, concurrency
Problem: keep replicas identical
- If one is down, it will miss operations
  - Must be brought up to date after reboot
- If net is broken, *both* replicas maybe live, and see different ops
  - Delete file, still visible via other replica
  - "split brain" -- usually bad
Problem: clients may see updates in different orders
- Due to caching or replication
- I make grades.txt unreadable, then TA writes grades to it
- What if the operations run in different order on different replicas?
Consistency often hurts performance (communication, blocking)
- Many systems cut corners -- "relaxed consistency"
- Shifts burden to applications

socket: endpoint for communication
- a pair of (how about group?) processes communicating over a network employs a pair of sockets
- socket: IP address + port

SETUP
- WLOG, there are only two machines: M0 and M1
- one-caller, one-callee
  - M0 calls FOO on M1
- one-caller, two-callee
  - M0 calls FOO and BAR on M1
- two-caller, one-callee
  - M0 calls FOO twice (from different thread)
at-least-once semantics
at-most-once semantics
exactly-once semantics
- at-most-once semantics + unbounded retries + fault-tolerant service