dist note - modrpc/info GitHub Wiki

Lecture 2: RPC, Threads

RPC

at-least-once behavior

  • WHAT: 1, 2, ... executions
  • HOW: wait for response; if no response, re-send request N times
  • ISSUE: e.g. "deduct $10 from bank account"
    • Executing the request more than once is harmful
    • Could be OK if the operation is idempotent (e.g. read-only), or if the application handles duplicates itself
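A minimal sketch of the problem above, assuming a toy in-process "network" in which the server executes every request but some replies are dropped. The names `Account`, `call_at_least_once`, and `reply_lost` are hypothetical and exist only for illustration.

```python
class Account:
    """Toy server state; deduct is deliberately NOT idempotent."""

    def __init__(self, balance):
        self.balance = balance

    def deduct(self, amount):
        self.balance -= amount
        return self.balance


def call_at_least_once(handler, arg, reply_lost, max_tries=3):
    """Re-send the request until a reply arrives or max_tries is used up.

    reply_lost holds one boolean per attempt; True means the server
    executed the request but its reply was dropped on the way back.
    """
    for attempt in range(max_tries):
        result = handler(arg)            # the server executes the request
        if not reply_lost[attempt]:      # the reply reached the client
            return result
        # otherwise the client times out and re-sends
    raise TimeoutError("no reply after %d tries" % max_tries)


account = Account(100)
# The first reply is lost, so the client re-sends and $10 is deducted twice.
final = call_at_least_once(account.deduct, 10, reply_lost=[True, False])
# final == 80, although the client only asked for one $10 deduction
```

The client cannot tell "request lost" from "reply lost", so blind re-sending is only safe when the handler is idempotent.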

at-most-once behavior

  • WHAT: 0 or 1 execution
  • HOW: to avoid executing twice
    • CLIENT: Each request contains XID (unique ID) -- same XID for re-send
    • SERVER: save the response for each XID; if a request with an already-seen XID arrives, re-send the saved response instead of executing again
  • HOW: when to discard saved responses
    • CLIENT sends "seen all replies <= X" with every RPC, so the server can discard saved responses for XIDs <= X
    • Or: CLIENT can have at most one outstanding call at a time (no overlapping calls)
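The duplicate table and the discard rule above could be sketched as follows. This is an illustration only, assuming a single client whose XIDs are increasing integers starting at 1. The class `AtMostOnceServer`, the `seen_upto` parameter, and the `deduct` handler are hypothetical names, not taken from any real RPC library.

```python
class AtMostOnceServer:
    """Single-client sketch: XIDs are assumed to be increasing integers."""

    def __init__(self, handler):
        self.handler = handler
        self.replies = {}         # xid -> saved reply
        self.discarded_upto = 0   # replies for xid <= this were discarded

    def handle(self, xid, arg, seen_upto):
        # The client piggybacks "seen all replies <= seen_upto".
        if seen_upto > self.discarded_upto:
            for old in [x for x in self.replies if x <= seen_upto]:
                del self.replies[old]
            self.discarded_upto = seen_upto
        if xid <= self.discarded_upto:
            return None               # stale duplicate of an acknowledged call
        if xid in self.replies:
            return self.replies[xid]  # duplicate: re-send, do not re-execute
        reply = self.handler(arg)     # first time this XID is seen: execute
        self.replies[xid] = reply
        return reply


balance = [100]

def deduct(amount):
    balance[0] -= amount
    return balance[0]

server = AtMostOnceServer(deduct)
first = server.handle(xid=1, arg=10, seen_upto=0)   # executes: 90
retry = server.handle(xid=1, arg=10, seen_upto=0)   # re-send: saved 90
second = server.handle(xid=2, arg=5, seen_upto=1)   # executes: 85, drops xid 1
stale = server.handle(xid=1, arg=10, seen_upto=0)   # late duplicate: ignored
```

Tracking `discarded_upto` is a design choice of this sketch: without it, a delayed duplicate that arrives after its saved reply was discarded would be executed a second time.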

exactly-once

  • at-most-once + unbounded retries + fault-tolerant services

Lecture 3: Primary/Backup Replication

Fault Tolerance

  • we'd like a service that continues despite failures!
  • available: still usable despite [some] failures
  • correct: act just like a single server to clients
  • very hard! but very useful!

Failure Model: What will we try to cope with?

  • Independent fail-stop computer failure
    • Remus further assumes only one failure at a time
  • Site-wide power failure (and eventual reboot)
  • (Network partition)
  • No bugs, no malice

Core idea: Replication

  • Two servers (or more)
  • Each replica keeps state needed for the service
  • If one replica fails, others can continue

Big Questions re: Replication

  • What state to replicate?
  • How does replica get state?
  • When to cut over to backup?
  • Are anomalies visible at cut-over?
  • How to repair / re-integrate?

Two Main Replication Approaches

State transfer

  • "Primary" replica executes the service
  • Primary sends [new] state to backups
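A rough sketch of state transfer, assuming an in-process backup and a key-value state. The classes `Primary` and `Backup` and their methods are hypothetical. Only the primary executes the operation; the backup receives the resulting state, never the operation itself.

```python
class Backup:
    def __init__(self):
        self.state = {}

    def install(self, changed):
        # The backup never runs operations; it only installs new state.
        self.state.update(changed)


class Primary:
    def __init__(self, backup):
        self.state = {}
        self.backup = backup

    def append(self, key, suffix):
        # Only the primary executes the operation...
        self.state[key] = self.state.get(key, "") + suffix
        # ...then it ships the resulting (new) state to the backup.
        self.backup.install({key: self.state[key]})
        return self.state[key]


backup = Backup()
primary = Primary(backup)
primary.append("log", "a")
primary.append("log", "b")
# backup.state now matches primary.state: {"log": "ab"}
```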

Replicated state machine

  • All replicas execute all operations
  • If same start state,
    • same operations,
    • same order,
    • deterministic,
    • then same end state
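The conditions above can be illustrated with a small deterministic state machine. This is a sketch under assumed operation names (`"set"` and `"double"`) with hypothetical helpers `apply_op` and `run`. Two replicas that apply the same operations in the same order end in the same state, and a replica that applies them in a different order does not.

```python
def apply_op(state, op):
    """Deterministic transition function: returns a new state dict."""
    new_state = dict(state)
    if op[0] == "set":
        _, key, value = op
        new_state[key] = value
    elif op[0] == "double":
        _, key = op
        new_state[key] = 2 * new_state.get(key, 0)
    else:
        raise ValueError("unknown op: %r" % (op,))
    return new_state


def run(start, ops):
    """Apply every operation in order, starting from the given state."""
    state = start
    for op in ops:
        state = apply_op(state, op)
    return state


ops = [("set", "x", 3), ("double", "x"), ("set", "y", 1)]
replica_a = run({}, ops)
replica_b = run({}, ops)
# Same start state, same ops, same order: both are {"x": 6, "y": 1}

reordered = [ops[1], ops[0], ops[2]]
replica_c = run({}, reordered)
# Different order: {"x": 3, "y": 1}, so this replica has diverged
```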

Comparison

  • State transfer is simpler, but the state may be large and therefore slow to transfer

Case Study: Remus

Testing Distributed Systems

Tanenbaum: Distributed Operating Systems

Ch #2: Communication in Distributed Systems

2.3.4 Blocking versus nonblocking primitives
