Money Transfer - ashtishad/system-design GitHub Wiki

1. Requirements

Functional Requirements:

  • Users can send money to another user or merchant in real time.
  • Users can request money from another user.
  • Users can view transaction history (sent, received, requested).

Non-Functional Requirements:

  • Consistency: No double transactions or lost funds.
  • Scalability: Handle 10M DAU, plus peak surges in which 10% of users (1M) transact within a short window (e.g., holidays).
  • Low Latency: Transfer <500ms, history <200ms.
  • Durability: No data loss over 5 years.
  • Capacity Estimation (5 years):
    • DAU: 10M users.
    • Transactions: 50M/day * 365 days * 5 years = 91.25B transactions.
    • Storage: 91.25B tx * 500 bytes/tx = 45.625TB raw, ~450TB with replication/indexes (assumes roughly a 10x overhead factor).
    • QPS: Avg: 50M/day ÷ 86,400s ≈ 580 QPS; Peak (10% surge): 1M users * 5 tx ÷ 2,400s (a 40-minute window) ≈ 2,083 QPS.
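The estimate above can be checked with a few lines of arithmetic. This is a minimal sketch: the inputs come from the bullets above, while the constant names and the reading of 2,400s as the surge window are assumptions made for illustration.

```python
# Back-of-the-envelope capacity numbers. Inputs come from the estimate above;
# the constant names are illustrative.
TX_PER_DAY = 50_000_000
YEARS = 5
BYTES_PER_TX = 500           # assumed average size of one transaction row
SURGE_USERS = 1_000_000      # 10% of 10M DAU
TX_PER_SURGE_USER = 5
SURGE_WINDOW_S = 2_400       # assumed 40-minute peak window

total_tx = TX_PER_DAY * 365 * YEARS
raw_storage_tb = total_tx * BYTES_PER_TX / 1e12
avg_qps = TX_PER_DAY / 86_400
peak_qps = SURGE_USERS * TX_PER_SURGE_USER / SURGE_WINDOW_S

print(f"total transactions: {total_tx / 1e9:.2f}B")
print(f"raw storage: {raw_storage_tb:.3f} TB")
print(f"average QPS: {avg_qps:.0f}")
print(f"peak QPS: {peak_qps:.0f}")
```

The exact average is about 579 QPS, which the estimate rounds to 580.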

2. Core Entities

  • User: {id, name, email, balance, bank_account}
  • Transaction: {id, sender_id, receiver_id, amount, type (send/request), status (pending/completed/failed), idempotency_key, timestamp}
  • Request: {id, requester_id, payer_id, amount, status (pending/accepted/declined), timestamp}
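As an illustration, the Transaction entity could be modeled as below. This is a sketch, not the wiki's schema: the field names follow the entity list, while the enum classes, the string ids, and storing the amount as integer minor units (`amount_cents`) are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class TxType(Enum):
    SEND = "send"
    REQUEST = "request"


class TxStatus(Enum):
    PENDING = "pending"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass(frozen=True)
class Transaction:
    id: str
    sender_id: str
    receiver_id: str
    amount_cents: int        # assumption: integer minor units instead of floats
    type: TxType
    status: TxStatus
    idempotency_key: str
    timestamp: datetime

    def __post_init__(self) -> None:
        # Reject values that could never be a valid transfer.
        if self.amount_cents <= 0:
            raise ValueError("amount must be positive")
        if self.sender_id == self.receiver_id:
            raise ValueError("sender and receiver must differ")


tx = Transaction(
    id="tx_1", sender_id="u_1", receiver_id="u_2", amount_cents=2_500,
    type=TxType.SEND, status=TxStatus.PENDING,
    idempotency_key="key-123", timestamp=datetime.now(timezone.utc),
)
```

Integer minor units avoid floating-point rounding errors when balances are added and subtracted.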

3. APIs

  • POST /transfers/send: {receiver_id, amount, idempotency_key}, sends money.
  • POST /transfers/request: {payer_id, amount}, requests money.
  • POST /transfers/accept: {request_id}, accepts payment request.
  • GET /transactions/{user_id}?type={send/request}&since={timestamp}: Returns transaction history.
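A request body for POST /transfers/send might be validated along these lines before it reaches the Transfer Service. The helper `parse_send_request`, the `SendRequest` type, and the rule that `amount` arrives as a positive integer in minor units are all hypothetical; the wiki does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SendRequest:
    receiver_id: str
    amount_cents: int
    idempotency_key: str


def parse_send_request(body: dict) -> SendRequest:
    """Validate a POST /transfers/send body (hypothetical helper)."""
    missing = {"receiver_id", "amount", "idempotency_key"} - set(body)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    amount = body["amount"]
    # Assumption: amounts are integers in minor units; bool is excluded
    # explicitly because it is a subclass of int in Python.
    if isinstance(amount, bool) or not isinstance(amount, int) or amount <= 0:
        raise ValueError("amount must be a positive integer")
    if not body["idempotency_key"]:
        raise ValueError("idempotency_key must be non-empty")
    return SendRequest(str(body["receiver_id"]), amount, str(body["idempotency_key"]))


req = parse_send_request(
    {"receiver_id": "u_2", "amount": 2500, "idempotency_key": "key-123"}
)
```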

4. High-Level Design

  • Client: Mobile/web app.
  • API Gateway: Routes, auth (JWT), rate-limiting.
  • Microservices:
    • Transfer Service: Handles sending/accepting money.
    • Request Service: Manages money requests.
    • History Service: Retrieves transaction history.
  • Data Stores:
    • PostgreSQL: Users, transactions (ACID for consistency).
    • Redis: Idempotency, caching.
    • Kafka: Event streaming for async processing.
  • External: Stripe for bank payouts.
  • Flow:
    • Send → Transfer Service → Redis → PostgreSQL → Kafka.
    • Request → Request Service → PostgreSQL → Kafka (notification).
    • Accept → Transfer Service → PostgreSQL → Stripe.
    • History → History Service → PostgreSQL.

Why PostgreSQL?

PostgreSQL ensures ACID compliance for financial transactions, critical for preventing double transfers or balance errors, with MVCC and row-level locking supporting 2K QPS peaks.


5. Deep Dives (Functional Focus)

1. Sending Money (Real-Time Transfers)

  • Problem: Enable reliable, instant money transfers at 2,083 QPS peak without duplication.
  • Approaches & Tradeoffs:
    • Direct Transfer
      • How: POST /transfers/send; BEGIN; UPDATE users SET balance = balance - amount WHERE id={sender_id}; UPDATE users SET balance = balance + amount WHERE id={receiver_id}; INSERT INTO transactions; COMMIT;
      • Pros: Simple, one transaction (~100ms), ACID-safe.
      • Cons: Hot rows in the users table contend (~10ms/row), no idempotency or retry logic, slow at peak.
      • Use Case: Small-scale systems (<100 QPS).
    • Idempotent Transfer
      • How: Check idempotency_key in Redis, then BEGIN; SELECT balance FROM users WHERE id={sender_id} FOR UPDATE; UPDATE balance; INSERT transactions; COMMIT;
      • Pros: Prevents duplicates (~5ms with Redis), scalable.
      • Cons: A Redis failure can let duplicates through (mitigated by a UNIQUE constraint on idempotency_key in PostgreSQL), no async decoupling.
      • Use Case: Medium-scale apps (e.g., Venmo).
    • Event-Driven Transfer
      • How: POST /transfers/send writes to Kafka, Transfer Service consumes, updates PostgreSQL (FOR UPDATE).
      • Pros: Decouples load (~500ms), scales to 10K QPS, retryable.
      • Cons: Higher latency, eventual consistency risks.
      • Use Case: High-throughput systems with latency tolerance.
    • Optimistic Transfer
      • How: Add version to users; UPDATE users SET balance = balance - amount, version = version + 1 WHERE id={sender_id} AND version={old_version}; retry on failure.
      • Pros: High concurrency (~2ms), no explicit FOR UPDATE locks.
      • Cons: Retries at peak (10-20%), complex client logic.
      • Use Case: Read-heavy systems with low conflicts.
    • Hybrid (Redis + PostgreSQL + Kafka)
      • How: Check idempotency_key in Redis (TTL=5min), write intent to Kafka, process with SELECT FOR UPDATE in PostgreSQL, notify via Kafka.
      • Pros: Real-time (<500ms), idempotent, scales to 2K QPS, durable.
      • Cons: Multi-system complexity, Redis failure risks (mitigated by PostgreSQL).
      • Use Case: Large-scale fintech (e.g., PayPal).
  • Industry Example (Venmo): Uses idempotency with a relational DB (e.g., PostgreSQL) and async queues for real-time transfers, ensuring no duplicates.
  • Optimal Solution: Hybrid: Redis for idempotency (<1ms), Kafka for decoupling, PostgreSQL with FOR UPDATE and UNIQUE (idempotency_key) for consistency.
  • Why Optimal: Balances speed (Redis: 100K ops/s), scalability (Kafka: 10K msg/s), and safety (PostgreSQL: 10 nodes at 200 QPS/node, ~2K QPS in total), and meets the <500ms latency target.
  • Tradeoffs: Adds infra complexity (transient failures are absorbed by retries), slight latency overhead (~5ms vs. ~2ms for optimistic).
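The database side of the idempotent transfer can be sketched as follows. This is a minimal illustration under stated assumptions, not the production design: SQLite (Python's built-in sqlite3) stands in for PostgreSQL, the Redis pre-check and Kafka hop are omitted, and a conditional UPDATE replaces SELECT ... FOR UPDATE because SQLite has no row-level locks. The table layout, the `transfer` function, and `TransferRejected` are hypothetical names.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id TEXT PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)
);
CREATE TABLE transactions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    sender_id TEXT NOT NULL,
    receiver_id TEXT NOT NULL,
    amount INTEGER NOT NULL,
    idempotency_key TEXT NOT NULL UNIQUE
);
"""


class TransferRejected(Exception):
    """Raised inside the transaction to force a rollback."""


def transfer(conn, sender_id, receiver_id, amount, idempotency_key):
    """Return 'completed', 'duplicate', or the rejection reason."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            # Claim the key first; a replayed request violates UNIQUE here.
            conn.execute(
                "INSERT INTO transactions"
                " (sender_id, receiver_id, amount, idempotency_key)"
                " VALUES (?, ?, ?, ?)",
                (sender_id, receiver_id, amount, idempotency_key),
            )
            # Conditional debit: matches no row if the balance is too low
            # (or if the sender does not exist).
            debited = conn.execute(
                "UPDATE users SET balance = balance - ?"
                " WHERE id = ? AND balance >= ?",
                (amount, sender_id, amount),
            ).rowcount
            if debited != 1:
                raise TransferRejected("insufficient funds")
            credited = conn.execute(
                "UPDATE users SET balance = balance + ? WHERE id = ?",
                (amount, receiver_id),
            ).rowcount
            if credited != 1:
                raise TransferRejected("unknown receiver")
        return "completed"
    except sqlite3.IntegrityError:
        return "duplicate"
    except TransferRejected as exc:
        return str(exc)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO users VALUES ('alice', 1000), ('bob', 0)")
conn.commit()

first = transfer(conn, "alice", "bob", 300, "key-1")
retry = transfer(conn, "alice", "bob", 300, "key-1")  # client retry, same key
```

Because the key is claimed inside the same transaction as the balance updates, a retry with the same key rolls back without moving money, even if the Redis check was skipped or lost.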

2. Requesting Money (Payment Requests)

  • Problem: Allow users to request money, track acceptance at scale.
  • Approaches & Tradeoffs:
    • Inline Request
      • How: POST /transfers/request; INSERT INTO requests (requester_id, payer_id, amount); notify via email/SMS.
      • Pros: Simple (~50ms), no extra infra.
      • Cons: No real-time tracking, manual notification, scales poorly (~100 QPS).
      • Use Case: Basic P2P apps.
    • Event-Driven Request
      • How: Write request to PostgreSQL, publish to Kafka, notify payer via push (e.g., FCM).
      • Pros: Async (~500ms), scalable (10K QPS), real-time UX.
      • Cons: Notification latency (~1s), retry complexity.
      • Use Case: Medium-scale systems (e.g., Cash App).
    • Request with Approval
      • How: INSERT INTO requests, Kafka notifies, POST /transfers/accept triggers transfer (FOR UPDATE on users).
      • Pros: Explicit acceptance (~100ms), auditable, durable.
      • Cons: Two-step UX, contention on accept (~10ms).
      • Use Case: Business transactions.
    • Optimistic Request
      • How: INSERT INTO requests (requester_id, payer_id, amount) SELECT ... WHERE NOT EXISTS (SELECT ... WHERE requester_id={requester_id} AND payer_id={payer_id} AND status='pending'); notify via Redis pub/sub.
      • Pros: High concurrency (~2ms), fast notification.
      • Cons: Retries on conflict; Redis pub/sub does not persist messages, so payers who are offline miss the notification.
      • Use Case: Low-conflict systems.
    • Hybrid (PostgreSQL + Kafka + Redis)
      • How: Store request in PostgreSQL, publish to Kafka, cache status in Redis, notify via WebSockets/FCM.
      • Pros: Real-time (<500ms), durable, scales to 2K QPS.
      • Cons: Multi-system sync, Redis volatility (mitigated by PostgreSQL).
      • Use Case: WhatsApp Pay-like systems.
  • Industry Example (PayPal): Uses a request-approval flow with queues (e.g., Kafka) and push notifications for real-time UX.
  • Optimal Solution: Hybrid: PostgreSQL for request storage, Kafka for notifications, Redis for status caching, WebSockets for real-time updates.
  • Why Optimal: Ensures durability (PostgreSQL), scalability (Kafka: 10K msg/s), and UX (WebSockets: <500ms), handles 2K QPS.
  • Tradeoffs: Adds complexity (failed notifications are retried), notification latency tolerable (~1s).
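The request lifecycle (pending, then accepted or declined exactly once) can be sketched at the storage layer. Assumptions: SQLite stands in for PostgreSQL, notifications and the money movement on accept are left out, and a partial unique index is used to block duplicate pending requests in place of the NOT EXISTS insert described above. The function and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE requests (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    requester_id TEXT NOT NULL,
    payer_id TEXT NOT NULL,
    amount INTEGER NOT NULL CHECK (amount > 0),
    status TEXT NOT NULL DEFAULT 'pending'
);
-- At most one pending request per (requester, payer) pair.
CREATE UNIQUE INDEX one_pending ON requests (requester_id, payer_id)
    WHERE status = 'pending';
""")


def create_request(requester_id, payer_id, amount):
    """Return the new request id, or None if one is already pending."""
    try:
        with conn:
            cur = conn.execute(
                "INSERT INTO requests (requester_id, payer_id, amount)"
                " VALUES (?, ?, ?)",
                (requester_id, payer_id, amount),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


def resolve_request(request_id, payer_id, new_status):
    """Move a pending request to 'accepted' or 'declined' at most once."""
    if new_status not in ("accepted", "declined"):
        raise ValueError("invalid status")
    with conn:
        cur = conn.execute(
            "UPDATE requests SET status = ?"
            " WHERE id = ? AND payer_id = ? AND status = 'pending'",
            (new_status, request_id, payer_id),
        )
    # rowcount is 0 if the request was already resolved or belongs to
    # a different payer.
    return cur.rowcount == 1


rid = create_request("bob", "alice", 500)
dup = create_request("bob", "alice", 500)              # one is already pending
accepted = resolve_request(rid, "alice", "accepted")
again = resolve_request(rid, "alice", "declined")      # already resolved
```

Putting `status = 'pending'` in the UPDATE's WHERE clause makes a double tap on "accept" harmless: only the first call changes the row.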

3. Transaction History (Viewing Records)

  • Problem: Provide fast, accurate transaction history at 2,083 QPS.
  • Approaches & Tradeoffs:
    • Simple SQL Fetch
      • How: GET /transactions/{user_id}; SELECT * FROM transactions WHERE sender_id={user_id} OR receiver_id={user_id} ORDER BY timestamp DESC LIMIT 50;
      • Pros: Simple (~100ms), no extra infra.
      • Cons: Slow at scale: without an index this is an O(n) scan, which on 91B rows is far beyond the 200ms target.
      • Use Case: Tiny datasets (<1M tx).
    • Indexed SQL
      • How: Add indexes on (sender_id, timestamp) and (receiver_id, timestamp); fetch from PostgreSQL.
      • Pros: Faster (~50ms), durable.
      • Cons: Index overhead (~20% write penalty), scales to ~1K QPS.
      • Use Case: Medium-scale apps.
    • Event Sourcing
      • How: Kafka stores all tx events, rebuild history on demand.
      • Pros: Durable, scales to 10K QPS, auditable.
      • Cons: High latency (~1s), complex rebuild (~minutes).
      • Use Case: Analytics-heavy systems.
    • Caching (Redis)
      • How: Cache last 50 tx per user in Redis (user_id:history), sync from PostgreSQL.
      • Pros: Ultra-fast (<1ms), 2K QPS sustainable.
      • Cons: Cache staleness, sync complexity.
      • Use Case: Real-time history (e.g., Venmo).
    • Hybrid (PostgreSQL + Redis)
      • How: PostgreSQL for full history, Redis for recent tx (TTL=1h), sync via Kafka.
      • Pros: Fast (<10ms with cache), durable, scales to 2K QPS.
      • Cons: Dual-system sync, Redis failure risks (mitigated by PostgreSQL).
      • Use Case: High-scale fintech.
  • Industry Example (Venmo): Uses a relational DB with in-memory caching (e.g., Redis) for recent transactions, ensuring fast history access.
  • Optimal Solution: Hybrid: PostgreSQL for persistent history, Redis for recent tx caching (<200ms), Kafka for sync.
  • Why Optimal: Meets <200ms latency, handles 2K QPS (Redis: 100K ops/s), durable with PostgreSQL (10 nodes, 200 QPS/node).
  • Tradeoffs: Adds Redis/Kafka infra (mitigated by CDC), cache staleness acceptable for UX.
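The read path of the hybrid approach follows a cache-aside pattern, sketched below. Assumptions: an in-memory dict stands in for Redis, `load_from_db` stands in for the PostgreSQL query, and `on_new_transaction` represents whatever consumer applies Kafka events to the cache. All class, method, and parameter names are hypothetical.

```python
import time
from typing import Callable


class RecentHistoryCache:
    """In-memory stand-in for a Redis cache of each user's recent tx."""

    def __init__(self, load_from_db: Callable[[str, int], list],
                 ttl_s: float = 3600.0, limit: int = 50,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_db
        self._ttl = ttl_s
        self._limit = limit
        self._clock = clock
        self._entries = {}   # user_id -> (expires_at, rows)

    def get(self, user_id: str) -> list:
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > self._clock():
            return entry[1]                       # cache hit
        rows = self._load(user_id, self._limit)   # miss or expired: query DB
        self._entries[user_id] = (self._clock() + self._ttl, rows)
        return rows

    def on_new_transaction(self, user_id: str, tx: dict) -> None:
        """Apply a transaction event to an already cached list."""
        entry = self._entries.get(user_id)
        if entry is None:
            return            # nothing cached; the next get() loads from DB
        expires_at, rows = entry
        # Newest first, trimmed to the configured window.
        self._entries[user_id] = (expires_at, ([tx] + rows)[: self._limit])


calls = []


def fake_db(user_id, limit):
    # Hypothetical stand-in for the PostgreSQL history query.
    calls.append(user_id)
    return [{"id": 2, "amount": 700}, {"id": 1, "amount": 300}][:limit]


cache = RecentHistoryCache(fake_db)
cache.get("alice")                                  # miss: loads from fake_db
cache.on_new_transaction("alice", {"id": 3, "amount": 50})
latest = cache.get("alice")                         # hit: served from memory
```

If an event is lost, the entry stays stale only until its TTL expires, after which the next read reloads from the database.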

Summary of Solutions and Industry Practices

  • Authentication: PostgreSQL + JWT – PayPal’s secure login.
  • Sending Money: Redis + PostgreSQL + Kafka – Venmo’s real-time transfers.
  • Requesting Money: PostgreSQL + Kafka + WebSockets – PayPal’s request flow.
  • Transaction History: PostgreSQL + Redis – Cash App’s fast records.
  • Consistency: PostgreSQL (ACID) – Zelle’s double-payment prevention.
  • Scalability: Kafka + Redis – Wise's (formerly TransferWise) surge handling.