Money Transfer - ashtishad/system-design GitHub Wiki

1. Requirements

Functional Requirements:

Users can send money to another user or merchant in real time.
Users can request money from another user.
Users can view transaction history (sent, received, requested).

Non-Functional Requirements

Consistency: No double transactions or lost funds.
Scalability: Handle 10M DAU, 10% peak surges (e.g., holidays).
Low Latency: Transfer <500ms, history <200ms.
Durability: No data loss over 5 years.
Capacity Estimation (5 years):
- DAU: 10M users.
- Transactions: 50M/day * 365 * 5 = 91.25B transactions.
- Storage: 91.25B * 500B = 45.625TB raw, ~450TB with replication/indexes.
- QPS: Avg: 50M/day ÷ 86,400s ≈ 580 QPS; Peak (10% surges): 1M * 5 tx ÷ 2,400s ≈ 2,083 QPS.

2. Core Entities

User: {id, name, email, balance, bank_account}
Transaction: {id, sender_id, receiver_id, amount, type (send/request), status (pending/completed/failed), idempotency_key, timestamp}
Request: {id, requester_id, payer_id, amount, status (pending/accepted/declined), timestamp}

3. APIs

POST /transfers/send: {receiver_id, amount, idempotency_key}, sends money.
POST /transfers/request: {payer_id, amount}, requests money.
POST /transfers/accept: {request_id}, accepts payment request.
GET /transactions/{user_id}?type={send/request}&since={timestamp}: Returns transaction history.

4. High-Level Design

Client: Mobile/web app.
API Gateway: Routes, auth (JWT), rate-limiting.
Microservices:
- Transfer Service: Handles sending/accepting money.
- Request Service: Manages money requests.
- History Service: Retrieves transaction history.
Data Stores:
- PostgreSQL: Users, transactions (ACID for consistency).
- Redis: Idempotency, caching.
- Kafka: Event streaming for async processing.
External: Stripe for bank payouts.
Flow:
- Send → Transfer Service → Redis → PostgreSQL → Kafka.
- Request → Request Service → PostgreSQL → Kafka (notification).
- Accept → Transfer Service → PostgreSQL → Stripe.
- History → History Service → PostgreSQL.

Why PostgreSQL?

PostgreSQL ensures ACID compliance for financial transactions, critical for preventing double transfers or balance errors, with MVCC and row-level locking supporting 2K QPS peaks.

5. Deep Dives (Functional Focus)

1. Sending Money (Real-Time Transfers)

Problem: Enable reliable, instant money transfers at 2,083 QPS peak without duplication.
Approaches & Tradeoffs:
- Direct Transfer
  - How: POST /transfers/send; BEGIN; UPDATE users SET balance = balance - amount WHERE id={sender_id}; UPDATE users SET balance = balance + amount WHERE id={receiver_id}; INSERT INTO transactions; COMMIT;
  - Pros: Simple, one transaction (~100ms), ACID-safe.
  - Cons: High contention on users table (~10ms/row), no retry logic, slow at peak.
  - Use Case: Small-scale systems (<100 QPS).
- Idempotent Transfer
  - How: Check idempotency_key in Redis, then BEGIN; SELECT balance FROM users WHERE id={sender_id} FOR UPDATE; UPDATE balance; INSERT transactions; COMMIT;
  - Pros: Prevents duplicates (~5ms with Redis), scalable.
  - Cons: Redis failure risks duplicates (mitigated by PostgreSQL), no async decoupling.
  - Use Case: Medium-scale apps (e.g., Venmo).
- Event-Driven Transfer
  - How: POST /transfers/send writes to Kafka, Transfer Service consumes, updates PostgreSQL (FOR UPDATE).
  - Pros: Decouples load (~500ms), scales to 10K QPS, retryable.
  - Cons: Higher latency, eventual consistency risks.
  - Use Case: High-throughput systems with latency tolerance.
- Optimistic Transfer
  - How: Add version to users; UPDATE users SET balance = balance - amount, version = version + 1 WHERE id={sender_id} AND version={old_version}; retry on failure.
  - Pros: High concurrency (~2ms), no locks.
  - Cons: Retries at peak (10-20%), complex client logic.
  - Use Case: Read-heavy systems with low conflicts.
- Hybrid (Redis + PostgreSQL + Kafka)
  - How: Check idempotency_key in Redis (TTL=5min), write intent to Kafka, process with SELECT FOR UPDATE in PostgreSQL, notify via Kafka.
  - Pros: Real-time (<500ms), idempotent, scales to 2K QPS, durable.
  - Cons: Multi-system complexity, Redis failure risks (mitigated by PostgreSQL).
  - Use Case: Large-scale fintech (e.g., PayPal).
Industry Example (Venmo): Uses idempotency with a relational DB (e.g., PostgreSQL) and async queues for real-time transfers, ensuring no duplicates.
Optimal Solution: Hybrid—Redis for idempotency (<1ms), Kafka for decoupling, PostgreSQL with FOR UPDATE and UNIQUE (idempotency_key) for consistency.
Why Optimal: Balances speed (Redis: 100K ops/s), scalability (Kafka: 10K msg/s), and safety (PostgreSQL: 10 nodes, 200 QPS/node), meets <500ms latency.
Tradeoffs: Adds infra complexity (mitigated by retries), slight latency overhead (5ms vs. 2ms for optimistic).

2. Requesting Money (Payment Requests)

Problem: Allow users to request money, track acceptance at scale.
Approaches & Tradeoffs:
- Inline Request
  - How: POST /transfers/request; INSERT INTO requests (requester_id, payer_id, amount); notify via email/SMS.
  - Pros: Simple (~50ms), no extra infra.
  - Cons: No real-time tracking, manual notification, scales poorly (~100 QPS).
  - Use Case: Basic P2P apps.
- Event-Driven Request
  - How: Write request to PostgreSQL, publish to Kafka, notify payer via push (e.g., FCM).
  - Pros: Async (~500ms), scalable (10K QPS), real-time UX.
  - Cons: Notification latency (~1s), retry complexity.
  - Use Case: Medium-scale systems (e.g., Cash App).
- Request with Approval
  - How: INSERT INTO requests, Kafka notifies, POST /transfers/accept triggers transfer (FOR UPDATE on users).
  - Pros: Explicit acceptance (~100ms), auditable, durable.
  - Cons: Two-step UX, contention on accept (~10ms).
  - Use Case: Business transactions.
- Optimistic Request
  - How: INSERT INTO requests WHERE NOT EXISTS (SELECT ... WHERE requester_id={id} AND payer_id={id} AND status='pending'); notify via Redis pub/sub.
  - Pros: High concurrency (~2ms), fast notification.
  - Cons: Retries on conflict, no persistence in Redis.
  - Use Case: Low-conflict systems.
- Hybrid (PostgreSQL + Kafka + Redis)
  - How: Store request in PostgreSQL, publish to Kafka, cache status in Redis, notify via WebSockets/FCM.
  - Pros: Real-time (<500ms), durable, scales to 2K QPS.
  - Cons: Multi-system sync, Redis volatility (mitigated by PostgreSQL).
  - Use Case: WhatsApp Pay-like systems.
Industry Example (PayPal): Uses a request-approval flow with queues (e.g., Kafka) and push notifications for real-time UX.
Optimal Solution: Hybrid—PostgreSQL for request storage, Kafka for notifications, Redis for status caching, WebSockets for real-time updates.
Why Optimal: Ensures durability (PostgreSQL), scalability (Kafka: 10K msg/s), and UX (WebSockets: <500ms), handles 2K QPS.
Tradeoffs: Adds complexity (mitigated by retries), notification latency tolerable (~1s).

3. Transaction History (Viewing Records)

Problem: Provide fast, accurate transaction history at 2,083 QPS.
Approaches & Tradeoffs:
- Simple SQL Fetch
  - How: GET /transactions/{user_id}; SELECT * FROM transactions WHERE user_id={user_id} ORDER BY timestamp DESC LIMIT 50;
  - Pros: Simple (~100ms), no extra infra.
  - Cons: Slow at scale (O(n) scan), ~1s for 91B rows.
  - Use Case: Tiny datasets (<1M tx).
- Indexed SQL
  - How: Add index on user_id, timestamp; fetch from PostgreSQL.
  - Pros: Faster (~50ms), durable.
  - Cons: Index overhead (~20% write penalty), scales to ~1K QPS.
  - Use Case: Medium-scale apps.
- Event Sourcing
  - How: Kafka stores all tx events, rebuild history on demand.
  - Pros: Durable, scales to 10K QPS, auditable.
  - Cons: High latency (~1s), complex rebuild (~minutes).
  - Use Case: Analytics-heavy systems.
- Caching (Redis)
  - How: Cache last 50 tx per user in Redis (user_id:history), sync from PostgreSQL.
  - Pros: Ultra-fast (<1ms), 2K QPS sustainable.
  - Cons: Cache staleness, sync complexity.
  - Use Case: Real-time history (e.g., Venmo).
- Hybrid (PostgreSQL + Redis)
  - How: PostgreSQL for full history, Redis for recent tx (TTL=1h), sync via Kafka.
  - Pros: Fast (<10ms with cache), durable, scales to 2K QPS.
  - Cons: Dual-system sync, Redis failure risks (mitigated by PostgreSQL).
  - Use Case: High-scale fintech.
Industry Example (Venmo): Uses a relational DB with in-memory caching (e.g., Redis) for recent transactions, ensuring fast history access.
Optimal Solution: Hybrid—PostgreSQL for persistent history, Redis for recent tx caching (<200ms), Kafka for sync.
Why Optimal: Meets <200ms latency, handles 2K QPS (Redis: 100K ops/s), durable with PostgreSQL (10 nodes, 200 QPS/node).
Tradeoffs: Adds Redis/Kafka infra (mitigated by CDC), cache staleness acceptable for UX.

Summary of Solutions and Industry Practices

Authentication: PostgreSQL + JWT – PayPal’s secure login.
Sending Money: Redis + PostgreSQL + Kafka – Venmo’s real-time transfers.
Requesting Money: PostgreSQL + Kafka + WebSockets – PayPal’s request flow.
Transaction History: PostgreSQL + Redis – Cash App’s fast records.
Consistency: PostgreSQL (ACID) – Zelle’s double-payment prevention.
Scalability: Kafka + Redis – TransferWise’s surge handling.