Money Transfer - ashtishad/system-design GitHub Wiki
1. Requirements
Functional Requirements:
- Users can send money to another user or merchant in real time.
- Users can request money from another user.
- Users can view transaction history (sent, received, requested).
Non-Functional Requirements
- Consistency: No double transactions or lost funds.
- Scalability: Handle 10M DAU, 10% peak surges (e.g., holidays).
- Low Latency: Transfer <500ms, history <200ms.
- Durability: No data loss over 5 years.
- Capacity Estimation (5 years):
- DAU: 10M users.
- Transactions: 50M/day * 365 * 5 = 91.25B transactions.
- Storage: 91.25B * 500B = 45.625TB raw, ~450TB with replication/indexes.
- QPS: Avg: 50M/day ÷ 86,400s ≈ 580 QPS; Peak (10% surges): 1M * 5 tx ÷ 2,400s ≈ 2,083 QPS.
2. Core Entities
- User: {id, name, email, balance, bank_account}
- Transaction: {id, sender_id, receiver_id, amount, type (send/request), status (pending/completed/failed), idempotency_key, timestamp}
- Request: {id, requester_id, payer_id, amount, status (pending/accepted/declined), timestamp}
3. APIs
- POST /transfers/send: {receiver_id, amount, idempotency_key}, sends money.
- POST /transfers/request: {payer_id, amount}, requests money.
- POST /transfers/accept: {request_id}, accepts payment request.
- GET /transactions/{user_id}?type={send/request}&since={timestamp}: Returns transaction history.
4. High-Level Design
- Client: Mobile/web app.
- API Gateway: Routes, auth (JWT), rate-limiting.
- Microservices:
- Transfer Service: Handles sending/accepting money.
- Request Service: Manages money requests.
- History Service: Retrieves transaction history.
- Data Stores:
- PostgreSQL: Users, transactions (ACID for consistency).
- Redis: Idempotency, caching.
- Kafka: Event streaming for async processing.
- External: Stripe for bank payouts.
- Flow:
- Send → Transfer Service → Redis → PostgreSQL → Kafka.
- Request → Request Service → PostgreSQL → Kafka (notification).
- Accept → Transfer Service → PostgreSQL → Stripe.
- History → History Service → PostgreSQL.
Why PostgreSQL?
PostgreSQL ensures ACID compliance for financial transactions, critical for preventing double transfers or balance errors, with MVCC and row-level locking supporting 2K QPS peaks.
5. Deep Dives (Functional Focus)
1. Sending Money (Real-Time Transfers)
- Problem: Enable reliable, instant money transfers at 2,083 QPS peak without duplication.
- Approaches & Tradeoffs:
- Direct Transfer
- How: POST /transfers/send; BEGIN; UPDATE users SET balance = balance - amount WHERE id={sender_id}; UPDATE users SET balance = balance + amount WHERE id={receiver_id}; INSERT INTO transactions; COMMIT;
- Pros: Simple, one transaction (~100ms), ACID-safe.
- Cons: High contention on users table (~10ms/row), no retry logic, slow at peak.
- Use Case: Small-scale systems (<100 QPS).
- Idempotent Transfer
- How: Check idempotency_key in Redis, then BEGIN; SELECT balance FROM users WHERE id={sender_id} FOR UPDATE; UPDATE balance; INSERT transactions; COMMIT;
- Pros: Prevents duplicates (~5ms with Redis), scalable.
- Cons: Redis failure risks duplicates (mitigated by PostgreSQL), no async decoupling.
- Use Case: Medium-scale apps (e.g., Venmo).
- Event-Driven Transfer
- How: POST /transfers/send writes to Kafka, Transfer Service consumes, updates PostgreSQL (FOR UPDATE).
- Pros: Decouples load (~500ms), scales to 10K QPS, retryable.
- Cons: Higher latency, eventual consistency risks.
- Use Case: High-throughput systems with latency tolerance.
- Optimistic Transfer
- How: Add version to users; UPDATE users SET balance = balance - amount, version = version + 1 WHERE id={sender_id} AND version={old_version}; retry on failure.
- Pros: High concurrency (~2ms), no locks.
- Cons: Retries at peak (10-20%), complex client logic.
- Use Case: Read-heavy systems with low conflicts.
- Hybrid (Redis + PostgreSQL + Kafka)
- How: Check idempotency_key in Redis (TTL=5min), write intent to Kafka, process with SELECT FOR UPDATE in PostgreSQL, notify via Kafka.
- Pros: Real-time (<500ms), idempotent, scales to 2K QPS, durable.
- Cons: Multi-system complexity, Redis failure risks (mitigated by PostgreSQL).
- Use Case: Large-scale fintech (e.g., PayPal).
- Direct Transfer
- Industry Example (Venmo): Uses idempotency with a relational DB (e.g., PostgreSQL) and async queues for real-time transfers, ensuring no duplicates.
- Optimal Solution: Hybrid—Redis for idempotency (<1ms), Kafka for decoupling, PostgreSQL with FOR UPDATE and UNIQUE (idempotency_key) for consistency.
- Why Optimal: Balances speed (Redis: 100K ops/s), scalability (Kafka: 10K msg/s), and safety (PostgreSQL: 10 nodes, 200 QPS/node), meets <500ms latency.
- Tradeoffs: Adds infra complexity (mitigated by retries), slight latency overhead (5ms vs. 2ms for optimistic).
2. Requesting Money (Payment Requests)
- Problem: Allow users to request money, track acceptance at scale.
- Approaches & Tradeoffs:
- Inline Request
- How: POST /transfers/request; INSERT INTO requests (requester_id, payer_id, amount); notify via email/SMS.
- Pros: Simple (~50ms), no extra infra.
- Cons: No real-time tracking, manual notification, scales poorly (~100 QPS).
- Use Case: Basic P2P apps.
- Event-Driven Request
- How: Write request to PostgreSQL, publish to Kafka, notify payer via push (e.g., FCM).
- Pros: Async (~500ms), scalable (10K QPS), real-time UX.
- Cons: Notification latency (~1s), retry complexity.
- Use Case: Medium-scale systems (e.g., Cash App).
- Request with Approval
- How: INSERT INTO requests, Kafka notifies, POST /transfers/accept triggers transfer (FOR UPDATE on users).
- Pros: Explicit acceptance (~100ms), auditable, durable.
- Cons: Two-step UX, contention on accept (~10ms).
- Use Case: Business transactions.
- Optimistic Request
- How: INSERT INTO requests WHERE NOT EXISTS (SELECT ... WHERE requester_id={id} AND payer_id={id} AND status='pending'); notify via Redis pub/sub.
- Pros: High concurrency (~2ms), fast notification.
- Cons: Retries on conflict, no persistence in Redis.
- Use Case: Low-conflict systems.
- Hybrid (PostgreSQL + Kafka + Redis)
- How: Store request in PostgreSQL, publish to Kafka, cache status in Redis, notify via WebSockets/FCM.
- Pros: Real-time (<500ms), durable, scales to 2K QPS.
- Cons: Multi-system sync, Redis volatility (mitigated by PostgreSQL).
- Use Case: WhatsApp Pay-like systems.
- Inline Request
- Industry Example (PayPal): Uses a request-approval flow with queues (e.g., Kafka) and push notifications for real-time UX.
- Optimal Solution: Hybrid—PostgreSQL for request storage, Kafka for notifications, Redis for status caching, WebSockets for real-time updates.
- Why Optimal: Ensures durability (PostgreSQL), scalability (Kafka: 10K msg/s), and UX (WebSockets: <500ms), handles 2K QPS.
- Tradeoffs: Adds complexity (mitigated by retries), notification latency tolerable (~1s).
3. Transaction History (Viewing Records)
- Problem: Provide fast, accurate transaction history at 2,083 QPS.
- Approaches & Tradeoffs:
- Simple SQL Fetch
- How: GET /transactions/{user_id}; SELECT * FROM transactions WHERE user_id={user_id} ORDER BY timestamp DESC LIMIT 50;
- Pros: Simple (~100ms), no extra infra.
- Cons: Slow at scale (O(n) scan), ~1s for 91B rows.
- Use Case: Tiny datasets (<1M tx).
- Indexed SQL
- How: Add index on user_id, timestamp; fetch from PostgreSQL.
- Pros: Faster (~50ms), durable.
- Cons: Index overhead (~20% write penalty), scales to ~1K QPS.
- Use Case: Medium-scale apps.
- Event Sourcing
- How: Kafka stores all tx events, rebuild history on demand.
- Pros: Durable, scales to 10K QPS, auditable.
- Cons: High latency (~1s), complex rebuild (~minutes).
- Use Case: Analytics-heavy systems.
- Caching (Redis)
- How: Cache last 50 tx per user in Redis (user_id:history), sync from PostgreSQL.
- Pros: Ultra-fast (<1ms), 2K QPS sustainable.
- Cons: Cache staleness, sync complexity.
- Use Case: Real-time history (e.g., Venmo).
- Hybrid (PostgreSQL + Redis)
- How: PostgreSQL for full history, Redis for recent tx (TTL=1h), sync via Kafka.
- Pros: Fast (<10ms with cache), durable, scales to 2K QPS.
- Cons: Dual-system sync, Redis failure risks (mitigated by PostgreSQL).
- Use Case: High-scale fintech.
- Simple SQL Fetch
- Industry Example (Venmo): Uses a relational DB with in-memory caching (e.g., Redis) for recent transactions, ensuring fast history access.
- Optimal Solution: Hybrid—PostgreSQL for persistent history, Redis for recent tx caching (<200ms), Kafka for sync.
- Why Optimal: Meets <200ms latency, handles 2K QPS (Redis: 100K ops/s), durable with PostgreSQL (10 nodes, 200 QPS/node).
- Tradeoffs: Adds Redis/Kafka infra (mitigated by CDC), cache staleness acceptable for UX.
Summary of Solutions and Industry Practices
- Authentication: PostgreSQL + JWT – PayPal’s secure login.
- Sending Money: Redis + PostgreSQL + Kafka – Venmo’s real-time transfers.
- Requesting Money: PostgreSQL + Kafka + WebSockets – PayPal’s request flow.
- Transaction History: PostgreSQL + Redis – Cash App’s fast records.
- Consistency: PostgreSQL (ACID) – Zelle’s double-payment prevention.
- Scalability: Kafka + Redis – TransferWise’s surge handling.