Hotel Reservation - ashtishad/system-design GitHub Wiki

1. Requirements

Functional Requirements:

  • Users can search for hotels/availability.
  • Users can book a reservation (reserve + confirm).
  • Users can view booking history.

Non-Functional Requirements:

  • Concurrency: No overbooking of rooms at 2,083 QPS peak.
  • Scalability: Handle surges (10% peak events, e.g., holidays).
  • Low Latency: Search <200ms, booking <500ms.
  • Availability: 99.99% uptime.
  • Capacity Estimation (5 years):
    • DAU: 10M users.
    • Bookings: 500M/year * 5 = 2.5B bookings.
    • Storage:
      • Hotels/Rooms: 100K hotels * 100 rooms * 500B = 5GB.
      • Bookings: 2.5B * 1KB = 2.5TB.
      • Total: ~3TB raw, ~30TB with replication/indexes.
    • QPS:
      • Avg: 500M/year ÷ 31.5M sec ≈ 16 QPS.
      • Peak (10% surges): 1M users * 5 bookings ÷ 2,400s (busy hour) ≈ 2,083 QPS.

2. Core Entities

  • Hotel: {id, name, location, room_types[]}
  • RoomType: {id, hotel_id, type, total_count, price}
  • Room: {id, hotel_id, room_type_id, room_number, status (available/booked)} (optional).
  • Booking: {id, user_id, hotel_id, room_type_id, room_id?, check_in, check_out, status}

3. APIs

  • GET /hotels/search?location={loc}&dates={check_in}-{check_out}: Returns available hotels/room types.
  • POST /bookings/reserve: {hotel_id, room_type_id, dates}, reserves a room.
  • POST /bookings/confirm: {booking_id, payment_method}, confirms booking.
  • GET /bookings/{user_id}: Returns booking history.

4. High-Level Design

  • Client: Web/mobile app.
  • API Gateway: Routes, auth (JWT), rate-limiting.
  • Microservices:
    • Search Service: Queries availability.
    • Booking Service: Manages reservations, concurrency.
    • History Service: Retrieves booking history.
  • Data Stores:
    • PostgreSQL: Hotels, rooms, bookings (ACID for consistency).
    • Redis: Availability cache, idempotency.
  • Flow:
    • Search → Search Service → PostgreSQL/Redis.
    • Reserve → Booking Service → PostgreSQL.
    • Confirm → Booking Service → Stripe → PostgreSQL.

Why PostgreSQL?

PostgreSQL ensures ACID compliance, critical for preventing overbooking. Its row-level locking and MVCC handle concurrent reservations at 2,083 QPS effectively.


5. Deep Dives

1. Concurrency (Prevent Overbooking)

  • Problem: Ensure no room/room type is overbooked at 2,083 QPS peak.
  • Approaches & Tradeoffs:
    • Isolation Level: Serializable
      • How: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; SELECT room_type.total_count - COUNT(bookings) ... INSERT;
      • Pros: Prevents all anomalies, guarantees no overbooking.
      • Cons: High contention, ~1K QPS max, frequent aborts.
    • Isolation Level: Read Committed
      • How: BEGIN; SELECT total_count FROM room_types WHERE id = {room_type_id} FOR UPDATE; INSERT booking; COMMIT;
      • Pros: Balances concurrency (~5ms), widely supported.
      • Cons: Requires explicit locking, risks dirty reads.
    • Pessimistic Locking
      • How: SELECT * FROM room_types WHERE id = {room_type_id} FOR UPDATE NOWAIT; then check count, insert.
      • Pros: Immediate rejection, ACID-safe.
      • Cons: Lock contention (~10ms), scales poorly at peak.
      • OSSOptimistic Locking**
        • How: Add version to room_types; UPDATE room_types SET available_count = available_count - 1 WHERE id = {room_type_id} AND version = {old_version};
        • Pros: High concurrency (~2ms), no locks.
        • Cons: Retries on conflict (5-10% at peak), complex logic.
    • Database Constraints
      • How: CHECK (available_count >= 0) on room_types, decrement on insert.
      • Pros: Simple, database-enforced.
      • Cons: Write penalty (~20%), no retry logic.
    • Hybrid (Redis + PostgreSQL)
      • How: Reserve in Redis (DECR room_type_id:available, TTL=10min), confirm in PostgreSQL with FOR UPDATE.
      • Pros: Redis (<1ms) scales to 100K QPS, PostgreSQL ensures durability.
      • Cons: Redis failure risks temporary overbooking (mitigated by PostgreSQL).
  • Industry Example (Booking.com): Uses a hybrid approach—caching availability in memory (e.g., Redis) for speed, with a relational DB (e.g., PostgreSQL) for consistency.
  • Optimal Solution: Hybrid—Redis for fast availability checks (2K QPS, <1ms), PostgreSQL with FOR UPDATE and CHECK constraint for final safety. Handles 2,083 QPS with <5ms latency.
  • Tech Details: Redis: 100K ops/s, PostgreSQL: 10 nodes, 200 QPS/node.

2. Room Number vs. Room Type Assignment

  • Option 1: Assign Room Number During Booking
    • How: SELECT room_id FROM rooms WHERE hotel_id = {hotel_id} AND room_type_id = {room_type_id} AND status = 'available' FOR UPDATE; UPDATE rooms SET status = 'booked'; INSERT booking;
    • Pros: Precise allocation, guest knows room immediately.
    • Cons: High contention (locks per room), complex inventory (e.g., 100 rooms = 100 rows locked), slower (~10ms/room).
    • Concurrency Impact: At 2K QPS, locking individual rooms risks bottlenecks (e.g., 100 rooms/hotel = 20 QPS/hotel).
  • Option 2: Assign Room Type, Room Number at Check-In (Typical Hotels)
    • How: SELECT total_count FROM room_types WHERE id = {room_type_id} FOR UPDATE; UPDATE available_count - 1; INSERT booking; Room assigned later via UPDATE booking SET room_id = {room_id};
    • Pros: Simpler concurrency (locks per type, not room), faster (~5ms), flexible inventory management.
    • Cons: Room assignment delayed, potential guest dissatisfaction.
    • Concurrency Impact: Scales better—e.g., 10 room types/hotel = 200 QPS/type at peak.
  • Chosen: Room Type—Reduces contention, aligns with industry (e.g., Marriott), manageable at 2K QPS. Room number assignment at check-in uses a separate low-QPS process.
  • Tradeoffs: Room Type is optimal for scalability; Room Number suits luxury hotels but sacrifices throughput.

3. Scalability for Surges

  • Problem: 2,083 QPS during holidays.
  • Solution: Load balancers, PostgreSQL replicas, Kafka for async confirmations.
  • Tech Details: Replicas: 2K QPS, Kafka: 10K msg/s.

4. Low-Latency Search

  • Problem: <200ms at peak.
  • Solution: Redis cache for availability, Elasticsearch for hotel search.
  • Industry Example (Expedia): Caches room availability in memory.
  • Tech Details: Redis: <1ms, Elasticsearch: 50ms.

5. High Availability

  • Problem: 99.99% uptime.
  • Solution: Multi-region PostgreSQL, SQS retries.
  • Tech Details: Failover: 5s, SQS: 1K retries/s.

System Design Diagram

text

CollapseWrapCopy

+----------------+ +----------------+ | Client |<------->| API Gateway | | (Web/Mobile) | | (JWT, Routing) | +----------------+ +----------------+ | | | +----------------+ +-----+ +-----+-------+ | Search Service |<---+ | | | (Availability) | | | +----------------+ | | +----------------+ +-----+ +-----+-------+ | Booking Service |<--------| | | | | (Reservations) | | | | | +----------------+ | | | | +----------------+ | | +----------+ | | History Service |<--------| | | | | (Booking History)| | | | | +----------------+ +-----+ | | | | | +----------------+ +----------------+ +----------------+ | Redis |<------->| PostgreSQL | | Kafka | | (Cache, Avail.)| | (Rooms, Bookings)| | (Confirm Queue)| +----------------+ +----------------+ +----------------+ | +----------------+ +----------------+ | Elasticsearch |<------->| Stripe | | (Hotel Search) | | (Payments) | +----------------+ +----------------+


Summary of Solutions and Industry Practices

  • Authentication: PostgreSQL + JWT + Redis – Booking.com’s secure login.
  • Concurrency: PostgreSQL (ACID) + Redis – Expedia’s overbooking prevention.
  • Scalability: Kafka + PostgreSQL Replicas – Marriott’s surge handling.
  • Low Latency: Redis + Elasticsearch – Expedia’s fast search.
  • Room Assignment: Room Type + PostgreSQL – Hilton’s flexible inventory.
  • Availability: Multi-Region PostgreSQL – Airbnb’s reliable uptime.
⚠️ **GitHub.com Fallback** ⚠️