Hotel Reservation - ashtishad/system-design GitHub Wiki
Functional Requirements:
- Users can search for hotels/availability.
- Users can book a reservation (reserve + confirm).
- Users can view booking history.
Non-Functional Requirements:
- Concurrency: No overbooking of rooms at 2,083 QPS peak.
- Scalability: Handle surges (10% peak events, e.g., holidays).
- Low Latency: Search <200ms, booking <500ms.
- Availability: 99.99% uptime.
-
Capacity Estimation (5 years):
- DAU: 10M users.
- Bookings: 500M/year * 5 = 2.5B bookings.
-
Storage:
- Hotels/Rooms: 100K hotels * 100 rooms * 500B = 5GB.
- Bookings: 2.5B * 1KB = 2.5TB.
- Total: ~3TB raw, ~30TB with replication/indexes.
-
QPS:
- Avg: 500M/year ÷ 31.5M sec ≈ 16 QPS.
- Peak (10% surges): 1M users * 5 bookings ÷ 2,400s (busy hour) ≈ 2,083 QPS.
- Hotel: {id, name, location, room_types[]}
- RoomType: {id, hotel_id, type, total_count, price}
- Room: {id, hotel_id, room_type_id, room_number, status (available/booked)} (optional).
- Booking: {id, user_id, hotel_id, room_type_id, room_id?, check_in, check_out, status}
- GET /hotels/search?location={loc}&dates={check_in}-{check_out}: Returns available hotels/room types.
- POST /bookings/reserve: {hotel_id, room_type_id, dates}, reserves a room.
- POST /bookings/confirm: {booking_id, payment_method}, confirms booking.
- GET /bookings/{user_id}: Returns booking history.
- Client: Web/mobile app.
- API Gateway: Routes, auth (JWT), rate-limiting.
-
Microservices:
- Search Service: Queries availability.
- Booking Service: Manages reservations, concurrency.
- History Service: Retrieves booking history.
-
Data Stores:
- PostgreSQL: Hotels, rooms, bookings (ACID for consistency).
- Redis: Availability cache, idempotency.
-
Flow:
- Search → Search Service → PostgreSQL/Redis.
- Reserve → Booking Service → PostgreSQL.
- Confirm → Booking Service → Stripe → PostgreSQL.
Why PostgreSQL?
PostgreSQL ensures ACID compliance, critical for preventing overbooking. Its row-level locking and MVCC handle concurrent reservations at 2,083 QPS effectively.
1. Concurrency (Prevent Overbooking)
- Problem: Ensure no room/room type is overbooked at 2,083 QPS peak.
-
Approaches & Tradeoffs:
-
Isolation Level: Serializable
- How: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; SELECT room_type.total_count - COUNT(bookings) ... INSERT;
- Pros: Prevents all anomalies, guarantees no overbooking.
- Cons: High contention, ~1K QPS max, frequent aborts.
-
Isolation Level: Read Committed
- How: BEGIN; SELECT total_count FROM room_types WHERE id = {room_type_id} FOR UPDATE; INSERT booking; COMMIT;
- Pros: Balances concurrency (~5ms), widely supported.
- Cons: Requires explicit locking, risks dirty reads.
-
Pessimistic Locking
- How: SELECT * FROM room_types WHERE id = {room_type_id} FOR UPDATE NOWAIT; then check count, insert.
- Pros: Immediate rejection, ACID-safe.
- Cons: Lock contention (~10ms), scales poorly at peak.
-
- OSSOptimistic Locking**
- How: Add version to room_types; UPDATE room_types SET available_count = available_count - 1 WHERE id = {room_type_id} AND version = {old_version};
- Pros: High concurrency (~2ms), no locks.
- Cons: Retries on conflict (5-10% at peak), complex logic.
- OSSOptimistic Locking**
-
Database Constraints
- How: CHECK (available_count >= 0) on room_types, decrement on insert.
- Pros: Simple, database-enforced.
- Cons: Write penalty (~20%), no retry logic.
-
Hybrid (Redis + PostgreSQL)
- How: Reserve in Redis (DECR room_type_id:available, TTL=10min), confirm in PostgreSQL with FOR UPDATE.
- Pros: Redis (<1ms) scales to 100K QPS, PostgreSQL ensures durability.
- Cons: Redis failure risks temporary overbooking (mitigated by PostgreSQL).
-
Isolation Level: Serializable
- Industry Example (Booking.com): Uses a hybrid approach—caching availability in memory (e.g., Redis) for speed, with a relational DB (e.g., PostgreSQL) for consistency.
- Optimal Solution: Hybrid—Redis for fast availability checks (2K QPS, <1ms), PostgreSQL with FOR UPDATE and CHECK constraint for final safety. Handles 2,083 QPS with <5ms latency.
- Tech Details: Redis: 100K ops/s, PostgreSQL: 10 nodes, 200 QPS/node.
2. Room Number vs. Room Type Assignment
-
Option 1: Assign Room Number During Booking
- How: SELECT room_id FROM rooms WHERE hotel_id = {hotel_id} AND room_type_id = {room_type_id} AND status = 'available' FOR UPDATE; UPDATE rooms SET status = 'booked'; INSERT booking;
- Pros: Precise allocation, guest knows room immediately.
- Cons: High contention (locks per room), complex inventory (e.g., 100 rooms = 100 rows locked), slower (~10ms/room).
- Concurrency Impact: At 2K QPS, locking individual rooms risks bottlenecks (e.g., 100 rooms/hotel = 20 QPS/hotel).
-
Option 2: Assign Room Type, Room Number at Check-In (Typical Hotels)
- How: SELECT total_count FROM room_types WHERE id = {room_type_id} FOR UPDATE; UPDATE available_count - 1; INSERT booking; Room assigned later via UPDATE booking SET room_id = {room_id};
- Pros: Simpler concurrency (locks per type, not room), faster (~5ms), flexible inventory management.
- Cons: Room assignment delayed, potential guest dissatisfaction.
- Concurrency Impact: Scales better—e.g., 10 room types/hotel = 200 QPS/type at peak.
- Chosen: Room Type—Reduces contention, aligns with industry (e.g., Marriott), manageable at 2K QPS. Room number assignment at check-in uses a separate low-QPS process.
- Tradeoffs: Room Type is optimal for scalability; Room Number suits luxury hotels but sacrifices throughput.
3. Scalability for Surges
- Problem: 2,083 QPS during holidays.
- Solution: Load balancers, PostgreSQL replicas, Kafka for async confirmations.
- Tech Details: Replicas: 2K QPS, Kafka: 10K msg/s.
4. Low-Latency Search
- Problem: <200ms at peak.
- Solution: Redis cache for availability, Elasticsearch for hotel search.
- Industry Example (Expedia): Caches room availability in memory.
- Tech Details: Redis: <1ms, Elasticsearch: 50ms.
5. High Availability
- Problem: 99.99% uptime.
- Solution: Multi-region PostgreSQL, SQS retries.
- Tech Details: Failover: 5s, SQS: 1K retries/s.
text
CollapseWrapCopy
+----------------+ +----------------+ | Client |<------->| API Gateway | | (Web/Mobile) | | (JWT, Routing) | +----------------+ +----------------+ | | | +----------------+ +-----+ +-----+-------+ | Search Service |<---+ | | | (Availability) | | | +----------------+ | | +----------------+ +-----+ +-----+-------+ | Booking Service |<--------| | | | | (Reservations) | | | | | +----------------+ | | | | +----------------+ | | +----------+ | | History Service |<--------| | | | | (Booking History)| | | | | +----------------+ +-----+ | | | | | +----------------+ +----------------+ +----------------+ | Redis |<------->| PostgreSQL | | Kafka | | (Cache, Avail.)| | (Rooms, Bookings)| | (Confirm Queue)| +----------------+ +----------------+ +----------------+ | +----------------+ +----------------+ | Elasticsearch |<------->| Stripe | | (Hotel Search) | | (Payments) | +----------------+ +----------------+
- Authentication: PostgreSQL + JWT + Redis – Booking.com’s secure login.
- Concurrency: PostgreSQL (ACID) + Redis – Expedia’s overbooking prevention.
- Scalability: Kafka + PostgreSQL Replicas – Marriott’s surge handling.
- Low Latency: Redis + Elasticsearch – Expedia’s fast search.
- Room Assignment: Room Type + PostgreSQL – Hilton’s flexible inventory.
- Availability: Multi-Region PostgreSQL – Airbnb’s reliable uptime.