Ticket Booking - ashtishad/system-design GitHub Wiki

1. Requirements

Functional Requirements:

  • Users can search for events (e.g., by location, date, type).
  • Users can view event details (e.g., seat maps, performers).
  • Users can book tickets (reserve + confirm payment).

Non-Functional Requirements (Brief):

  • Consistency: No double booking.
  • Scalability: Handle surges (10% big events monthly).
  • Low Latency: Search <200ms, booking <500ms.
  • Capacity Estimation (5 years):
    • DAU: 10M users.
    • Events: 10K/year * 5 = 50K events.
    • Tickets: 50K * 10K tickets/event = 500M tickets.
    • Storage: 50K events * 1KB + 500M tickets * 100B = 50MB + 50GB = ~50GB raw, ~500GB with indexes.
    • QPS: Avg: 580 QPS; Peak (10% big events): 20,833 QPS.

2. Core Entities

  • Event: {id, venue_id, performer_id, name, description, date, ticket_ids[]}
  • Venue: {id, location, seat_map}
  • Performer: {id, name}
  • Ticket: {id, event_id, seat, price, status (available/reserved/booked), idempotency_key}

3. APIs

  • GET /events/search?term={term}&location={loc}&date={date}: Returns event list.
  • GET /events/{event_id}: Returns event details, tickets.
  • POST /bookings/reserve: {event_id, ticket_id, idempotency_key}, reserves ticket.
  • POST /bookings/confirm: {booking_id, payment_method}, confirms booking.

4. High-Level Design

  • Client: Web/mobile app.
  • API Gateway: Routes, auth (JWT), rate-limiting.
  • Microservices:
    • Search Service: Handles event discovery.
    • Event Service: Manages event details, seat maps.
    • Booking Service: Processes reservations/payments.
  • Data Stores:
    • PostgreSQL: Events, tickets (ACID for consistency).
    • Redis: Reservation locks, caching.
    • Elasticsearch: Search index.
  • Flow:
    • Search → Search Service → Elasticsearch.
    • View → Event Service → PostgreSQL.
    • Book → Booking Service → Redis → PostgreSQL → Stripe.

Why PostgreSQL?

PostgreSQL ensures ACID compliance for ticket bookings, preventing double bookings with row-level locking and MVCC, suitable for 20K QPS peaks with sharding.


5. Deep Dives (Functional Focus)

1. Booking Tickets (Reserve + Confirm)

  • Problem: Enable reliable ticket booking, preventing double bookings at 20,833 QPS peak.
  • Approaches & Tradeoffs:
    • Single-Step Booking
      • How: POST /bookings {ticket_id, payment_method}; UPDATE tickets SET status='booked' WHERE id={ticket_id} AND status='available'; Stripe processes payment.
      • Pros: Simple, one API call (~200ms).
      • Cons: No reservation period, payment failures waste seats, high contention (~10ms/ticket).
      • Use Case: Low-demand events.
    • Two-Step (Reserve + Confirm)
      • How:
        • Reserve: POST /reserve {ticket_id}; UPDATE tickets SET status='reserved' WHERE id={ticket_id} AND status='available';
        • Confirm: POST /confirm {booking_id}; Stripe payment, UPDATE tickets SET status='booked';
      • Pros: Reservation period (e.g., 10min), better UX, reduces contention (~5ms/step).
      • Cons: Requires expiration logic, complexity in timeouts.
      • Use Case: Standard e-commerce (e.g., Ticketmaster).
    • Optimistic Reservation
      • How: Add version to tickets; UPDATE tickets SET status='reserved', version=version+1 WHERE id={ticket_id} AND status='available' AND version={old_version};
      • Pros: High concurrency (~2ms), no locks.
      • Cons: Retries on conflict (10-20% at peak), client retry logic.
      • Use Case: High-read, low-conflict systems.
    • Pessimistic Locking
      • How: SELECT * FROM tickets WHERE id={ticket_id} AND status='available' FOR UPDATE NOWAIT; then UPDATE status='reserved';
      • Pros: Immediate rejection, ACID-safe.
      • Cons: Lock contention (~10ms), scales poorly at 20K QPS.
      • Use Case: Small-scale systems.
    • Hybrid (Redis + PostgreSQL)
      • How:
        • Reserve: Set ticket_id in Redis (TTL=10min), then UPDATE tickets SET status='reserved' WHERE id={ticket_id} AND status='available';
        • Confirm: Stripe payment, UPDATE tickets SET status='booked'; Redis key deleted.
      • Pros: Redis (<1ms) scales to 100K QPS, PostgreSQL ensures durability.
      • Cons: Redis failure risks temporary overbooking (mitigated by PostgreSQL), sync complexity.
      • Use Case: High-throughput ticketing (e.g., Ticketmaster).
  • Industry Example (Ticketmaster): Uses a two-step process with distributed locks (e.g., Redis) for reservations, ensuring seats are held for 10-15 minutes before payment confirmation.
  • Optimal Solution: Hybrid Two-Step—Redis for fast reservation locks (20K QPS, <1ms), PostgreSQL with FOR UPDATE and UNIQUE (idempotency_key) for confirmation safety.
  • Why Optimal: Balances speed (Redis: 100K ops/s), consistency (PostgreSQL: 10 shards, 2K QPS/shard), and UX (10min reservation).
  • Tradeoffs: Adds Redis dependency (mitigated by PostgreSQL fallback), slight latency (5ms vs. 2ms for optimistic).

2. Viewing Event Details (Seat Maps)

  • Problem: Display accurate, real-time seat maps for events at 20,833 QPS.
  • Approaches & Tradeoffs:
    • Static Fetch
      • How: GET /events/{event_id}; SELECT * FROM tickets WHERE event_id={event_id}; returns all seats.
      • Pros: Simple, one query (~50ms).
      • Cons: Stale data after fetch, poor UX as seats sell out (~1s delay).
      • Use Case: Low-demand events.
    • Polling
      • How: Client polls every 5s; SELECT * FROM tickets WHERE event_id={event_id} AND status='available';
      • Pros: Near-real-time (~5s lag), easy to implement.
      • Cons: High QPS (20K * 12/min = 240K QPS), network overhead.
      • Use Case: Small-scale systems.
    • Long Polling
      • How: Client sends GET /events/{event_id}/updates, server holds request (30s), responds on change.
      • Pros: Lower QPS (~20K/30s = 666 QPS), near-real-time (~1s).
      • Cons: Server resource use, timeouts (~30s).
      • Use Case: Medium-scale systems.
    • Server-Sent Events (SSE)
      • How: Open SSE connection; server pushes ticket_id:status updates via Kafka events.
      • Pros: Real-time (<1s), efficient (20K QPS sustainable).
      • Cons: Connection overhead (~1M connections/server), complex infra.
      • Use Case: High-traffic ticketing (e.g., Ticketmaster).
    • WebSockets
      • How: Bidirectional connection; server pushes updates, client sends filters.
      • Pros: Real-time (<1s), interactive, 20K QPS scalable.
      • Cons: Higher resource use (~500K connections/server), complexity.
      • Use Case: Luxury ticketing with personalization.
  • Industry Example (StubHub): Uses WebSockets/SSE for live seat map updates during high-demand sales.
  • Optimal Solution: SSE—Server pushes updates via Kafka, client renders seat map in real-time.
  • Why Optimal: Handles 20K QPS with <1s latency, scales with Kafka (10K msg/s), balances resource use.
  • Tradeoffs: Adds Kafka/SSE infra (mitigated by load balancers), less flexible than WebSockets.

3. Searching for Events

  • Problem: Fast, flexible event discovery at 20,833 QPS.
  • Approaches & Tradeoffs:
    • SQL Query
      • How: SELECT * FROM events WHERE location LIKE '%{term}%' AND date={date};
      • Pros: Simple, no extra infra (~100ms).
      • Cons: Full-table scan (O(n)), slow at 50K events (~1s).
      • Use Case: Tiny datasets.
    • Full-Text Search (PostgreSQL)
      • How: SELECT * FROM events WHERE to_tsvector(name || description) @@ to_tsquery('{term}');
      • Pros: Built-in, decent speed (~200ms).
      • Cons: Limited geospatial support, scales to ~1K QPS.
      • Use Case: Small-scale search.
    • Elasticsearch
      • How: Index events with inverted index; GET /events/_search {query: {term, location, date}}.
      • Pros: Fast (~50ms), geospatial support, scales to 20K QPS.
      • Cons: Sync complexity, higher storage (~2x PostgreSQL).
      • Use Case: Large-scale ticketing (e.g., Ticketmaster).
    • Geospatial (PostGIS)
      • How: SELECT * FROM events WHERE ST_DWithin(location, ST_MakePoint({lon}, {lat}), {radius});
      • Pros: Precise geospatial queries (~100ms), SQL-integrated.
      • Cons: Slower than Elasticsearch, limited text search.
      • Use Case: Location-focused apps.
    • Hybrid (Elasticsearch + Redis)
      • How: Elasticsearch for search, Redis caches top queries (TTL=1min).
      • Pros: Ultra-fast (<10ms with cache), 20K QPS sustainable.
      • Cons: Cache invalidation, dual-system sync.
      • Use Case: High-traffic, popular events.
  • Industry Example (Eventbrite): Uses Elasticsearch for text/location search, caching frequent queries.
  • Optimal Solution: Hybrid—Elasticsearch for search, Redis for caching top 1K queries.
  • Why Optimal: Handles 20K QPS with <50ms (cache: <10ms), supports text/geospatial queries.
  • Tradeoffs: Adds Redis/Elasticsearch infra (mitigated by CDC sync), cache staleness tolerable.

System Design Diagram

text

CollapseWrapCopy

+----------------+ +----------------+ | Client |<------->| API Gateway | | (Web/Mobile) | | (JWT, Routing) | +----------------+ +----------------+ | | | +----------------+ +-----+ +-----+-------+ | Search Service |<---+ | | | (Event Discovery)| | | +----------------+ | | +----------------+ +-----+ +-----+-------+ | Event Service |<--------| | | | | (Details, Seats)| | | | | +----------------+ | | | | +----------------+ | | +----------+ | | Booking Service|<--------| | | | | (Reservations) | | | | | +----------------+ +-----+ | | | | | +----------------+ +----------------+ +----------------+ | Redis | | PostgreSQL | | Elasticsearch | | (Locks, Cache) | | (Events, Tickets)| | (Search Index)| +----------------+ +----------------+ +----------------+ | | +----------------+ +-----+ +----------------+ | Kafka |<--------| | | Stripe | | (SSE Updates) | | | | (Payments) | +----------------+ +-----+----+----------------+


Summary of Solutions and Industry Practices

  • Authentication: PostgreSQL + JWT – Ticketmaster’s secure access.
  • Booking Tickets: Redis + PostgreSQL Two-Step – Ticketmaster’s reservation flow.
  • Viewing Events: SSE + Kafka – StubHub’s real-time seat maps.
  • Searching Events: Elasticsearch + Redis – Eventbrite’s fast discovery.
  • Consistency: PostgreSQL (ACID) – Live Nation’s double-booking prevention.
  • Scalability: Kafka + Redis – AXS’s surge handling.
⚠️ **GitHub.com Fallback** ⚠️