Ticket Booking - ashtishad/system-design GitHub Wiki
Functional Requirements:
- Users can search for events (e.g., by location, date, type).
- Users can view event details (e.g., seat maps, performers).
- Users can book tickets (reserve + confirm payment).
Non-Functional Requirements (Brief):
- Consistency: No double booking.
- Scalability: Handle surges (10% big events monthly).
- Low Latency: Search <200ms, booking <500ms.
-
Capacity Estimation (5 years):
- DAU: 10M users.
- Events: 10K/year * 5 = 50K events.
- Tickets: 50K * 10K tickets/event = 500M tickets.
- Storage: 50K events * 1KB + 500M tickets * 100B = 50MB + 50GB = ~50GB raw, ~500GB with indexes.
- QPS: Avg: 580 QPS; Peak (10% big events): 20,833 QPS.
- Event: {id, venue_id, performer_id, name, description, date, ticket_ids[]}
- Venue: {id, location, seat_map}
- Performer: {id, name}
- Ticket: {id, event_id, seat, price, status (available/reserved/booked), idempotency_key}
- GET /events/search?term={term}&location={loc}&date={date}: Returns event list.
- GET /events/{event_id}: Returns event details, tickets.
- POST /bookings/reserve: {event_id, ticket_id, idempotency_key}, reserves ticket.
- POST /bookings/confirm: {booking_id, payment_method}, confirms booking.
- Client: Web/mobile app.
- API Gateway: Routes, auth (JWT), rate-limiting.
-
Microservices:
- Search Service: Handles event discovery.
- Event Service: Manages event details, seat maps.
- Booking Service: Processes reservations/payments.
-
Data Stores:
- PostgreSQL: Events, tickets (ACID for consistency).
- Redis: Reservation locks, caching.
- Elasticsearch: Search index.
-
Flow:
- Search → Search Service → Elasticsearch.
- View → Event Service → PostgreSQL.
- Book → Booking Service → Redis → PostgreSQL → Stripe.
Why PostgreSQL?
PostgreSQL ensures ACID compliance for ticket bookings, preventing double bookings with row-level locking and MVCC, suitable for 20K QPS peaks with sharding.
1. Booking Tickets (Reserve + Confirm)
- Problem: Enable reliable ticket booking, preventing double bookings at 20,833 QPS peak.
-
Approaches & Tradeoffs:
-
Single-Step Booking
- How: POST /bookings {ticket_id, payment_method}; UPDATE tickets SET status='booked' WHERE id={ticket_id} AND status='available'; Stripe processes payment.
- Pros: Simple, one API call (~200ms).
- Cons: No reservation period, payment failures waste seats, high contention (~10ms/ticket).
- Use Case: Low-demand events.
-
Two-Step (Reserve + Confirm)
-
How:
- Reserve: POST /reserve {ticket_id}; UPDATE tickets SET status='reserved' WHERE id={ticket_id} AND status='available';
- Confirm: POST /confirm {booking_id}; Stripe payment, UPDATE tickets SET status='booked';
- Pros: Reservation period (e.g., 10min), better UX, reduces contention (~5ms/step).
- Cons: Requires expiration logic, complexity in timeouts.
- Use Case: Standard e-commerce (e.g., Ticketmaster).
-
How:
-
Optimistic Reservation
- How: Add version to tickets; UPDATE tickets SET status='reserved', version=version+1 WHERE id={ticket_id} AND status='available' AND version={old_version};
- Pros: High concurrency (~2ms), no locks.
- Cons: Retries on conflict (10-20% at peak), client retry logic.
- Use Case: High-read, low-conflict systems.
-
Pessimistic Locking
- How: SELECT * FROM tickets WHERE id={ticket_id} AND status='available' FOR UPDATE NOWAIT; then UPDATE status='reserved';
- Pros: Immediate rejection, ACID-safe.
- Cons: Lock contention (~10ms), scales poorly at 20K QPS.
- Use Case: Small-scale systems.
-
Hybrid (Redis + PostgreSQL)
-
How:
- Reserve: Set ticket_id in Redis (TTL=10min), then UPDATE tickets SET status='reserved' WHERE id={ticket_id} AND status='available';
- Confirm: Stripe payment, UPDATE tickets SET status='booked'; Redis key deleted.
- Pros: Redis (<1ms) scales to 100K QPS, PostgreSQL ensures durability.
- Cons: Redis failure risks temporary overbooking (mitigated by PostgreSQL), sync complexity.
- Use Case: High-throughput ticketing (e.g., Ticketmaster).
-
How:
-
Single-Step Booking
- Industry Example (Ticketmaster): Uses a two-step process with distributed locks (e.g., Redis) for reservations, ensuring seats are held for 10-15 minutes before payment confirmation.
- Optimal Solution: Hybrid Two-Step—Redis for fast reservation locks (20K QPS, <1ms), PostgreSQL with FOR UPDATE and UNIQUE (idempotency_key) for confirmation safety.
- Why Optimal: Balances speed (Redis: 100K ops/s), consistency (PostgreSQL: 10 shards, 2K QPS/shard), and UX (10min reservation).
- Tradeoffs: Adds Redis dependency (mitigated by PostgreSQL fallback), slight latency (5ms vs. 2ms for optimistic).
2. Viewing Event Details (Seat Maps)
- Problem: Display accurate, real-time seat maps for events at 20,833 QPS.
-
Approaches & Tradeoffs:
-
Static Fetch
- How: GET /events/{event_id}; SELECT * FROM tickets WHERE event_id={event_id}; returns all seats.
- Pros: Simple, one query (~50ms).
- Cons: Stale data after fetch, poor UX as seats sell out (~1s delay).
- Use Case: Low-demand events.
-
Polling
- How: Client polls every 5s; SELECT * FROM tickets WHERE event_id={event_id} AND status='available';
- Pros: Near-real-time (~5s lag), easy to implement.
- Cons: High QPS (20K * 12/min = 240K QPS), network overhead.
- Use Case: Small-scale systems.
-
Long Polling
- How: Client sends GET /events/{event_id}/updates, server holds request (30s), responds on change.
- Pros: Lower QPS (~20K/30s = 666 QPS), near-real-time (~1s).
- Cons: Server resource use, timeouts (~30s).
- Use Case: Medium-scale systems.
-
Server-Sent Events (SSE)
- How: Open SSE connection; server pushes ticket_id:status updates via Kafka events.
- Pros: Real-time (<1s), efficient (20K QPS sustainable).
- Cons: Connection overhead (~1M connections/server), complex infra.
- Use Case: High-traffic ticketing (e.g., Ticketmaster).
-
WebSockets
- How: Bidirectional connection; server pushes updates, client sends filters.
- Pros: Real-time (<1s), interactive, 20K QPS scalable.
- Cons: Higher resource use (~500K connections/server), complexity.
- Use Case: Luxury ticketing with personalization.
-
Static Fetch
- Industry Example (StubHub): Uses WebSockets/SSE for live seat map updates during high-demand sales.
- Optimal Solution: SSE—Server pushes updates via Kafka, client renders seat map in real-time.
- Why Optimal: Handles 20K QPS with <1s latency, scales with Kafka (10K msg/s), balances resource use.
- Tradeoffs: Adds Kafka/SSE infra (mitigated by load balancers), less flexible than WebSockets.
3. Searching for Events
- Problem: Fast, flexible event discovery at 20,833 QPS.
-
Approaches & Tradeoffs:
-
SQL Query
- How: SELECT * FROM events WHERE location LIKE '%{term}%' AND date={date};
- Pros: Simple, no extra infra (~100ms).
- Cons: Full-table scan (O(n)), slow at 50K events (~1s).
- Use Case: Tiny datasets.
-
Full-Text Search (PostgreSQL)
- How: SELECT * FROM events WHERE to_tsvector(name || description) @@ to_tsquery('{term}');
- Pros: Built-in, decent speed (~200ms).
- Cons: Limited geospatial support, scales to ~1K QPS.
- Use Case: Small-scale search.
-
Elasticsearch
- How: Index events with inverted index; GET /events/_search {query: {term, location, date}}.
- Pros: Fast (~50ms), geospatial support, scales to 20K QPS.
- Cons: Sync complexity, higher storage (~2x PostgreSQL).
- Use Case: Large-scale ticketing (e.g., Ticketmaster).
-
Geospatial (PostGIS)
- How: SELECT * FROM events WHERE ST_DWithin(location, ST_MakePoint({lon}, {lat}), {radius});
- Pros: Precise geospatial queries (~100ms), SQL-integrated.
- Cons: Slower than Elasticsearch, limited text search.
- Use Case: Location-focused apps.
-
Hybrid (Elasticsearch + Redis)
- How: Elasticsearch for search, Redis caches top queries (TTL=1min).
- Pros: Ultra-fast (<10ms with cache), 20K QPS sustainable.
- Cons: Cache invalidation, dual-system sync.
- Use Case: High-traffic, popular events.
-
SQL Query
- Industry Example (Eventbrite): Uses Elasticsearch for text/location search, caching frequent queries.
- Optimal Solution: Hybrid—Elasticsearch for search, Redis for caching top 1K queries.
- Why Optimal: Handles 20K QPS with <50ms (cache: <10ms), supports text/geospatial queries.
- Tradeoffs: Adds Redis/Elasticsearch infra (mitigated by CDC sync), cache staleness tolerable.
text
CollapseWrapCopy
+----------------+ +----------------+ | Client |<------->| API Gateway | | (Web/Mobile) | | (JWT, Routing) | +----------------+ +----------------+ | | | +----------------+ +-----+ +-----+-------+ | Search Service |<---+ | | | (Event Discovery)| | | +----------------+ | | +----------------+ +-----+ +-----+-------+ | Event Service |<--------| | | | | (Details, Seats)| | | | | +----------------+ | | | | +----------------+ | | +----------+ | | Booking Service|<--------| | | | | (Reservations) | | | | | +----------------+ +-----+ | | | | | +----------------+ +----------------+ +----------------+ | Redis | | PostgreSQL | | Elasticsearch | | (Locks, Cache) | | (Events, Tickets)| | (Search Index)| +----------------+ +----------------+ +----------------+ | | +----------------+ +-----+ +----------------+ | Kafka |<--------| | | Stripe | | (SSE Updates) | | | | (Payments) | +----------------+ +-----+----+----------------+
- Authentication: PostgreSQL + JWT – Ticketmaster’s secure access.
- Booking Tickets: Redis + PostgreSQL Two-Step – Ticketmaster’s reservation flow.
- Viewing Events: SSE + Kafka – StubHub’s real-time seat maps.
- Searching Events: Elasticsearch + Redis – Eventbrite’s fast discovery.
- Consistency: PostgreSQL (ACID) – Live Nation’s double-booking prevention.
- Scalability: Kafka + Redis – AXS’s surge handling.