System design QA - rs-hash/GETTHATJOB GitHub Wiki
1. Handling Millions of Requests per Second
Interviewer: “Your system needs to handle millions of requests per second. What’s your approach?”
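You: I’ll start with a load balancer that spreads traffic evenly across a pool of stateless application servers, so any server can handle any request and capacity grows by adding servers. In front of them I’d use a reverse proxy such as NGINX, or a managed Layer 7 load balancer such as AWS ALB, for smarter routing (for example by host or URL path).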
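As a rough illustration of what "evenly distribute" means, here is a minimal sketch of round-robin selection that skips unhealthy servers. The server names and the `mark` method are hypothetical. NGINX and AWS ALB implement balancing policies internally and are configured, not coded, so this only models the idea.

```python
import itertools
from typing import Dict, List


class RoundRobinBalancer:
    """Hands out backends in rotation, skipping any marked unhealthy."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self.servers = list(servers)
        self.healthy: Dict[str, bool] = {s: True for s in self.servers}
        self._cycle = itertools.cycle(self.servers)

    def mark(self, server: str, healthy: bool) -> None:
        # Typically driven by health checks; a down server stops receiving traffic.
        self.healthy[server] = healthy

    def next_server(self) -> str:
        # Try each server at most once per call so an all-down pool cannot loop forever.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if self.healthy[server]:
                return server
        raise RuntimeError("no healthy servers")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
    print([lb.next_server() for _ in range(4)])  # ['app-1', 'app-2', 'app-3', 'app-1']
    lb.mark("app-2", False)
    print([lb.next_server() for _ in range(3)])  # ['app-3', 'app-1', 'app-3']
```

Round-robin works well when requests cost roughly the same; when they vary a lot, a least-connections policy is a common alternative.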
2. Server Failures and High Availability
Interviewer: “What happens if a server goes down?”
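You: I’d replicate data across multiple servers so no single machine holds the only copy, and use a heartbeat mechanism (or the load balancer’s health checks) to detect failures and stop routing traffic to the dead server. Auto-scaling groups in cloud services then replace the failed instance automatically.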
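The heartbeat idea can be sketched as a timeout-based failure detector. This assumes servers push heartbeats to a monitor and that a server silent for longer than the timeout is treated as failed; the 3-second timeout and server names are illustrative assumptions. Times are passed in explicitly so the behavior is easy to follow.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Treats a server as failed if no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, server: str, now: Optional[float] = None) -> None:
        # Record when this server last reported in.
        self.last_seen[server] = time.monotonic() if now is None else now

    def failed_servers(self, now: Optional[float] = None) -> List[str]:
        # Servers whose last heartbeat is older than the timeout.
        now = time.monotonic() if now is None else now
        return sorted(s for s, t in self.last_seen.items() if now - t > self.timeout)


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout=3.0)
    monitor.heartbeat("app-1", now=0.0)
    monitor.heartbeat("app-2", now=0.0)
    monitor.heartbeat("app-1", now=2.5)     # app-2 goes silent
    print(monitor.failed_servers(now=4.0))  # ['app-2']
```

Choosing the timeout is a trade-off: too short and a slow network causes false alarms, too long and traffic keeps hitting a dead server.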
3. Scaling for Traffic Spikes
Interviewer: “What if traffic spikes overnight?”
You: Horizontal scaling: I’d add servers dynamically behind the load balancer using auto-scaling groups or Kubernetes, which add or remove instances when a metric such as CPU utilization or request rate moves away from a target.
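A sketch of the target-tracking rule an autoscaler might apply. The formula `ceil(current_replicas * current_metric / target_metric)` follows the one documented for the Kubernetes Horizontal Pod Autoscaler, but this version ignores its tolerance and stabilization window, and the default bounds of 1 and 100 replicas are assumptions.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 100) -> int:
    """Target tracking: resize so the per-replica metric returns to the target."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to configured bounds so a spike (or a bad metric) cannot scale without limit.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(4, current_metric=90, target_metric=60))   # 6: spike, scale out
    print(desired_replicas(10, current_metric=30, target_metric=60))  # 5: quiet, scale in
```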
I’d also add a caching layer such as Redis or Memcached, so frequently read data is served from memory instead of hitting the database on every request.
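One common way to use such a cache is the cache-aside pattern, sketched below. An in-process dictionary stands in for Redis or Memcached (with a real client, the dictionary operations would become get/set-with-expiry calls). The `load_from_db` callback and TTL value are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class CacheAside:
    """Check the cache first; on a miss, load from the database and cache the result."""

    def __init__(self, load_from_db: Callable[[str], Any], ttl_seconds: float) -> None:
        self.load_from_db = load_from_db
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str, now: Optional[float] = None) -> Any:
        now = time.monotonic() if now is None else now
        hit = self._store.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]                  # fresh cache hit: no database call
        value = self.load_from_db(key)     # miss or expired entry
        self._store[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key: str) -> None:
        # Call after writing to the database so readers do not keep a stale copy.
        self._store.pop(key, None)


if __name__ == "__main__":
    db_calls = []

    def fake_db(key: str) -> str:
        db_calls.append(key)
        return f"row for {key}"

    cache = CacheAside(fake_db, ttl_seconds=60)
    cache.get("user:1", now=0)
    cache.get("user:1", now=10)   # served from cache
    print(len(db_calls))          # 1
```

The TTL bounds how stale a cached value can get even if an invalidation is missed.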
4. Storing Large Volumes of Data
Interviewer: “How would you store terabytes or petabytes of data?”
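You: I’d shard the database, splitting data across multiple nodes by a shard key. Consistent hashing maps keys to nodes so that adding or removing a node moves only a small fraction of the keys, and giving each node many virtual positions on the hash ring keeps the data evenly balanced.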
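A minimal consistent-hash ring, assuming SHA-256 as the hash and 100 virtual nodes per physical node (both arbitrary choices for illustration; the node names are hypothetical). Each key belongs to the first ring point at or after its hash, wrapping around at the end.

```python
import bisect
import hashlib
from typing import List, Tuple


def stable_hash(key: str) -> int:
    # Python's built-in hash() is randomized per process, so use a digest instead.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Maps keys to nodes; each node owns `vnodes` points on a hash ring."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (point, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Many virtual points per node spread its share of keys around the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(point, n) for point, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise RuntimeError("ring has no nodes")
        # Walk clockwise to the first point at or after the key's hash, wrapping at the end.
        idx = bisect.bisect_left(self._ring, (stable_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = ConsistentHashRing(["db-1", "db-2", "db-3"])
    keys = [f"user:{i}" for i in range(10_000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("db-4")
    moved = [k for k in keys if before[k] != ring.node_for(k)]
    print(f"{len(moved) / len(keys):.0%} of keys moved")
```

In expectation, adding a fourth node moves about a quarter of the keys, and every moved key goes to the new node. With plain `hash(key) % N`, going from 3 to 4 nodes would remap most keys.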
5. Ensuring Data Durability
Interviewer: “How do you ensure no data is lost?”
You: Replication: I’d keep multiple copies of the data using either a primary-replica (leader-follower) setup or a leaderless replication model. For mission-critical systems, replicating across regions provides disaster recovery if an entire region goes down.
6. Dealing with Write Performance Issues
Interviewer: “Won’t replication slow down writes?”
You: It depends on whether we prioritize strong consistency or eventual consistency.
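- For strong consistency, replicate synchronously: wait until the write has reached all replicas before returning success, which means every write is as slow as the slowest replica.
- For high write throughput, accept eventual consistency: acknowledge the write before every replica has it and let the replicas catch up, as stores like DynamoDB or Cassandra allow (both also offer stronger consistency options at some cost).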
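A toy latency model makes the trade-off concrete. It assumes the primary applies the write and then sends it to all replicas in parallel, acknowledging the client once `replica_acks` replicas confirm; the millisecond figures are made up for illustration.

```python
from typing import List


def write_latency_ms(primary_ms: float, replica_ms: List[float], replica_acks: int) -> float:
    """Client-observed write latency when the primary waits for `replica_acks` replicas.

    replica_acks == len(replica_ms): fully synchronous (strongly consistent).
    replica_acks == 0: asynchronous (eventually consistent).
    Assumes replicas receive the write in parallel once the primary has applied it.
    """
    if not 0 <= replica_acks <= len(replica_ms):
        raise ValueError("replica_acks must be between 0 and the number of replicas")
    if replica_acks == 0:
        return primary_ms
    # The write is acknowledged when the k-th fastest replica confirms it.
    return primary_ms + sorted(replica_ms)[replica_acks - 1]


if __name__ == "__main__":
    replicas = [5.0, 8.0, 120.0]   # two local replicas and one cross-region replica
    print(write_latency_ms(2.0, replicas, replica_acks=3))  # 122.0 (sync)
    print(write_latency_ms(2.0, replicas, replica_acks=0))  # 2.0 (async)
```

The asynchronous speed-up has a cost: a write acknowledged before it reaches any replica can be lost if the primary fails first. Waiting for one replica, an intermediate setting often called semi-synchronous replication, sits between the two.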
7. Conflict Resolution in Distributed Systems
Interviewer: “How do you handle conflicting writes in distributed databases?”
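You: Use vector clocks or timestamps to track versions of data. Timestamps allow simple last-write-wins resolution but can silently discard one of two concurrent writes; vector clocks detect when two writes are truly concurrent, so both versions can be returned on read and merged by application logic.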
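A minimal vector-clock sketch, assuming each version carries a map from node id to a write counter (node names "A" and "B" are hypothetical). Comparing two clocks tells whether one version supersedes the other or whether they conflict.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node id -> count of writes that node has coordinated


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter bumped (done on every write)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify version a relative to b: 'before', 'after', 'equal' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"   # b already includes a, so b can safely overwrite a
    if b_le_a:
        return "after"
    return "concurrent"   # neither saw the other: a genuine conflict to resolve


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise max, used once the application has merged conflicting values."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


if __name__ == "__main__":
    base = increment({}, "A")             # {'A': 1}
    x = increment(base, "A")              # written via node A
    y = increment(base, "B")              # concurrently written via node B
    print(compare(base, x))               # before
    print(compare(x, y))                  # concurrent
    resolved = increment(merge(x, y), "A")
    print(compare(x, resolved), compare(y, resolved))  # before before
```

A timestamp-only scheme would simply keep whichever of `x` or `y` has the later clock time; the vector clock instead flags them as concurrent so neither is dropped silently.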
8. Write-Heavy Use Cases
Interviewer: “When would you use leaderless replication?”
You: Leaderless systems like Cassandra suit write-heavy workloads where availability and write throughput matter more than strong consistency, e.g., logging systems or IoT data collection, because any replica can accept a write.
9. Quorums for Read/Write Operations
Interviewer: “What’s a quorum, and when would you use it?”
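You: A quorum is the minimum number of replicas that must respond for an operation to succeed. With N replicas of each item:
- Write quorum (W): the number of replicas that must acknowledge a write before it counts as successful.
- Read quorum (R): the number of replicas that must answer a read; the client keeps the value with the newest version.
Choosing R + W > N (e.g., N = 3, R = 2, W = 2) means every read set overlaps at least one replica that holds the latest acknowledged write, so reads return the most recent data.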
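A small simulation of why overlapping quorums matter. It assumes N = 3 replicas storing `(version, value)` pairs, where a write with W = 2 reached replicas 0 and 1 but not replica 2; the version numbers and values are hypothetical. The overlap condition checked is the standard R + W > N rule.

```python
from itertools import combinations
from typing import List, Optional, Tuple

Version = Tuple[int, str]  # (version number, value)


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """R + W > N guarantees every read quorum shares a replica with every write quorum."""
    return r + w > n


def read_latest(replicas: List[Optional[Version]], read_set: Tuple[int, ...]) -> Optional[str]:
    """Return the highest-versioned value among the replicas that answered the read."""
    answers = [replicas[i] for i in read_set if replicas[i] is not None]
    return max(answers)[1] if answers else None


if __name__ == "__main__":
    # N = 3, W = 2: version 2 reached replicas 0 and 1; replica 2 still holds version 1.
    replicas = [(2, "new"), (2, "new"), (1, "old")]
    for r in (1, 2):
        results = {read_latest(replicas, s) for s in combinations(range(3), r)}
        print(r, quorums_overlap(3, r, 2), sorted(results))
    # 1 False ['new', 'old']
    # 2 True ['new']
```

With R = 1 a read can land on the stale replica; with R = 2 every possible read set includes at least one up-to-date copy. Cassandra, for example, exposes this choice through per-request consistency levels such as QUORUM.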
10. Real-World Example
Interviewer: “Can you give an example where this design is applied?”
You: Imagine a video streaming platform like YouTube:
- When a user uploads a video, it’s stored in a distributed file system.
- The upload publishes an event to a pub-sub system (e.g., Kafka), which triggers background tasks such as transcoding the video into different resolutions (720p, 1080p).
- Replication keeps copies of each video in multiple regions, so playback is served from a nearby location with lower latency.
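The upload-to-transcoding flow can be sketched with a toy in-memory broker. This is not Kafka’s API: the `InMemoryBroker` class, the "video-uploaded" topic name, and the event fields are hypothetical, and the transcoders only record jobs instead of encoding video.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, str]], None]


class InMemoryBroker:
    """Toy stand-in for a pub-sub broker: each topic fans out to all its subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Dict[str, str]) -> None:
        # Every subscriber receives its own copy of the event.
        for handler in self._subscribers[topic]:
            handler(dict(event))


jobs: List[str] = []


def make_transcoder(resolution: str) -> Handler:
    def transcode(event: Dict[str, str]) -> None:
        # Placeholder for real encoding work; here we only record which job would run.
        jobs.append(f"{event['video_id']}@{resolution}")
    return transcode


broker = InMemoryBroker()
for res in ("720p", "1080p"):
    broker.subscribe("video-uploaded", make_transcoder(res))

broker.publish("video-uploaded", {"video_id": "abc123", "path": "uploads/abc123.mp4"})
print(jobs)  # ['abc123@720p', 'abc123@1080p']
```

The uploader does not need to know which processing steps exist; adding a new resolution means adding a subscriber. With Kafka, each transcoder service would typically run as its own consumer group so that every group receives every upload event.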