Resiliency model
Overview
CID (Component Interaction Diagram)
RMA (Resilience Modeling and Analysis)
Discover Phase: Identify Failures
ID | Interaction | Failure Short Name | Failure Description | Response |
---|---|---|---|---|
1 | Listener → Authentication Server | Authentication Timeout | The server fails to respond within the expected time due to high traffic. | Notify the user and retry the request automatically after a delay. Monitor server performance and scale up if necessary. |
2 | Listener → Recommendation API | Data Unavailable | Recommendation data is missing due to a failure in the Analytics Database. | Provide fallback recommendations or display a user-friendly error. Alert the database team to resolve the issue. |
3 | Streaming Server → Music Database | Query Failure | The database fails to return metadata for the requested track due to connection issues. | Retry the query. If the failure persists, display a "track unavailable" message. Log the incident for further investigation. |
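The "retry after a delay" responses for failures 1 and 3 could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `call_with_retry`, its parameters, and the exponential backoff schedule are assumptions, and the wrapped operation is expected to signal a timeout or connection problem with Python's built-in `TimeoutError` or `ConnectionError`.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Run `operation`, retrying on timeout or connection errors.

    Returns the result, or None when every attempt failed, so the caller
    can notify the user (for example with a "track unavailable" message).
    All names here are hypothetical.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except (TimeoutError, ConnectionError) as exc:
            # Log the incident for further investigation.
            print(f"attempt {attempt} failed: {exc!r}")
            if attempt < max_attempts:
                # Wait base_delay, then 2 * base_delay, and so on.
                sleep(base_delay * 2 ** (attempt - 1))
    return None
```

A caller would wrap the authentication request or the metadata query in a zero-argument function and treat a `None` result as the signal to show the error message. The `sleep` parameter exists only so the delay can be replaced in tests.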
Rate Phase: Analyze Failures
ID | Interaction | Impact | Likelihood | Time to Detect (TTD) | Time to Recover (TTR) | Risk (Impact × Likelihood) |
---|---|---|---|---|---|---|
1 | Listener → Authentication Server | High | Medium | < 5 minutes | 10 minutes | High |
2 | Listener → Recommendation API | Medium | High | 5–15 minutes | 20 minutes | Medium |
3 | Streaming Server → Music Database | High | Low | < 5 minutes | 15 minutes | Medium |
Act Phase: Mitigation Strategies
Scaling for High Traffic:
- Introduce auto-scaling for the Authentication Server during peak traffic.
- Implement caching for frequent authentication requests.
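One possible reading of "caching for frequent authentication requests" is to remember recent token validation results for a short time, so repeated requests do not reach the Authentication Server. The sketch below assumes that reading. `TokenCache`, `authenticate`, `validate_remote`, and the 60-second lifetime are all hypothetical names and values, not part of the project.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TokenCache:
    """Keeps token validation results until they expire (hypothetical)."""

    def __init__(
        self,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps token -> (expiry time, user id).
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, token: str) -> Optional[str]:
        entry = self._entries.get(token)
        if entry is None:
            return None
        expires_at, user_id = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._entries[token]
            return None
        return user_id

    def put(self, token: str, user_id: str) -> None:
        self._entries[token] = (self._clock() + self._ttl, user_id)


def authenticate(
    token: str,
    cache: TokenCache,
    validate_remote: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the user id for `token`, asking the server only on a miss."""
    user_id = cache.get(token)
    if user_id is not None:
        return user_id
    user_id = validate_remote(token)
    if user_id is not None:
        # Only successful validations are cached.
        cache.put(token, user_id)
    return user_id
```

A short lifetime keeps the window small in which a revoked token would still be accepted from the cache; the right value depends on how the project handles sessions.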
Fallback for Recommendations:
- Cache popular recommendations to serve users when the Analytics Database is unavailable.
- Regularly back up recommendation data.
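The fallback for recommendations might look like the sketch below: personalized results are served when the Analytics Database answers, and a cached list of popular tracks is served otherwise. `RecommendationService`, `fetch_personalized`, and the `"personalized"` / `"fallback"` labels are illustrative assumptions, and the data source is assumed to raise `ConnectionError` when it is unavailable.

```python
from typing import Callable, List, Tuple


class RecommendationService:
    """Serves personalized tracks, falling back to cached popular ones."""

    def __init__(
        self,
        fetch_personalized: Callable[[str], List[str]],
        popular: List[str],
    ) -> None:
        self._fetch = fetch_personalized
        self._popular = list(popular)

    def refresh_popular(self, tracks: List[str]) -> None:
        # Called periodically while the Analytics Database is healthy.
        self._popular = list(tracks)

    def recommend(self, user_id: str) -> Tuple[List[str], str]:
        """Return (tracks, source), where source names the path taken."""
        try:
            tracks = self._fetch(user_id)
        except ConnectionError:
            # Analytics Database unavailable: treat as missing data.
            tracks = []
        if tracks:
            return tracks, "personalized"
        return list(self._popular), "fallback"
```

Returning the source label alongside the tracks lets the caller log or alert on how often the fallback path is taken, which matches the "alert the database team" response in the Discover table.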
Database Redundancy:
- Use database replication and load balancing for the Music Database.
- Monitor database health and configure automated failover.
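Automated failover across replicated Music Database nodes can be illustrated at the client level with a sketch like this one. It is a simplification under stated assumptions: `FailoverClient` and `get_track` are hypothetical names, each node is modeled as a plain function that raises `ConnectionError` when unreachable, and real deployments usually delegate failover to the database or a proxy rather than to application code.

```python
from typing import Callable, Dict, List, Optional

# A node is modeled as a function from track id to metadata (or None).
Query = Callable[[str], Optional[Dict[str, str]]]


class FailoverClient:
    """Queries the active node and moves to the next one on failure."""

    def __init__(self, nodes: List[Query]) -> None:
        if not nodes:
            raise ValueError("at least one database node is required")
        self._nodes = nodes
        self.active_index = 0

    def get_track(self, track_id: str) -> Optional[Dict[str, str]]:
        # Try each node once, starting with the currently active one.
        for offset in range(len(self._nodes)):
            index = (self.active_index + offset) % len(self._nodes)
            try:
                result = self._nodes[index](track_id)
            except ConnectionError:
                continue
            # Remember the node that answered for later requests.
            self.active_index = index
            return result
        raise ConnectionError("all Music Database nodes are unavailable")
```

Because the client remembers the node that last answered, later requests skip the failed primary instead of paying the connection error on every call.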