12 RDS - gannurohith/devops-interview-wiki GitHub Wiki
📁 12 - RDS (Relational Database Service) (Basic to Intermediate Q&A)
-
What is Amazon RDS? A managed relational database service that simplifies setup, operation, and scaling of databases in the cloud.
-
Which database engines are supported by RDS? MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, IBM Db2, and Amazon Aurora (MySQL- and PostgreSQL-compatible).
-
What are the key components of an RDS instance? DB instance, storage, DB engine, DB parameter group, security groups, subnet groups.
-
What is Multi-AZ deployment in RDS? Provides high availability by synchronously replicating data to a standby instance in another Availability Zone and failing over to it automatically.
-
What is the difference between Multi-AZ and Read Replica?
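- Multi-AZ: synchronous standby for availability and automatic failover; in a Multi-AZ DB instance deployment the standby does not serve reads.
- Read Replica: asynchronously replicated copy for scaling read traffic and offloading analytics; it can lag behind the source.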
-
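How do backups work in RDS? RDS takes a daily snapshot of the storage volume during the backup window and uploads transaction logs to S3 roughly every 5 minutes, which together enable point-in-time recovery.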
-
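How do you restore an RDS backup? Point-in-time recovery uses automated backups to restore to any moment within the retention period, up to the latest restorable time; a snapshot restores to the moment it was taken. Both create a new DB instance rather than overwriting the existing one.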
-
What are RDS snapshots? User-initiated backups of the DB instance that are kept until you delete them, independent of the automated backup retention period.
-
What is RDS Performance Insights? A tool to monitor database performance metrics and query activity.
-
How do you monitor RDS health and performance? Use CloudWatch metrics, Enhanced Monitoring, and Performance Insights.
-
What is Enhanced Monitoring in RDS? Provides near-real-time OS-level metrics (CPU, memory, processes, disk I/O) at a configurable granularity as fine as 1 second.
-
What are DB Parameter Groups? Configuration settings applied to a DB engine.
-
What is the default storage type in RDS? General Purpose SSD (gp3, with gp2 still common on older instances); the other options are Provisioned IOPS SSD (io1/io2) and Magnetic (legacy).
-
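Can you encrypt an RDS instance? Yes. Encryption at rest uses KMS keys and must be enabled when the instance is created; to encrypt an existing unencrypted instance, copy a snapshot with encryption enabled and restore from it. Encryption in transit uses TLS.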
-
How do you scale an RDS database?
- Vertical scaling: change instance class.
- Horizontal scaling: use Read Replicas.
-
What is the max retention period for automated backups? Up to 35 days.
-
How do you restrict access to an RDS instance? Use security groups, subnet groups, and IAM policies.
-
What is RDS Proxy? A fully managed database proxy that pools and shares database connections, improving application scalability and resilience to failovers.
-
Can RDS be accessed publicly? Only if the instance has the publicly accessible setting enabled, sits in a public subnet, and its security group allows the client's IP (not recommended for production).
-
What happens during a failover in Multi-AZ RDS? RDS promotes the standby and repoints the instance endpoint's DNS record at it, typically within 60 to 120 seconds; applications must reconnect.
-
What are some best practices for RDS?
- Enable Multi-AZ and automated backups
- Use IAM authentication
- Enable Enhanced Monitoring
- Use TLS for encryption
-
What is IAM Database Authentication in RDS? Lets users log in to MySQL, MariaDB, and PostgreSQL databases with a short-lived IAM authentication token (valid for 15 minutes) instead of a stored password.
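As a rough sketch of how a client might use IAM Database Authentication from Python: `generate_db_auth_token` is a method on the boto3 RDS client, while the helper name `iam_connect_args`, the argument key names, and the CA bundle path are illustrative assumptions that depend on the database driver in use.

```python
from typing import Any, Dict


def iam_connect_args(rds_client: Any, host: str, port: int, user: str,
                     region: str, ca_bundle: str) -> Dict[str, Any]:
    """Build connection arguments that use an IAM token as the password.

    rds_client is expected to behave like boto3.client("rds"). The token is
    short-lived, so request a fresh one for each new connection.
    """
    token = rds_client.generate_db_auth_token(
        DBHostname=host, Port=port, DBUsername=user, Region=region
    )
    return {
        "host": host,
        "port": port,
        "user": user,
        "password": token,    # the token replaces a stored password
        "ssl_ca": ca_bundle,  # hypothetical key name; IAM auth requires TLS
    }
```

The database user must also be created for IAM authentication inside the engine, and the caller's IAM policy needs permission to connect as that user.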
-
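How do you patch RDS? RDS applies required patches and, if auto minor version upgrade is enabled, minor engine upgrades during the preferred maintenance window; major version upgrades must be initiated manually.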
-
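What is the difference between Aurora and RDS MySQL? Aurora uses a distributed storage layer that keeps six copies of data across three Availability Zones and grows automatically; AWS claims higher throughput than standard MySQL, and failover is typically faster.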
-
Can you migrate an existing DB to RDS? Yes, using AWS Database Migration Service (DMS) or native tools like mysqldump.
-
How do you replicate data from on-prem to RDS? Use AWS DMS, VPN connection, and proper VPC routing.
-
What happens if storage is full in RDS? If storage autoscaling is disabled, the instance enters the `storage-full` status and can become unavailable until you increase allocated storage.
-
How do you troubleshoot performance issues in RDS? Use Performance Insights, slow query logs, and CloudWatch metrics.
-
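Can you stop an RDS instance? Yes, for up to 7 days at a time, after which RDS starts it again automatically. Aurora is stopped at the cluster level rather than per instance, and storage and backups are still billed while stopped.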
-
How is failover different in Aurora vs RDS? Aurora has faster failover due to shared storage and replicas.
📁 12 - RDS (Relational Database Service) (Scenario-Based Q&A for 5+ Years Experience)
-
Scenario: Your application experiences increased read latency during peak traffic. What steps would you take to troubleshoot and resolve this issue in RDS?
- Review CloudWatch metrics (ReadIOPS, CPUUtilization)
- Analyze slow query logs and Performance Insights
- Check if Read Replicas are being utilized efficiently
- Consider scaling the instance or implementing caching
-
Scenario: You need to ensure zero data loss and automatic failover for a mission-critical database. What RDS features will you enable?
- Enable Multi-AZ deployment
- Turn on automated backups and log backups
- Ensure proper monitoring and alarms for failover
-
Scenario: You receive an alert that your RDS instance has reached 95% storage utilization. How do you prevent downtime?
- Enable storage auto-scaling
- Manually increase allocated storage if needed
- Investigate large tables or binary logs consuming space
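The storage steps above could be sketched as two request payloads: a CloudWatch alarm on the `FreeStorageSpace` metric and a `modify_db_instance` call that sets `MaxAllocatedStorage` to enable storage autoscaling. The function names, the 10% threshold, and the instance identifier are assumptions for illustration; the dictionaries are meant to be passed to the boto3 `cloudwatch.put_metric_alarm` and `rds.modify_db_instance` calls.

```python
from typing import Any, Dict

GIB = 1024 ** 3


def free_storage_alarm(db_id: str, allocated_gib: int,
                       min_free_percent: int = 10) -> Dict[str, Any]:
    """Arguments for cloudwatch.put_metric_alarm; FreeStorageSpace is in bytes."""
    threshold_bytes = allocated_gib * GIB * min_free_percent // 100
    return {
        "AlarmName": f"{db_id}-low-free-storage",
        "Namespace": "AWS/RDS",
        "MetricName": "FreeStorageSpace",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_bytes),
        "ComparisonOperator": "LessThanOrEqualToThreshold",
    }


def autoscaling_request(db_id: str, max_allocated_gib: int) -> Dict[str, Any]:
    """Arguments for rds.modify_db_instance that enable storage autoscaling."""
    return {
        "DBInstanceIdentifier": db_id,
        "MaxAllocatedStorage": max_allocated_gib,  # upper bound for autoscaling
        "ApplyImmediately": True,
    }
```

Alarming well before 95% utilization leaves time to investigate what is consuming space before writes start failing.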
-
Scenario: A developer complains about intermittent 'Too many connections' errors. What do you check and how do you fix it?
- Analyze the DB parameter `max_connections`
- Check application connection pooling
- Use RDS Proxy to manage database connections more efficiently
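To reason about `max_connections` and pool sizing together, a small calculation can help. This sketch assumes the default parameter-group formulas documented for RDS at the time of writing (`DBInstanceClassMemory/12582880` for MySQL and MariaDB, `LEAST(DBInstanceClassMemory/9531392, 5000)` for PostgreSQL); verify them against your own parameter group. `DBInstanceClassMemory` is somewhat lower than the instance's nominal RAM, and the helper names and the reserved-connection count are hypothetical.

```python
def default_max_connections(engine: str, instance_class_memory_bytes: int) -> int:
    """Estimate the default max_connections from the assumed RDS formulas."""
    if engine in ("mysql", "mariadb"):
        return instance_class_memory_bytes // 12582880
    if engine == "postgres":
        return min(instance_class_memory_bytes // 9531392, 5000)
    raise ValueError(f"no formula known for engine {engine!r}")


def pool_size_per_app_instance(max_connections: int, app_instances: int,
                               reserved: int = 10) -> int:
    """Split the usable connections evenly across application instances.

    `reserved` keeps a few connections free for admin and monitoring sessions.
    """
    usable = max_connections - reserved
    return max(usable // app_instances, 1)
```

If the per-instance pool size comes out too small for the workload, that is a hint to add RDS Proxy or a larger instance class rather than simply raising `max_connections`.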
-
Scenario: Your RDS maintenance window is overlapping with peak traffic hours. What changes should you make?
- Modify the preferred maintenance window to off-peak times
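- Use Blue/Green Deployments (available for Aurora and for RDS MySQL, MariaDB, and PostgreSQL) to patch a synchronized copy and switch over, with downtime typically under a minute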
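Moving the maintenance window is a single `modify_db_instance` call. The sketch below only builds and validates the request; the window is expressed in UTC in the `ddd:hh24:mi-ddd:hh24:mi` format, and the helper name and example values are assumptions.

```python
import re
from typing import Any, Dict

_DAY = r"(mon|tue|wed|thu|fri|sat|sun)"
_TIME = r"([01]\d|2[0-3]):([0-5]\d)"
_WINDOW = re.compile(rf"^{_DAY}:{_TIME}-{_DAY}:{_TIME}$")


def maintenance_window_request(db_id: str, window_utc: str) -> Dict[str, Any]:
    """Arguments for rds.modify_db_instance that move the maintenance window.

    Only the format is checked here; RDS also requires the window to be
    at least 30 minutes long.
    """
    window = window_utc.lower()
    if not _WINDOW.match(window):
        raise ValueError(f"expected ddd:hh24:mi-ddd:hh24:mi in UTC, got {window_utc!r}")
    return {
        "DBInstanceIdentifier": db_id,
        "PreferredMaintenanceWindow": window,
    }
```

For example, `maintenance_window_request("orders-db", "Sun:03:00-Sun:04:00")` targets an early Sunday slot; convert peak hours to UTC before choosing one.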
-
Scenario: You want to audit who accessed the RDS database and when. What services or features do you enable?
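- Enable CloudTrail to record RDS API calls (who created, modified, or deleted resources)
- Enable database audit logging for logins and SQL activity (for example the MariaDB Audit Plugin on MySQL/MariaDB or pgAudit on PostgreSQL) and publish the logs to CloudWatch Logs
- Use IAM authentication so database logins map to individual IAM identities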
-
Scenario: Application queries are showing erratic performance. What are your steps to analyze and optimize?
- Enable and check slow query log
- Use Performance Insights to identify problematic queries
- Check for missing indexes or table scans
-
Scenario: You need to replicate your RDS MySQL database across regions. How do you do it?
- Enable cross-region Read Replicas
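- Ensure automated backups are enabled on the source and that security groups in the destination region allow application access (RDS handles the replication traffic itself, so VPC peering is only needed for cross-region clients)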
- Monitor replication lag
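A minimal sketch of the two requests involved, assuming boto3: `create_db_instance_read_replica` is called with a client in the destination region and takes the source instance's ARN, and the `ReplicaLag` CloudWatch metric (in seconds) can drive an alarm. The function names, account ID, and the 60-second threshold are illustrative assumptions.

```python
from typing import Any, Dict, Optional


def cross_region_replica_request(source_region: str, account_id: str,
                                 source_db_id: str, replica_db_id: str,
                                 kms_key_id: Optional[str] = None) -> Dict[str, Any]:
    """Arguments for rds.create_db_instance_read_replica (destination-region client)."""
    request: Dict[str, Any] = {
        "DBInstanceIdentifier": replica_db_id,
        # A cross-region source must be referenced by its ARN.
        "SourceDBInstanceIdentifier":
            f"arn:aws:rds:{source_region}:{account_id}:db:{source_db_id}",
        "SourceRegion": source_region,
    }
    if kms_key_id:
        # An encrypted source needs a KMS key that lives in the destination region.
        request["KmsKeyId"] = kms_key_id
    return request


def replica_lag_alarm(replica_db_id: str, max_lag_seconds: int = 60) -> Dict[str, Any]:
    """Arguments for cloudwatch.put_metric_alarm on the ReplicaLag metric."""
    return {
        "AlarmName": f"{replica_db_id}-replica-lag",
        "Namespace": "AWS/RDS",
        "MetricName": "ReplicaLag",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": replica_db_id}],
        "Statistic": "Average",
        "Period": 60,
        "EvaluationPeriods": 5,
        "Threshold": float(max_lag_seconds),
        "ComparisonOperator": "GreaterThanThreshold",
    }
```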
-
Scenario: You are required to apply a major version upgrade to your production RDS DB. How do you proceed with minimal impact?
- Clone or snapshot the DB in staging and test the upgrade
- Use a Blue/Green Deployment to upgrade a synchronized copy, then switch over once it is validated
- Perform upgrade during low traffic with backups enabled
-
Scenario: Your team wants to connect RDS to an on-prem app securely. How do you architect this?
- Use AWS Direct Connect or VPN
- Ensure RDS is in a private subnet
- Configure proper routing and security group rules
- Scenario: Restore your RDS DB to a state it was in 12 hours ago due to a data corruption issue.
- Use point-in-time recovery with automated backups
- Restore to a new DB instance to avoid affecting live DB
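The restore above maps to one boto3 call, `restore_db_instance_to_point_in_time`, which always creates a new instance. This sketch only builds the request; the helper name and identifiers are assumptions, and the timestamp is computed in UTC.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def pitr_request(source_db_id: str, target_db_id: str, hours_ago: float,
                 now: Optional[datetime] = None) -> Dict[str, Any]:
    """Arguments for rds.restore_db_instance_to_point_in_time."""
    now = now or datetime.now(timezone.utc)
    return {
        "SourceDBInstanceIdentifier": source_db_id,
        "TargetDBInstanceIdentifier": target_db_id,  # a new instance, not the live one
        "RestoreTime": now - timedelta(hours=hours_ago),
    }
```

The chosen time must fall inside the backup retention period and before the instance's latest restorable time; after validating the restored copy, repoint the application or copy the repaired data back.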
- Scenario: What if your Multi-AZ standby is not failing over as expected?
- Verify health checks and logs
- Use `describe-db-instances` to inspect failover status
- Open AWS Support case if needed
- Scenario: A read replica is experiencing replication lag. What can cause this and how do you fix it?
- High load or slow queries
- Network latency or IOPS bottleneck
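- Scale the replica to at least the source's instance class and storage performance; extra replicas spread read load but do not reduce lag caused by heavy writes on the source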
- Scenario: Cost is rising due to RDS. How do you optimize pricing for the current workload?
- Use reserved instances
- Enable storage autoscaling only when needed
- Optimize queries and connections to reduce overprovisioning
- Scenario: You are migrating an on-prem MySQL DB to RDS with minimal downtime. What service do you use?
- Use AWS Database Migration Service (DMS)
- Set up continuous replication until final cutover
- Scenario: The application needs to connect securely without exposing credentials. What feature do you implement?
- Use IAM Database Authentication with SSL encryption
- Store credentials in AWS Secrets Manager
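For the Secrets Manager option, a minimal sketch of reading credentials at runtime instead of embedding them in configuration. `get_secret_value` is a boto3 Secrets Manager call; the helper name and the assumption that the secret's JSON contains `username`, `password`, `host`, and `port` keys are illustrative and depend on how the secret was created.

```python
import json
from typing import Any, Dict


def load_db_credentials(secrets_client: Any, secret_id: str) -> Dict[str, Any]:
    """Fetch and parse a database secret.

    secrets_client is expected to behave like boto3.client("secretsmanager").
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    # Assumed key layout; adjust to match the secret actually stored.
    return {key: secret[key] for key in ("username", "password", "host", "port")}
```

Fetching the secret on each new connection, or caching it briefly, lets rotation take effect without redeploying the application.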
- Scenario: An RDS instance is publicly accessible by mistake. What’s your response plan?
- Immediately disable public access flag
- Adjust security group rules
- Rotate all credentials
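The first step of that response plan can be scripted. The sketch below takes the dictionary returned by the boto3 `describe_db_instances` call and produces one `modify_db_instance` request per exposed instance; the function name is an assumption.

```python
from typing import Any, Dict, List


def public_instance_fixes(describe_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build modify_db_instance arguments for every publicly accessible instance."""
    return [
        {
            "DBInstanceIdentifier": db["DBInstanceIdentifier"],
            "PubliclyAccessible": False,
        }
        for db in describe_response.get("DBInstances", [])
        if db.get("PubliclyAccessible")
    ]
```

Running the same check on a schedule would also catch any future instance created with the flag enabled.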
- Scenario: You want to periodically archive old data from your database. How do you automate this?
- Use stored procedures or scheduled Lambda jobs
- Offload to S3 or a data warehouse like Redshift
- Scenario: An app is experiencing timeout errors when writing to the DB. What do you check?
- Check DB CPU, write IOPS, and lock contention
- Review slow query logs
- Tune schema and queries
- Scenario: A large schema change needs to be applied. How do you ensure safe deployment?
- Clone the DB and test changes
- Use tools like pt-online-schema-change
- Apply changes in maintenance window
- Scenario: How do you monitor and alert for long-running queries?
- Enable CloudWatch metrics and custom alerts
- Use RDS Performance Insights or query monitoring tools
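Where built-in tooling is not enough, a scheduled job can poll the engine directly. This sketch assumes MySQL, where `information_schema.processlist` exposes running sessions (PostgreSQL offers similar data in `pg_stat_activity`); the row format with lowercase keys and the function name are hypothetical.

```python
from typing import Any, Dict, List

# Query a scheduled job could run; %s is the threshold in seconds.
LONG_RUNNING_SQL = (
    "SELECT id, user, command, time, info FROM information_schema.processlist "
    "WHERE command <> 'Sleep' AND time > %s ORDER BY time DESC"
)


def long_running(rows: List[Dict[str, Any]], threshold_seconds: int) -> List[Dict[str, Any]]:
    """Filter already-fetched processlist rows, longest-running first."""
    hits = [
        row for row in rows
        if row["command"] != "Sleep" and row["time"] > threshold_seconds
    ]
    return sorted(hits, key=lambda row: row["time"], reverse=True)
```

The count of offending sessions could then be published as a custom CloudWatch metric so that a standard alarm handles the alerting.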
- Scenario: You need to isolate analytics workloads from transactional queries. What architecture would you use?
- Direct analytics to Read Replicas
- Use Aurora's reader endpoint or custom endpoints to isolate traffic
- Scenario: You accidentally dropped a critical table. How do you recover quickly?
- Use point-in-time restore to a new DB
- Extract only the lost table from backup
- Minimize downtime by keeping the live DB serving traffic while the restore runs on the separate instance, then copying the table back
- Scenario: You need to restrict which developers can perform destructive actions on RDS. How do you enforce this?
- Use fine-grained IAM policies and roles
- Enable CloudTrail auditing and monitor API calls
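One way to express such a restriction is an explicit-deny IAM policy attached to the developer role. This is a sketch under stated assumptions: the list of actions treated as destructive, the statement ID, and the account ID are illustrative, and a real policy would likely scope resources or add conditions.

```python
import json
from typing import Any, Dict

DESTRUCTIVE_RDS_ACTIONS = [
    "rds:DeleteDBInstance",
    "rds:DeleteDBCluster",
    "rds:DeleteDBSnapshot",
    "rds:DeleteDBClusterSnapshot",
]


def deny_destructive_rds_policy(account_id: str) -> Dict[str, Any]:
    """Build an IAM policy document that denies deleting RDS resources."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDestructiveRdsActions",
                "Effect": "Deny",
                "Action": DESTRUCTIVE_RDS_ACTIONS,
                "Resource": f"arn:aws:rds:*:{account_id}:*",
            }
        ],
    }


def policy_json(account_id: str) -> str:
    """Serialize the policy for use with the IAM API or console."""
    return json.dumps(deny_destructive_rds_policy(account_id), indent=2)
```

Because an explicit deny overrides any allow, this holds even if a broader managed policy is attached to the same role.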
- Scenario: You want to test database failover. How do you simulate it safely?
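- Use the AWS CLI to trigger a failover manually: reboot with failover (`reboot-db-instance --force-failover`) for Multi-AZ instances, or `failover-db-cluster` for Aurora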
- Monitor DNS switch and app impact during event
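A small harness for such a test might look like the following. `ForceFailover` is a parameter of the boto3 `reboot_db_instance` call; the helper names are assumptions, and the resolver is injected so the watcher can be exercised without a real endpoint.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def forced_failover_request(db_id: str) -> Dict[str, Any]:
    """Arguments for rds.reboot_db_instance that force a Multi-AZ failover."""
    return {"DBInstanceIdentifier": db_id, "ForceFailover": True}


def watch_endpoint(resolve: Callable[[str], str], endpoint: str, polls: int,
                   interval_s: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> List[Tuple[int, str]]:
    """Poll DNS and record (poll index, address) each time the address changes."""
    changes: List[Tuple[int, str]] = []
    last = None
    for i in range(polls):
        address = resolve(endpoint)
        if address != last:
            changes.append((i, address))
            last = address
        if i < polls - 1:
            sleep(interval_s)
    return changes
```

In a real test, `resolve` could be `socket.gethostbyname` and the watcher would run alongside application-level health checks, since clients with cached DNS or long-lived connections may recover later than the DNS record does.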
- Scenario: You need to audit DB performance over 6 months. What tool helps?
- RDS Performance Insights with extended retention
- CloudWatch Logs and Metrics stored in S3
- Scenario: You are tasked to implement disaster recovery in a new region. How do you plan RDS DR strategy?
- Enable cross-region snapshot copy or Read Replica
- Automate restore in DR region
- Test periodically
- Scenario: You want to restrict access to RDS by IP. How do you do it?
- Modify security group inbound rules for specific CIDRs
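As an illustration, the inbound rule can be built and sanity-checked before it is sent to the boto3 EC2 call `authorize_security_group_ingress`. The helper name, the group ID, and the decision to reject `0.0.0.0/0` are assumptions of this sketch.

```python
import ipaddress
from typing import Any, Dict


def ingress_rule_request(group_id: str, cidr: str, port: int,
                         description: str) -> Dict[str, Any]:
    """Arguments for ec2.authorize_security_group_ingress for one IPv4 CIDR."""
    network = ipaddress.ip_network(cidr, strict=True)  # ValueError if malformed
    if network.version != 4:
        raise ValueError("this sketch only handles IPv4 ranges")
    if network.prefixlen == 0:
        raise ValueError("refusing to open the database to every address")
    return {
        "GroupId": group_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                "IpRanges": [{"CidrIp": str(network), "Description": description}],
            }
        ],
    }
```

For example, `ingress_rule_request("sg-0123456789abcdef0", "203.0.113.0/24", 5432, "office VPN")` allows PostgreSQL traffic from a single documented range.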
- Scenario: You need to enforce encryption for all data in transit and at rest. What features support this?
- Enable TLS
- Use KMS encryption for volumes and backups
- Scenario: Team wants self-service DB restores without admin access. How do you implement this safely?
- Create Lambda-backed automation using IAM roles
- Allow read-only access to snapshots or PIT restore via API