12 RDS - gannurohith/devops-interview-wiki GitHub Wiki

📁 12 - RDS (Relational Database Service) (Basic to Intermediate Q&A)

  1. What is Amazon RDS? A managed relational database service that simplifies setup, operation, and scaling of databases in the cloud.

  2. Which database engines are supported by RDS? MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, IBM Db2, and Amazon Aurora (MySQL- and PostgreSQL-compatible).

  3. What are the key components of an RDS instance? DB instance, storage, DB engine, DB parameter group, security groups, subnet groups.

  4. What is Multi-AZ deployment in RDS? It provides high availability by synchronously replicating data to a standby instance in another Availability Zone and failing over to it automatically.

  5. What is the difference between Multi-AZ and Read Replica?

    • Multi-AZ: automatic failover, not for read scaling.
    • Read Replica: for scaling read traffic and offloading analytics.
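  6. How do backups work in RDS? Automated backups take a daily snapshot during the backup window and capture transaction logs throughout the day; both are stored in S3, which is what makes point-in-time recovery possible.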

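  7. How do you restore an RDS backup? Automated backups support point-in-time restore to any moment within the retention period; a snapshot restores to the moment it was taken. Either way, the restore creates a new DB instance rather than overwriting the existing one.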

  8. What are RDS snapshots? User-initiated backups of the DB instance that are kept until you delete them and can be restored to a new instance at any time.

  9. What is RDS Performance Insights? A tool to monitor database performance metrics and query activity.

  10. How do you monitor RDS health and performance? Use CloudWatch metrics, Enhanced Monitoring, and Performance Insights.

  11. What is Enhanced Monitoring in RDS? Provides near-real-time operating system metrics (CPU, memory, disk, processes) at a granularity configurable down to 1 second.

  12. What are DB Parameter Groups? Named sets of engine configuration values (for example max_connections) that are applied to one or more DB instances.

  13. What is the default storage type in RDS? General Purpose SSD (gp3); others include Provisioned IOPS and Magnetic (legacy).

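  14. Can you encrypt an RDS instance? Yes. Encryption at rest uses KMS keys and must be enabled when the instance is created; to encrypt an existing unencrypted instance, copy a snapshot with encryption enabled and restore from the copy. Encryption in transit is provided by TLS.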

  15. How do you scale an RDS database?

  • Vertical scaling: change instance class.
  • Horizontal scaling: use Read Replicas.
  16. What is the max retention period for automated backups? Up to 35 days.

  17. How do you restrict access to an RDS instance? Use security groups, private DB subnet groups, and IAM policies.

  18. What is RDS Proxy? A fully managed database proxy that pools and shares database connections, improving application scalability and availability.

  19. Can RDS be accessed publicly? Only if the instance has the publicly accessible option enabled, sits in a public subnet, and its security group allows the traffic (not recommended).

  20. What happens during a failover in Multi-AZ RDS? RDS promotes the standby and updates the instance endpoint's DNS record to point to it, typically within one to two minutes.

  21. What are some best practices for RDS?

  • Enable Multi-AZ and automated backups
  • Use IAM authentication
  • Enable Enhanced Monitoring
  • Use TLS for encryption
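  22. What is IAM Database Authentication in RDS? Lets users connect to MySQL, MariaDB, and PostgreSQL databases with a short-lived IAM authentication token instead of a stored password.

  23. How do you patch RDS? AWS applies required patches, and minor version upgrades if auto minor version upgrade is enabled, during the preferred maintenance window. Major version upgrades must be initiated by you.

  24. What is the difference between Aurora and RDS MySQL? Aurora uses a distributed storage layer that keeps six copies of the data across three Availability Zones and grows automatically; AWS claims up to five times the throughput of standard MySQL.

  25. Can you migrate an existing DB to RDS? Yes, using AWS Database Migration Service (DMS) or native tools like mysqldump.

  26. How do you replicate data from on-prem to RDS? Use AWS DMS over a VPN or Direct Connect link, with proper VPC routing.

  27. What happens if storage is full in RDS? If storage autoscaling is disabled, the instance enters the storage-full state and can become unavailable until you allocate more storage.

  28. How do you troubleshoot performance issues in RDS? Use Performance Insights, slow query logs, and CloudWatch metrics.

  29. Can you stop an RDS instance? Yes, for up to 7 days at a time, after which RDS starts it again automatically. Aurora is stopped and started at the cluster level rather than per instance.

  30. How is failover different in Aurora vs RDS? Aurora usually fails over faster because its replicas share the same storage volume as the writer, so a replica can be promoted without waiting on storage-level replication.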


📁 12 - RDS (Relational Database Service) (Scenario-Based Q&A for 5+ Years Experience)

  1. Scenario: Your application experiences increased read latency during peak traffic. What steps would you take to troubleshoot and resolve this issue in RDS?

    • Review CloudWatch metrics (ReadIOPS, CPUUtilization)
    • Analyze slow query logs and Performance Insights
    • Check if Read Replicas are being utilized efficiently
    • Consider scaling the instance or implementing caching
  2. Scenario: You need to ensure zero data loss and automatic failover for a mission-critical database. What RDS features will you enable?

    • Enable Multi-AZ deployment (replication to the standby is synchronous)
    • Turn on automated backups and log backups
    • Ensure proper monitoring and alarms for failover
  3. Scenario: You receive an alert that your RDS instance has reached 95% storage utilization. How do you prevent downtime?

    • Enable storage auto-scaling
    • Manually increase allocated storage if needed
    • Investigate large tables or binary logs consuming space
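To catch the problem before 95%, an alarm on the CloudWatch `FreeStorageSpace` metric helps. The sketch below only builds the arguments for `put_metric_alarm`; the helper names, the 85% default, and the alarm naming scheme are assumptions for illustration, not part of the original answer.

```python
GIB = 1024 ** 3  # FreeStorageSpace is reported in bytes


def free_storage_alarm_kwargs(db_instance_id, allocated_gib,
                              alert_at_used_pct=85, sns_topic_arn=None):
    """Build put_metric_alarm arguments that fire once used storage passes alert_at_used_pct."""
    # Convert "85% used" into the equivalent amount of free bytes.
    threshold_bytes = allocated_gib * GIB * (100 - alert_at_used_pct) / 100
    kwargs = {
        "AlarmName": f"{db_instance_id}-low-free-storage",  # hypothetical naming scheme
        "Namespace": "AWS/RDS",
        "MetricName": "FreeStorageSpace",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": float(threshold_bytes),
        "ComparisonOperator": "LessThanThreshold",
    }
    if sns_topic_arn:
        kwargs["AlarmActions"] = [sns_topic_arn]
    return kwargs


def create_alarm(kwargs, region_name):
    """Create the alarm; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without AWS access
    boto3.client("cloudwatch", region_name=region_name).put_metric_alarm(**kwargs)
```

If autoscaling is enabled, the threshold should be recomputed whenever allocated storage grows, since it is derived from the allocated size.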
  4. Scenario: A developer complains about intermittent 'Too many connections' errors. What do you check and how do you fix it?

    • Analyze DB parameter max_connections
    • Check application connection pooling
    • Use RDS Proxy to manage database connections more efficiently
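A quick way to check whether pooling is the culprit is to compare the fleet's worst-case connection demand with `max_connections`. The following is a minimal arithmetic sketch; the function names and the five connections reserved for admin and monitoring sessions are assumptions, not RDS defaults.

```python
def connection_budget(max_connections, app_instances, pool_size_per_instance,
                      reserved_for_admin=5):
    """Return (peak_demand, headroom); negative headroom means the pools can exhaust the DB."""
    peak_demand = app_instances * pool_size_per_instance
    usable = max_connections - reserved_for_admin
    return peak_demand, usable - peak_demand


def max_safe_pool_size(max_connections, app_instances, reserved_for_admin=5):
    """Largest per-instance pool size that keeps the whole fleet under the limit."""
    usable = max_connections - reserved_for_admin
    return max(usable // app_instances, 0)


if __name__ == "__main__":
    demand, headroom = connection_budget(500, app_instances=40, pool_size_per_instance=15)
    print(demand, headroom)             # 600 -105
    print(max_safe_pool_size(500, 40))  # 12
```

When the safe pool size comes out impractically small, that is usually the point at which RDS Proxy or a larger instance class becomes the better fix.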
  5. Scenario: Your RDS maintenance window is overlapping with peak traffic hours. What changes should you make?

    • Modify the preferred maintenance window to off-peak times
    • Use RDS Blue/Green Deployments (available for Aurora and for RDS MySQL, MariaDB, and PostgreSQL) to patch a staging copy and switch over with minimal downtime
  6. Scenario: You want to audit who accessed the RDS database and when. What services or features do you enable?

    • Enable CloudTrail logging for API access
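    • Enable database audit logging (for example the MariaDB Audit Plugin on MySQL/MariaDB or pgAudit on PostgreSQL) and publish the logs to CloudWatch Logs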
    • Use IAM authentication for granular tracking
  7. Scenario: Application queries are showing erratic performance. What are your steps to analyze and optimize?

    • Enable and check slow query log
    • Use Performance Insights to identify problematic queries
    • Check for missing indexes or table scans
  8. Scenario: You need to replicate your RDS MySQL database across regions. How do you do it?

    • Enable cross-region Read Replicas
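    • RDS handles the replication channel itself; set up VPC peering and security group rules only for applications that must reach the replica from another region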
    • Monitor replication lag
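As a rough sketch of the first step, a cross-region replica is created from the destination region and refers to the source instance by ARN. The helper names, account ID, and identifiers below are placeholders; the boto3 call is `create_db_instance_read_replica`, and a KMS key in the destination region is needed when the source is encrypted.

```python
def rds_instance_arn(region, account_id, db_instance_id):
    """ARN of a DB instance, which cross-region replicas use to name their source."""
    return f"arn:aws:rds:{region}:{account_id}:db:{db_instance_id}"


def cross_region_replica_kwargs(source_region, account_id, source_id, replica_id,
                                kms_key_id=None):
    """Build arguments for create_db_instance_read_replica."""
    kwargs = {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": rds_instance_arn(source_region, account_id, source_id),
        "SourceRegion": source_region,
    }
    if kms_key_id:
        # Required for an encrypted source: a key that lives in the destination region.
        kwargs["KmsKeyId"] = kms_key_id
    return kwargs


def create_replica(kwargs, destination_region):
    """Issue the call from the destination region; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders above work without AWS access
    rds = boto3.client("rds", region_name=destination_region)
    return rds.create_db_instance_read_replica(**kwargs)
```

Replication lag can then be watched through the replica's `ReplicaLag` CloudWatch metric.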
  9. Scenario: You are required to apply a major version upgrade to your production RDS DB. How do you proceed with minimal impact?

    • Clone or snapshot the DB in staging and test the upgrade
    • Use a blue/green deployment so the upgrade runs on the green copy and traffic is switched over only after validation
    • Perform upgrade during low traffic with backups enabled
  10. Scenario: Your team wants to connect RDS to an on-prem app securely. How do you architect this?

  • Use AWS Direct Connect or VPN
  • Ensure RDS is in a private subnet
  • Configure proper routing and security group rules
  11. Scenario: Restore your RDS DB to a state it was in 12 hours ago due to a data corruption issue.
  • Use point-in-time recovery with automated backups
  • Restore to a new DB instance to avoid affecting live DB
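A minimal sketch of that restore, assuming boto3: the helper computes the restore timestamp and a name for the new instance, then passes both to `restore_db_instance_to_point_in_time`. The function names and the target naming pattern are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def pitr_restore_kwargs(source_id, hours_ago, now=None):
    """Build arguments for restore_db_instance_to_point_in_time."""
    now = now or datetime.now(timezone.utc)
    restore_time = now - timedelta(hours=hours_ago)
    return {
        "SourceDBInstanceIdentifier": source_id,
        # The restore always creates a new instance; this naming pattern is an assumption.
        "TargetDBInstanceIdentifier": f"{source_id}-pitr-{restore_time:%Y%m%d-%H%M}",
        "RestoreTime": restore_time,
    }


def restore(kwargs, region_name):
    """Start the restore; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without AWS access
    rds = boto3.client("rds", region_name=region_name)
    return rds.restore_db_instance_to_point_in_time(**kwargs)
```

The requested time has to fall inside the automated backup retention window, and the application must be repointed (or the data copied back) once the new instance is verified.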
  12. Scenario: What if your Multi-AZ standby is not failing over as expected?
  • Verify health checks and logs
  • Review RDS events (describe-events) and the instance status from describe-db-instances to see whether a failover started or completed
  • Open AWS Support case if needed
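  13. Scenario: A read replica is experiencing replication lag. What can cause this and how do you fix it?
  • Heavy write load or long-running transactions on the source, or slow queries on the replica
  • Network latency or an IOPS bottleneck on the replica
  • Size the replica at least as large as the source (instance class and storage throughput) and spread read traffic across additional replicas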
  14. Scenario: Cost is rising due to RDS. How do you optimize pricing for the current workload?
  • Use reserved instances
  • Enable storage autoscaling only when needed
  • Optimize queries and connections to reduce overprovisioning
  15. Scenario: You are migrating an on-prem MySQL DB to RDS with minimal downtime. What service do you use?
  • Use AWS Database Migration Service (DMS)
  • Set up continuous replication until final cutover
  16. Scenario: The application needs to connect securely without exposing credentials. What feature do you implement?
  • Use IAM Database Authentication with SSL encryption
  • Store credentials in AWS Secrets Manager
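With IAM authentication the application requests a short-lived token and uses it as the password over TLS. The sketch below assumes boto3 and a PostgreSQL client that accepts libpq-style parameters; the helper names, the default database name, and the CA bundle path are placeholders.

```python
def generate_token(host, port, user, region):
    """Request an IAM auth token (valid for 15 minutes); needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper below works without AWS access
    rds = boto3.client("rds", region_name=region)
    return rds.generate_db_auth_token(
        DBHostname=host, Port=port, DBUsername=user, Region=region
    )


def pg_connect_params(host, port, user, token, ca_bundle_path, dbname="postgres"):
    """libpq-style connection parameters: the token is the password and TLS is verified."""
    return {
        "host": host,
        "port": port,
        "user": user,
        "password": token,
        "dbname": dbname,
        "sslmode": "verify-full",       # IAM authentication requires an encrypted connection
        "sslrootcert": ca_bundle_path,  # path to the downloaded RDS CA bundle (assumed)
    }
```

Because the token expires quickly, it is generated per connection rather than stored; Secrets Manager remains the option for engines or users that still need a password.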
  17. Scenario: An RDS instance is publicly accessible by mistake. What’s your response plan?
  • Immediately set the instance's publicly accessible option to false (modify-db-instance --no-publicly-accessible)
  • Adjust security group rules
  • Rotate all credentials
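The first step can be scripted so that every exposed instance is found, not just the one that was reported. This sketch filters a response shaped like the output of `describe_db_instances` and then clears the flag with `modify_db_instance`; the function names are hypothetical and pagination is left out for brevity.

```python
def publicly_accessible_ids(describe_response):
    """Return identifiers of instances whose PubliclyAccessible flag is set."""
    return [
        db["DBInstanceIdentifier"]
        for db in describe_response.get("DBInstances", [])
        if db.get("PubliclyAccessible")
    ]


def lock_down(instance_ids, region_name):
    """Clear the public flag on each instance; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the filter above works without AWS access
    rds = boto3.client("rds", region_name=region_name)
    for instance_id in instance_ids:
        rds.modify_db_instance(
            DBInstanceIdentifier=instance_id,
            PubliclyAccessible=False,
            ApplyImmediately=True,
        )
```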
  18. Scenario: You want to periodically archive old data from your database. How do you automate this?
  • Use stored procedures or scheduled Lambda jobs
  • Offload to S3 or a data warehouse like Redshift
  19. Scenario: An app is experiencing timeout errors when writing to the DB. What do you check?
  • Check DB CPU, write IOPS, and lock contention
  • Review slow query logs
  • Tune schema and queries
  20. Scenario: A large schema change needs to be applied. How do you ensure safe deployment?
  • Clone the DB and test changes
  • Use tools like pt-online-schema-change
  • Apply changes in maintenance window
  21. Scenario: How do you monitor and alert for long-running queries?
  • Enable CloudWatch metrics and custom alerts
  • Use RDS Performance Insights or query monitoring tools
  22. Scenario: You need to isolate analytics workloads from transactional queries. What architecture would you use?
  • Direct analytics to Read Replicas
  • With Aurora, send analytics to the reader endpoint or to a custom endpoint that targets dedicated replicas
  23. Scenario: You accidentally dropped a critical table. How do you recover quickly?
  • Use point-in-time restore to a new DB
  • Extract only the lost table from backup
  • Minimize downtime by running restore in parallel
  24. Scenario: You need to restrict which developers can perform destructive actions on RDS. How do you enforce this?
  • Use fine-grained IAM policies and roles
  • Enable CloudTrail auditing and monitor API calls
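One possible shape for such a policy is an explicit deny on destructive RDS actions for production instances, attached to the developer role. The sketch below builds the policy document as a Python dict; the `prod-` naming prefix, the statement ID, and the chosen action list are assumptions to adapt.

```python
import json

# Actions treated as destructive in this sketch; extend the list as needed.
DESTRUCTIVE_ACTIONS = [
    "rds:DeleteDBInstance",
    "rds:RebootDBInstance",
    "rds:StopDBInstance",
]


def deny_destructive_policy(account_id, name_prefix="prod-"):
    """Build an IAM policy that denies destructive actions on instances named <prefix>*."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDestructiveRdsActions",
                "Effect": "Deny",
                "Action": DESTRUCTIVE_ACTIONS,
                "Resource": f"arn:aws:rds:*:{account_id}:db:{name_prefix}*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_destructive_policy("123456789012"), indent=2))
```

An explicit deny wins over any allow the role picks up elsewhere, which is why it suits guardrails better than a narrowly scoped allow.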
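  25. Scenario: You want to test database failover. How do you simulate it safely?
  • Trigger a failover manually by rebooting with failover (aws rds reboot-db-instance --force-failover), ideally against a non-production copy first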
  • Monitor DNS switch and app impact during event
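The test can be sketched as two parts: forcing the failover with `reboot_db_instance(ForceFailover=True)` and sampling what the endpoint resolves to while it happens. The function names and the polling defaults below are assumptions, and local DNS caching can delay or hide the change seen by the watcher.

```python
import socket
import time


def ip_changes(samples):
    """Given [(timestamp, ip), ...], return [(timestamp, old_ip, new_ip)] for each change."""
    changes = []
    for (_, prev_ip), (ts, ip) in zip(samples, samples[1:]):
        if ip != prev_ip:
            changes.append((ts, prev_ip, ip))
    return changes


def watch_endpoint(endpoint, duration_s=180, interval_s=2):
    """Resolve the endpoint repeatedly and record what it points to (None if lookup fails)."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        try:
            ip = socket.gethostbyname(endpoint)
        except socket.gaierror:
            ip = None
        samples.append((time.time(), ip))
        time.sleep(interval_s)
    return samples


def force_failover(db_instance_id, region_name):
    """Reboot a Multi-AZ instance with failover; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without AWS access
    rds = boto3.client("rds", region_name=region_name)
    rds.reboot_db_instance(DBInstanceIdentifier=db_instance_id, ForceFailover=True)
```

Running the watcher in one terminal and the failover in another gives a timestamped record of the DNS switch to compare against application error logs.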
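  26. Scenario: You need to audit DB performance over 6 months. What tool helps?
  • RDS Performance Insights with long-term retention enabled (the default is 7 days; paid retention extends up to 24 months)
  • CloudWatch metrics, which are retained for up to 15 months, plus CloudWatch Logs exported to S3 for long-term storage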
  27. Scenario: You are tasked to implement disaster recovery in a new region. How do you plan RDS DR strategy?
  • Enable cross-region snapshot copy or Read Replica
  • Automate restore in DR region
  • Test periodically
  28. Scenario: You want to restrict access to RDS by IP. How do you do it?
  • Modify security group inbound rules for specific CIDRs
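A small sketch of that rule change, assuming boto3 and IPv4 ranges only: the helper validates each CIDR with the standard `ipaddress` module, refuses a wide-open range, and produces the `IpPermissions` entry for `authorize_security_group_ingress`. The function names and the default MySQL port are assumptions.

```python
import ipaddress


def ingress_permission(allowed, port=3306):
    """Build one IpPermissions entry from [(cidr, description), ...]."""
    ip_ranges = []
    for cidr, description in allowed:
        network = ipaddress.ip_network(cidr)  # raises ValueError on malformed CIDRs
        if network.version != 4:
            raise ValueError(f"only IPv4 ranges are handled in this sketch: {cidr}")
        if network.prefixlen == 0:
            raise ValueError("refusing to open the database to 0.0.0.0/0")
        ip_ranges.append({"CidrIp": str(network), "Description": description})
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": ip_ranges,
    }


def authorize(group_id, permission, region_name):
    """Add the inbound rule to the security group; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without AWS access
    ec2 = boto3.client("ec2", region_name=region_name)
    ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=[permission])
```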
  29. Scenario: You need to enforce encryption for all data in transit and at rest. What features support this?
  • Enable TLS
  • Use KMS encryption for volumes and backups
  30. Scenario: Team wants self-service DB restores without admin access. How do you implement this safely?
  • Create Lambda-backed automation using IAM roles
  • Allow read-only access to snapshots or PIT restore via API