AWS ‐ Database Services ‐ RDS | DocumentDB | DynamoDB | ElastiCache | Keyspaces | Neptune | Quantum Ledger | Aurora | Redshift | Timestream - FullstackCodingGuy/Developer-Fundamentals GitHub Wiki

image

Relational Database

  • Data is stored in a relational structure
  • Data is stored in tables as rows and columns (each column holds a property of the row)
  • SQL is used to access and query the data

Data Integrity

  • Completeness - constraints make sure required data is present (for example, firstname and lastname cannot be null, so consumers of the data get complete information)
  • Consistency - consumers can rely on the data
  • Accuracy - the data model and its relationships are defined properly

Database Transactions

  • A collection of SQL statements processed in sequence as a single unit
  • All-or-nothing behavior - if every statement succeeds, the changes are applied to the database; if any statement fails, no changes are applied
  • Database transactions must be ACID
    • Atomic - the entire set of statements must succeed, not just part of it
    • Consistent - data written to the database must adhere to all defined rules and constraints
    • Isolated - concurrent transactions do not interfere with each other; each one runs as if it were the only transaction
    • Durable - once a transaction is committed, its changes are permanent
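
The all-or-nothing behavior described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration on a local in-memory database, not an RDS example; the `accounts` table, the names in it, and the `transfer` helper are assumptions made up for the sketch.

```python
import sqlite3

# In-memory database with a hypothetical "accounts" table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit is undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

Calling `transfer(conn, "alice", "bob", 500)` fails on the second statement, and the first statement's credit to `bob` is rolled back with it, so the balances are unchanged.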

Relational database engines supported by Amazon RDS

  • Amazon Aurora
  • MariaDB
  • Microsoft SQL Server
  • MySQL
  • Oracle
  • PostgreSQL

NoSQL (Non-Relational) Databases

  • Support varied data models (flexible or unstructured schemas)
  • Used to store large amounts of data with fewer constraints
  • Low latency - fewer constraint and rule validations than a relational database
  • Scalability and performance - less processing and validation per operation, and data can be compressed and stored efficiently
  • Flexibility - can store different types of data

Types of NoSQL Database

  • Key-value
  • Document
  • Graph
  • Search
  • In-memory

Comparison between SQL and NoSQL databases

image


Database Consistency Models

DynamoDB

  • A fully managed NoSQL database built for performance and scalability
  • Serverless - it scales automatically, with no infrastructure to manage

image

image

  • Eventually consistent reads (default)

    • maximize read performance
    • may not reflect a recent write: a read issued right after a write, update, or delete can return stale data, because the change takes a short time (usually within a second or so) to reach every copy
    • this is normal wherever data is replicated, for example across multiple Availability Zones
  • Strongly consistent reads

    • all writes that completed before the read are reflected, so the latest data is guaranteed to be returned
    • data is always up to date
    • more resource intensive, hence lower performance
  • Can be deployed in a single Region or in multiple Regions (with replication across Regions)

  • Data is replicated across multiple Availability Zones within a Region
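
The difference between the two read modes can be sketched with a toy in-memory model. This is not how DynamoDB is implemented; the `ReplicatedStore` class, its single lagging replica, and the explicit `replicate()` step are all assumptions used to make the delay visible.

```python
class ReplicatedStore:
    """Toy key-value store with one leader copy and one lagging replica."""

    def __init__(self):
        self.leader = {}
        self.replica = {}
        self.pending = []  # writes not yet copied to the replica

    def put(self, key, value):
        # The write is acknowledged once the leader has it.
        self.leader[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Stands in for the asynchronous copy that happens shortly after a write.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def get(self, key, consistent_read=False):
        # Strongly consistent reads go to the leader; eventually consistent
        # reads may be served by a replica that has not caught up yet.
        source = self.leader if consistent_read else self.replica
        return source.get(key)
```

In this model, a `get` issued between `put` and `replicate` returns the old value unless `consistent_read=True` is passed. With the real service, a strongly consistent read is requested per call, for example through the `ConsistentRead` parameter of `get_item` in the AWS SDKs.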


Relational Database Service (RDS)

  • A fully managed relational database web service
  • Cost-efficient and scalable

Database Instances

  • A DB instance is the basic building block: an isolated database deployment, and a single instance can host multiple databases

  • Each DB instance has an instance identifier, which is used to refer to it when connecting or managing it

  • Limitations

    • up to 40 DB instances per account by default; the limit depends on the database engine and license model
    • SQL Server (license included) - max 10 instances
    • Oracle (license included) - max 10 instances; with bring your own license (BYOL), up to 40 instances
    • MySQL, MariaDB, PostgreSQL - up to 40 instances
  • Storage types

    • General Purpose SSD - cost-effective for most workloads
    • Provisioned IOPS SSD - higher cost, for I/O-intensive workloads
    • Magnetic storage - low-cost option, kept mainly for backward compatibility
  • High Availability

    • Use a Multi-AZ (multiple Availability Zone) deployment
    • Automatic failover - if the primary DB instance becomes unavailable, RDS fails over to a standby in another Availability Zone
  • Pricing

    • On-Demand - pay for what you use
    • Reserved - an hourly rate fixed for a set term, paid partially or entirely upfront
    • Billed per DB instance, with separate rates for Multi-AZ DB instances

Working with DB instance read replicas

A read replica is a read-only copy of a DB instance. You can reduce the load on your primary DB instance by routing queries from your applications to the read replica. In this way, you can elastically scale out beyond the capacity constraints of a single DB instance for read-heavy database workloads.

To create a read replica from a source DB instance, Amazon RDS uses the built-in replication features of the DB engine. For information about using read replicas with a specific engine, see the following sections:

  • Working with MariaDB read replicas
  • Working with read replicas for Microsoft SQL Server in Amazon RDS
  • Working with MySQL read replicas
  • Working with read replicas for Amazon RDS for Oracle
  • Working with read replicas for Amazon RDS for PostgreSQL

After you create a read replica from a source DB instance, the source becomes the primary DB instance. When you make updates to the primary DB instance, Amazon RDS copies them asynchronously to the read replica. The following diagram shows a source DB instance replicating to a read replica in a different Availability Zone (AZ). Clients have read/write access to the primary DB instance and read-only access to the replica.

image

Use cases for read replicas

Deploying one or more read replicas for a given source DB instance might make sense in a variety of scenarios, including the following:

  • Scaling beyond the compute or I/O capacity of a single DB instance for read-heavy database workloads. You can direct this excess read traffic to one or more read replicas.

  • Serving read traffic while the source DB instance is unavailable. In some cases, your source DB instance might not be able to take I/O requests, for example due to I/O suspension for backups or scheduled maintenance. In these cases, you can direct read traffic to your read replicas. For this use case, keep in mind that the data on the read replica might be "stale" because the source DB instance is unavailable.

  • Business reporting or data warehousing scenarios where you might want business reporting queries to run against a read replica, rather than your production DB instance.

  • Implementing disaster recovery. You can promote a read replica to a standalone instance as a disaster recovery solution if the primary DB instance fails.
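
The read-scaling use case can be sketched as a small router that sends writes to the primary and spreads reads across replicas. This is a minimal sketch, not an RDS feature: the `ReadWriteRouter` class, the hostnames in the usage note, and the "starts with SELECT" rule are all assumptions for illustration.

```python
import itertools


class ReadWriteRouter:
    """Sends writes to the primary endpoint and spreads reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary for reads when no replica is configured.
        self._readers = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Naive classification: only statements starting with SELECT are reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary
```

For example, `ReadWriteRouter("primary.example.internal", ["replica-1.example.internal"])` returns the replica hostname for `SELECT` statements and the primary hostname for everything else. Because replication is asynchronous, a read routed to a replica immediately after a write may not see that write yet.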


Amazon DocumentDB

image
  • A NoSQL document database
  • Compatible with MongoDB
  • Automatic storage volume scaling up to 64 TB, with up to 15 replicas of the database
  • Can be deployed within VPCs
  • Health monitoring
  • Automated failover - restart and recovery
  • Point-in-time cluster recovery
  • Supports KMS encryption

Considerations

  • Supports clusters (groups of instances) of up to 16 DB instances
  • The primary instance handles writes; the replica instances serve reads
  • Billed by I/O consumption
  • Monitoring support

Interfaces

  • AWS Management Console
  • AWS CLI
  • MongoDB shell, tools, and drivers

Endpoints

  • To read and write data - use the cluster endpoint
  • To only read data - use the reader endpoint
  • To communicate with a specific instance - use the instance endpoint

Common use cases for DocumentDB

  • User profiles
  • Real-time big data
  • Content management

ElastiCache for Memcached

  • In-memory data store
  • High performance, scalable, and cost-effective
  • Reduces the complexity of deploying a distributed cache
  • Failure detection and recovery
  • Automatic node discovery - clients do not need to be reconfigured when nodes are added or removed
  • Flexible Availability Zone placement

Considerations

  • Speed and cost
  • Data and access patterns (ex: look up tables, static data)
  • Manage staleness of data
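
One common way to manage staleness is a cache-aside lookup with a time-to-live (TTL). The sketch below uses a plain dictionary in place of a Memcached cluster, so it shows only the pattern, not ElastiCache itself; `CacheAside`, `load_from_db`, and the injectable `clock` are hypothetical names introduced for the example.

```python
import time


class CacheAside:
    """Cache-aside lookup with a time-to-live (TTL) to bound staleness."""

    def __init__(self, load_from_db, ttl_seconds, clock=time.monotonic):
        self._load = load_from_db  # slow source of truth (hypothetical callable)
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit, still fresh
        value = self._load(key)  # miss or expired: reload from the source
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        # Call after updating the database so the next read reloads the value.
        self._entries.pop(key, None)
```

A short TTL keeps data fresher at the cost of more database reads; a long TTL suits lookup tables and static data, which change rarely.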

Components

  • Nodes - a fixed-size allocation of cache memory; capacity is scaled by adding nodes or changing the node type
  • Cluster - a group of nodes of the same type
  • Regions and Availability Zones
  • Endpoints - the addresses applications use to connect; the configuration endpoint lets clients discover a cluster's nodes
  • Security - IAM policies, VPC, security groups, subnet groups
  • Event notifications

Interactions

  • AWS Management Console
  • AWS CLI
  • AWS SDK
  • ElastiCache API

Amazon Keyspaces (for Apache Cassandra)

  • A managed, Apache Cassandra-compatible database service
  • Serverless
  • Pay-per-use pricing
  • Virtually unlimited throughput and storage

Reasons to use Keyspaces

  • Low-latency applications
  • Building on open-source technologies
  • Moving Cassandra workloads to the cloud with ease

CQL - Cassandra Query Language

  • Similar to SQL; it can be run from the CQL editor in the AWS Management Console

image


Amazon Neptune

  • A managed graph database
  • Suited to complex, highly connected application datasets
  • Stores the relationships between data items as part of the data
  • Uses a graph database engine that can process billions of relationships
  • Supports the Apache TinkerPop Gremlin and SPARQL query languages

Components

A database is organized into clusters. A cluster has a primary instance (read and write operations) and can have replicas (read-only operations).

  • Cluster - contains the primary DB instance, through which users read and write the cluster's data
  • A cluster provides high availability and reliability
  • Neptune replica - up to 15 replicas per cluster
  • Cluster volume - where the data is stored
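
To give a feel for the kind of relationship query a graph database answers, here is a toy breadth-first traversal in plain Python. It is not Gremlin or SPARQL and does not use Neptune; the `FOLLOWS` data and the `within_hops` helper are made up for the sketch.

```python
from collections import deque

# Hypothetical "follows" relationships stored as an adjacency list.
FOLLOWS = {
    "ana": ["ben", "cho"],
    "ben": ["dev"],
    "cho": ["dev", "eli"],
    "dev": [],
    "eli": ["ana"],
}


def within_hops(graph, start, max_hops):
    """Return {vertex: hop count} for every vertex reachable in <= max_hops edges."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if hops[vertex] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(vertex, []):
            if neighbor not in hops:
                hops[neighbor] = hops[vertex] + 1
                queue.append(neighbor)
    del hops[start]
    return hops
```

`within_hops(FOLLOWS, "ana", 2)` answers "who can ana reach in at most two steps", the sort of question that needs repeated self-joins in a relational schema but follows stored relationships directly in a graph model.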

image

image


Amazon Quantum Ledger Database (QLDB)

  • A ledger (journal) database whose defining property is immutability
  • Users can read and write data, but the journal is append-only: once an entry is committed it cannot be changed, which suits use cases such as banking
  • Typical use case: change tracking
  • Works somewhat like a blockchain database: entries are chained with cryptographic hashes so the history can be verified
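
The idea of an append-only, verifiable journal can be sketched with a simple hash chain. This is a toy model of the concept, not QLDB's actual journal format or API; the `Journal` class and the sample payloads are assumptions for illustration.

```python
import hashlib
import json


class Journal:
    """Append-only log in which each entry's hash covers the previous hash."""

    def __init__(self):
        self._entries = []  # list of (payload_json, entry_hash)

    @staticmethod
    def _hash(prev_hash, payload_json):
        return hashlib.sha256((prev_hash + payload_json).encode("utf-8")).hexdigest()

    def append(self, payload):
        prev_hash = self._entries[-1][1] if self._entries else ""
        payload_json = json.dumps(payload, sort_keys=True)
        entry_hash = self._hash(prev_hash, payload_json)
        self._entries.append((payload_json, entry_hash))
        return entry_hash

    def verify(self):
        """Recompute the chain; an entry edited after the fact breaks it."""
        prev_hash = ""
        for payload_json, entry_hash in self._entries:
            if self._hash(prev_hash, payload_json) != entry_hash:
                return False
            prev_hash = entry_hash
        return True
```

Because every hash depends on the one before it, changing an old entry without recomputing all later hashes makes `verify()` return `False`, which is what makes tampering detectable.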

image

Comparison QLDB vs Relational db

image

Concepts

  • Data object model
  • Journal-first transactions - each transaction is written to the journal first, and the data you query is derived from the journal

image


Amazon Aurora Database

  • A relational database
  • Offers high availability and performance
  • Offers high-performance clusters with 1 primary and multiple replica instances

image

image

Storage and Reliability

  • Cluster volume contents - data is stored in a single isolated volume that is decoupled from the instances; the primary and all replicas reference the same volume, so storage needs grow with the amount of data
  • Automatic storage resizing - the volume grows as needed, up to 128 TiB
  • Billing is based on the storage used
  • Aurora automatically replicates the data across the cluster, which results in high availability and reliability

Amazon Redshift

  • A managed, petabyte-scale data warehouse
  • Uses Redshift clusters
  • A cluster is a collection of nodes; each node is one of 3 node types
    • RA3
    • DC2
    • DS2

image

image

image


Amazon Timestream

  • A managed time series database; a time series is a collection of data points received over time from a single source device (for example, temperature readings from a thermostat)
  • Time is the key index in the data records
  • Can store trillions of time series data points
  • Scales quickly
  • Offers extensive integration support for collecting time series data (for example, from IoT devices)

image

Architecture

image

Writes Architecture

  • Supported data types - BIGINT, BOOLEAN, DOUBLE, VARCHAR

Storage Architecture

  • Data is optimized as it is stored
  • The optimized layout makes queries efficient and reduces storage cost
  • Supports retention policies

Query Architecture Model

  • Flat model - all data is stored in a table with a timestamp column
  • Time series model - each value is stored as a key-value pair, with the timestamp as the key
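
The relationship between the two models can be sketched in plain Python by regrouping flat rows into per-source series of (timestamp, value) pairs. This is only an illustration of the two shapes, not Timestream's query engine; the device names, readings, and the `to_time_series` helper are made up.

```python
from collections import defaultdict

# Flat model: one row per reading, with an explicit "time" column.
# Device names and readings are made up for illustration.
FLAT_ROWS = [
    {"device": "thermostat-1", "time": "2024-01-01T00:00:00Z", "temperature": 20.5},
    {"device": "thermostat-2", "time": "2024-01-01T00:00:00Z", "temperature": 18.0},
    {"device": "thermostat-1", "time": "2024-01-01T00:05:00Z", "temperature": 20.7},
]


def to_time_series(rows, source_key, measure):
    """Group flat rows into one series per source: a list of (timestamp, value)."""
    series = defaultdict(list)
    for row in rows:
        series[row[source_key]].append((row["time"], row[measure]))
    # ISO 8601 UTC strings in the same format sort chronologically.
    return {source: sorted(points) for source, points in series.items()}
```

`to_time_series(FLAT_ROWS, "device", "temperature")` yields one ordered series per thermostat, each keyed by timestamp.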

Quiz

What element corresponds to a data entry in an Amazon Timestream database?

image

In a collection named “books,” what command would be used to retrieve all documents?

image

How many Microsoft SQL Server databases can be deployed via the Amazon Relational Database Service per AWS account?

image

How many replicas can an Amazon Aurora cluster include?

image

What command is required when adding clauses to Amazon Keyspaces read operations?

image

What element of a relational database corresponds to an individual data value?

image

What Amazon DynamoDB element corresponds to an individual data value?

image

What data node type is only recommended for legacy applications?

image

What consistency model ensures that data is guaranteed to be included in read operations immediately after a write operation?

Strongly consistent

What database capability type does Amazon Neptune provide?

Graph

What is the format type of a data object stored within Amazon DocumentDB?

JSON document

What is an Amazon Quantum Ledger Database (QLDB) journal?

image

What database service does Amazon Keyspaces wrap?

Apache Cassandra

What tool is used for programmatic access to Amazon ElastiCache for Memcached?

image
