AWS Redshift

Workload management

Workload management (WLM) allows separation of different workload types. The goal is to prioritise important workloads/queries while throttling less important ones. WLM lets you (a routing sketch follows the list below):

  • control the number of concurrent queries
  • allocate cluster resources (compute and memory) across workload types
  • set query timeouts to abort long-running queries
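
A minimal sketch of session-level routing, assuming a manual WLM queue has been configured with the query group `reporting`; the group name, table and columns are hypothetical:

```sql
-- Route this session's queries to the queue associated with query group
-- 'reporting' (hypothetical group); fact_sales is a placeholder table.
SET query_group TO 'reporting';

SELECT region, SUM(sales_amount) AS total_sales
FROM fact_sales
GROUP BY region;

RESET query_group;

-- Abort queries in this session that run longer than 60 seconds (value in ms).
SET statement_timeout TO 60000;
```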

Short Query Acceleration

Short Query Acceleration (SQA) routes eligible short-running queries to a dedicated short query queue so they do not wait behind long-running queries.
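
A hedged way to verify routing, assuming the short query queue is reported as service class 14 in the WLM system tables (as in current Redshift releases):

```sql
-- Recently completed queries handled by the short query queue (service_class 14);
-- queue and execution times are reported in microseconds.
SELECT query, service_class, total_queue_time, total_exec_time
FROM stl_wlm_query
WHERE service_class = 14
ORDER BY query DESC
LIMIT 20;
```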

Concurrency scaling

  • Handles spiky workloads
  • Incremental automated snapshots are pushed to S3
  • A transient secondary cluster, hydrated from those snapshots, is spun up to absorb the extra queries
  • Detection of query queuing, spin-up of the transient cluster, and its termination are all managed automatically by Redshift
  • For every 24 hours of primary cluster usage, 1 hour of the transient secondary cluster is free (see the usage check below)
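
A quick usage check, assuming the SVCS_CONCURRENCY_SCALING_USAGE system view is available on the cluster:

```sql
-- Concurrency scaling periods and seconds consumed, newest first; compare the
-- totals against the free credits accrued (1 hr per 24 hrs of primary usage).
SELECT start_time, end_time, queries, usage_in_seconds
FROM svcs_concurrency_scaling_usage
ORDER BY start_time DESC;
```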

Typical workload

  • Continuous ingestion
  • Reporting and Dashboard queries
  • Heavy data processing, ELT process

WLM Queue attributes

  • % memory allocation
  • Concurrency scaling mode (off/auto)
  • Queue priority
  • User groups
  • Query groups
  • Query monitoring rules
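
To see how these attributes translate into the running configuration, one option is the STV_WLM_SERVICE_CLASS_CONFIG system view; a sketch, assuming user-defined queues map to service classes 6 and above:

```sql
-- Effective WLM queue configuration: slots, working memory, timeout and
-- concurrency scaling setting per user-defined queue.
SELECT service_class, name, num_query_tasks, query_working_mem,
       max_execution_time, concurrency_scaling
FROM stv_wlm_service_class_config
WHERE service_class >= 6;
```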

Query monitoring rule

  • Detect queries with poor execution plans
  • Take action when a rule triggers - log the query, hop it to another queue, change its priority, or abort it
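
Triggered rules are logged, so a query like the following sketch against the STL_WLM_RULE_ACTION system table shows which rules fired and what action was taken:

```sql
-- Most recent query monitoring rule actions (log/hop/change priority/abort).
SELECT recordtime, query, service_class, rule, action
FROM stl_wlm_rule_action
ORDER BY recordtime DESC
LIMIT 50;
```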

For large result sets, UNLOAD to Amazon S3 is better than returning all records to the client through the leader node.
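
A minimal UNLOAD sketch - the bucket, prefix, table and IAM role ARN are placeholders; the compute nodes write the result files to S3 in parallel instead of funnelling rows back through the leader node:

```sql
UNLOAD ('SELECT * FROM fact_sales WHERE sale_date >= ''2023-01-01''')
TO 's3://my-bucket/unload/fact_sales_'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
FORMAT AS PARQUET;
```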

Loading Data

Loading Data from an Amazon DynamoDB Table

  • COPY command
  • If the Redshift cluster and the Amazon DynamoDB table are not in the same region, specify the REGION option on the COPY command
  • The COPY command loads data in parallel across the cluster's compute nodes
  • Data transfer incurred by the COPY command counts towards the table's provisioned read throughput
  • The READRATIO option on the COPY command specifies the percentage of the DynamoDB table's provisioned read throughput to use for the transfer
  • The NOLOAD option on the COPY command validates the data without actually loading it into the cluster
  • COPY command format
copy <redshift_tablename> from 'dynamodb://<dynamodb_table_name>'
authorization
readratio '<integer>';
  • Format for authorization - either an IAM role:
IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-name>'
or access keys:
ACCESS_KEY_ID '<access-key-id>'
SECRET_ACCESS_KEY '<secret-access-key>';
  • The COPY command applies automatic compression whenever the target table is empty and has no compression encodings defined.
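
Putting the pieces above together - a sketch with placeholder table names, role ARN and region, throttled to half of the DynamoDB table's provisioned read throughput:

```sql
COPY users_staging
FROM 'dynamodb://Users'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
READRATIO 50
REGION 'us-west-2';
-- Add the NOLOAD option to validate the data without actually loading it.
```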

Vacuum

  • Why does a VACUUM operation sometimes not reclaim disk space? If other transactions were still active when the DELETE was issued, the rows are logically deleted but not physically removed, so those transactions can still access them. Once those transactions have committed or rolled back, run VACUUM again to reclaim the disk space (see the sketch below).
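
A sketch of the check-then-vacuum sequence (fact_sales is a placeholder table name):

```sql
-- Look for transactions that are still open and may be holding deleted rows.
SELECT xid, pid, txn_owner, txn_start
FROM svv_transactions;

-- Once those transactions have committed or rolled back, vacuum again.
VACUUM DELETE ONLY fact_sales;   -- reclaim space from deleted rows only
-- or
VACUUM FULL fact_sales;          -- reclaim space and re-sort rows
```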

Query scheduler

  • Functionality to schedule queries is built into Redshift Query Editor
  • The IAM user configuring the schedule must be able to assume a role with the following policies:
    • AmazonEventBridgeFullAccess
    • AmazonRedshiftDataFullAccess
  • Schedule can be specified in cron format
  • SNS notifications can be configured to monitor scheduled query runs
  • The Redshift Data API CLI (aws redshift-data) can be used to download the results of a scheduled execution

Cross account access - Amazon Redshift Spectrum / S3, Glue

  • Account A
    • Create an IAM role (roleA) with Redshift as the trusted entity
    • Attach this role to the Amazon Redshift cluster
    • Attach an inline policy to roleA that allows it to assume roleB in Account B
  • Account B
    • Create a policy with list and read permissions for Glue and S3
    • Create an IAM role (roleB) with the trusted entity set to another AWS account - Account A
    • Attach the policy to roleB
    • With this, roleB trusts all principals in Account A
    • To narrow the trust scope, modify the trust relationship so that only roleA in Account A is trusted (see the sketch after this list)
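
On the Redshift side, the two roles are then chained in the IAM_ROLE clause when the external schema is created - a sketch with placeholder account IDs, database and schema names:

```sql
CREATE EXTERNAL SCHEMA spectrum_xacct
FROM DATA CATALOG
DATABASE 'shared_db'
IAM_ROLE 'arn:aws:iam::111111111111:role/roleA,arn:aws:iam::222222222222:role/roleB'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
```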

Leader node

  • SQL endpoint
  • Query optimizer
  • Coordinates parallel SQL processing across Compute nodes
  • Not charged - you pay only for compute nodes

Compute node

  • Local, columnar storage
  • Executes queries in parallel
  • Load (COPY), UNLOAD, backup, and restore from S3

Redshift Spectrum node

  • Executes queries directly against data in the S3 data lake
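
A sketch of a Spectrum query, reusing the hypothetical spectrum_xacct schema from the cross-account example above; the table, columns and S3 location are placeholders:

```sql
-- External table over Parquet files in S3; only metadata is stored in Redshift.
CREATE EXTERNAL TABLE spectrum_xacct.clickstream (
    event_time TIMESTAMP,
    user_id    BIGINT,
    url        VARCHAR(2048)
)
STORED AS PARQUET
LOCATION 's3://my-bucket/clickstream/';

-- Spectrum nodes scan the S3 data; the aggregation completes on the cluster.
SELECT DATE_TRUNC('day', event_time) AS day, COUNT(*) AS events
FROM spectrum_xacct.clickstream
GROUP BY 1
ORDER BY 1;
```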