202. AWS DynamoDB 01 - qyjohn/AWS_Tutorials GitHub Wiki
(1) Create a DynamoDB Table
Read the following AWS documentation to understand what DynamoDB is:
- What Is Amazon DynamoDB?
- Introduction to DynamoDB Concepts
- DynamoDB Core Components
- The DynamoDB API
- Naming Rules and Data Types
- Read Consistency
- Provisioned Throughput
- Partitions and Data Distribution
Then create, describe, update, and delete some test DynamoDB tables using the AWS web console and the AWS CLI. You will also need to know how to use PutItem, GetItem, Query, Scan, and DeleteItem.
In the following examples, we use a DynamoDB table named "training" with a partition (hash) key named hash (String) and a sort key named sort (Number) to demonstrate how to use the AWS CLI to perform these operations.
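Creating the "training" table can also be scripted. The sketch below uses the AWS SDK for Python (boto3), and assumes `client` is a `boto3.client("dynamodb")` for your region. The 5 RCU / 5 WCU default is an arbitrary placeholder, not a recommendation. It only builds the key schema described above and waits for the table to become ACTIVE.

```python
def training_table_definition(table_name="training", rcu=5, wcu=5):
    """Build create_table keyword arguments for the example table:
    partition (hash) key 'hash' (String) and sort key 'sort' (Number).
    The default capacity values are placeholders."""
    return {
        "TableName": table_name,
        # Only key attributes are declared; other attributes are schemaless.
        "AttributeDefinitions": [
            {"AttributeName": "hash", "AttributeType": "S"},
            {"AttributeName": "sort", "AttributeType": "N"},
        ],
        "KeySchema": [
            {"AttributeName": "hash", "KeyType": "HASH"},   # partition key
            {"AttributeName": "sort", "KeyType": "RANGE"},  # sort key
        ],
        "ProvisionedThroughput": {"ReadCapacityUnits": rcu, "WriteCapacityUnits": wcu},
    }


def create_training_table(client, **kwargs):
    """Create the table and block until DynamoDB reports it as ACTIVE.
    `client` is assumed to be a boto3 DynamoDB client."""
    params = training_table_definition(**kwargs)
    client.create_table(**params)
    client.get_waiter("table_exists").wait(TableName=params["TableName"])
    return params["TableName"]
```

Usage would be `create_training_table(boto3.client("dynamodb"))`. The same key schema can be passed to `aws dynamodb create-table` through `--attribute-definitions`, `--key-schema`, and `--provisioned-throughput`.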
- PutItem - Creates a new item, or replaces an old item with a new item. If an item that has the same primary key as the new item already exists in the specified table, the new item completely replaces the existing item.
When your application writes data to a DynamoDB table and receives an HTTP 200 response (OK), the write has occurred and is durable. The data will eventually be consistent across all storage locations, usually within one second or less.
$ aws dynamodb put-item --table-name training --item '{"hash":{"S":"xxxx-xxxx-xxxx-xxxx"}, "sort":{"N":"12345678"}, "val":{"S":"1234567890"}}'
$ aws dynamodb put-item --table-name training --item '{"hash":{"S":"xxxx-xxxx-xxxx-xxxx"}, "sort":{"N":"22345678"}, "val":{"S":"2234567890"}}'
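The same PutItem call can be made from boto3. This is a sketch, assuming `client` is a `boto3.client("dynamodb")`. The optional `overwrite=False` branch is an illustration of how a condition expression turns the default "replace" behaviour into "insert only". `#h` is needed because HASH is a DynamoDB reserved word.

```python
def to_item(hash_value, sort_value, val):
    """Convert plain Python values into DynamoDB's typed attribute format,
    matching the JSON passed to `aws dynamodb put-item` above."""
    return {
        "hash": {"S": hash_value},
        "sort": {"N": str(sort_value)},  # numbers are sent as strings
        "val": {"S": val},
    }


def put(client, table_name, hash_value, sort_value, val, overwrite=True):
    """PutItem replaces any existing item with the same primary key.
    With overwrite=False the write fails with ConditionalCheckFailedException
    if an item with this primary key already exists."""
    params = {"TableName": table_name, "Item": to_item(hash_value, sort_value, val)}
    if not overwrite:
        # The condition is evaluated against the item with the same primary key.
        params["ConditionExpression"] = "attribute_not_exists(#h)"
        params["ExpressionAttributeNames"] = {"#h": "hash"}
    return client.put_item(**params)
```

For example, `put(client, "training", "xxxx-xxxx-xxxx-xxxx", 12345678, "1234567890")` mirrors the first CLI command above.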
- GetItem - The GetItem operation returns a set of attributes for the item with the given primary key. If there is no matching item, GetItem does not return any data and there will be no Item element in the response.
Eventually Consistent Reads - When you read data from a DynamoDB table, the response might not reflect the results of a recently completed write operation. The response might include some stale data. If you repeat your read request after a short time, the response should return the latest data.
Strongly Consistent Reads - When you request a strongly consistent read, DynamoDB returns a response with the most up-to-date data, reflecting the updates from all prior write operations that were successful. A strongly consistent read might not be available in the case of a network delay or outage.
$ aws dynamodb get-item --table-name training --key '{"hash":{"S":"xxxx-xxxx-xxxx-xxxx"}, "sort":{"N":"12345678"}}'
{
"Item": {
"sort": {
"N": "12345678"
},
"hash": {
"S": "xxxx-xxxx-xxxx-xxxx"
},
"val": {
"S": "1234567890"
}
}
}
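The choice between eventually consistent and strongly consistent reads is a single request parameter, `ConsistentRead` (on the CLI, the `--consistent-read` flag). A minimal boto3 sketch, assuming `client` is a `boto3.client("dynamodb")`. A strongly consistent read consumes twice the read capacity of an eventually consistent one (1 RCU versus 0.5 RCU per 4 KB).

```python
def get(client, table_name, hash_value, sort_value, consistent=False):
    """Return the item in DynamoDB JSON form, or None if no item matches.
    consistent=True requests a strongly consistent read; the default
    (False) is an eventually consistent read, like the CLI default."""
    response = client.get_item(
        TableName=table_name,
        Key={"hash": {"S": hash_value}, "sort": {"N": str(sort_value)}},
        ConsistentRead=consistent,
    )
    # GetItem returns no "Item" element when nothing matches the key.
    return response.get("Item")
```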
- Query - A Query operation uses the primary key of a table or a secondary index to directly access items from that table or index.
In the following example, hash is a DynamoDB reserved word, so the query refers to it through the expression attribute name #h. (You cannot use reserved words as attribute names in expressions. If you need to write an expression containing an attribute name that conflicts with a DynamoDB reserved word, define an expression attribute name to use in place of the reserved word. For more information, see Expression Attribute Names.)
$ aws dynamodb query --table-name training --key-condition-expression "#h = :v1" --expression-attribute-values '{":v1":{"S":"xxxx-xxxx-xxxx-xxxx"}}' --projection-expression "#h, sort, val" --expression-attribute-names '{"#h":"hash"}'
{
"Count": 2,
"Items": [
{
"sort": {
"N": "12345678"
},
"hash": {
"S": "xxxx-xxxx-xxxx-xxxx"
},
"val": {
"S": "1234567890"
}
},
{
"sort": {
"N": "22345678"
},
"hash": {
"S": "xxxx-xxxx-xxxx-xxxx"
},
"val": {
"S": "2234567890"
}
}
],
"ScannedCount": 2,
"ConsumedCapacity": null
}
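The AWS CLI follows pagination for you, but from an SDK you have to do it yourself: a single Query response returns at most 1 MB of data, and a `LastEvaluatedKey` in the response means there is more. A boto3 sketch, assuming `client` is a `boto3.client("dynamodb")`, that reuses the `#h` placeholder from the CLI example:

```python
def query_partition(client, table_name, hash_value):
    """Yield every item whose partition key equals hash_value, following
    LastEvaluatedKey until DynamoDB reports no more pages."""
    params = {
        "TableName": table_name,
        "KeyConditionExpression": "#h = :v1",
        "ExpressionAttributeNames": {"#h": "hash"},  # HASH is reserved
        "ExpressionAttributeValues": {":v1": {"S": hash_value}},
    }
    while True:
        response = client.query(**params)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            break
        # Resume right after the last key returned in the previous page.
        params["ExclusiveStartKey"] = last_key
```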
- Scan - The Scan operation returns one or more items and item attributes by accessing every item in a table or a secondary index.
By default, Scan uses eventually consistent reads when accessing the data in a table; therefore, the result set might not include the changes to data in the table immediately before the operation began. If you need a consistent copy of the data, as of the time that the Scan begins, you can set the ConsistentRead parameter to true.
$ aws dynamodb scan --table-name training --max-items 3
{
"Count": 4240,
"Items": [
{
"sort": {
"N": "10000711"
},
"hash": {
"S": "85aa69f5-f5d9-4ab9-9e91-00ae2639144a"
},
"val": {
"S": "0-85aa69f5-f5d9-4ab9-9e91-00ae2639144a-10000711"
}
},
{
"sort": {
"N": "10000760"
},
"hash": {
"S": "89501eb2-64d6-44a1-8c5e-a296afd7009d"
},
"val": {
"S": "0-89501eb2-64d6-44a1-8c5e-a296afd7009d-10000760"
}
},
{
"sort": {
"N": "10000020"
},
"hash": {
"S": "c8efedf0-c01e-42b7-818f-ff82d3ad379f"
},
"val": {
"S": "0-c8efedf0-c01e-42b7-818f-ff82d3ad379f-10000020"
}
}
],
"NextToken": "eyJFeGNsdXNpdmVTdGFydEtleSI6IG51bGwsICJib3RvX3RydW5jYXRlX2Ftb3VudCI6IDN9",
"ScannedCount": 4240,
"ConsumedCapacity": null
}
$ aws dynamodb scan --table-name training --consistent-read --max-items 3
{
"Count": 4585,
"Items": [
{
"sort": {
"N": "10000711"
},
"hash": {
"S": "85aa69f5-f5d9-4ab9-9e91-00ae2639144a"
},
"val": {
"S": "0-85aa69f5-f5d9-4ab9-9e91-00ae2639144a-10000711"
}
},
{
"sort": {
"N": "10000760"
},
"hash": {
"S": "89501eb2-64d6-44a1-8c5e-a296afd7009d"
},
"val": {
"S": "0-89501eb2-64d6-44a1-8c5e-a296afd7009d-10000760"
}
},
{
"sort": {
"N": "10000020"
},
"hash": {
"S": "c8efedf0-c01e-42b7-818f-ff82d3ad379f"
},
"val": {
"S": "0-c8efedf0-c01e-42b7-818f-ff82d3ad379f-10000020"
}
}
],
"NextToken": "eyJFeGNsdXNpdmVTdGFydEtleSI6IG51bGwsICJib3RvX3RydW5jYXRlX2Ftb3VudCI6IDN9",
"ScannedCount": 4585,
"ConsumedCapacity": null
}
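To count every item rather than display a few, a Scan can ask only for counts with `Select="COUNT"`. The sketch below (boto3, assuming `client` is a `boto3.client("dynamodb")`) adds up `Count` across pages and optionally uses `ConsistentRead`. Each page still reads, and is billed for, up to 1 MB of table data, so this is expensive on large tables.

```python
def count_items(client, table_name, consistent=False):
    """Count all items with a paginated Scan. Select="COUNT" returns
    Count/ScannedCount instead of the items themselves."""
    params = {"TableName": table_name, "Select": "COUNT", "ConsistentRead": consistent}
    total = 0
    while True:
        response = client.scan(**params)
        total += response["Count"]
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return total
        params["ExclusiveStartKey"] = last_key
```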
- DeleteItem - Deletes a single item in a table by primary key.
$ aws dynamodb delete-item --table-name training --key '{"hash":{"S":"xxxx-xxxx-xxxx-xxxx"}, "sort":{"N":"12345678"}}'
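DeleteItem needs the full primary key (partition key and sort key), and deleting a key that does not exist is not an error. The boto3 sketch below, assuming `client` is a `boto3.client("dynamodb")`, sets `ReturnValues="ALL_OLD"` so you can tell whether anything was actually deleted.

```python
def delete(client, table_name, hash_value, sort_value):
    """Delete one item by primary key and return its old attributes,
    or None if there was no such item."""
    response = client.delete_item(
        TableName=table_name,
        Key={"hash": {"S": hash_value}, "sort": {"N": str(sort_value)}},
        ReturnValues="ALL_OLD",  # ask for the item as it was before deletion
    )
    return response.get("Attributes")
```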
(2) Writing to DynamoDB Table with Java
Now we create a new DynamoDB table with a hash key (hash, String), a sort key (sort, Number), and do a quick test with a simple Java application:
import java.io.*;
import java.util.*;
import com.amazonaws.*;
import com.amazonaws.auth.*;
import com.amazonaws.auth.profile.*;
import com.amazonaws.regions.*;
import com.amazonaws.services.dynamodbv2.*;
import com.amazonaws.services.dynamodbv2.model.*;
import com.amazonaws.services.dynamodbv2.document.*;
import com.amazonaws.services.dynamodbv2.datamodeling.*;

public class TestDDB extends Thread
{
    public AmazonDynamoDBClient client;
    public String tableName;

    public TestDDB()
    {
        // Uses the default credential provider chain (e.g. the EC2 instance role).
        client = new AmazonDynamoDBClient();
        client.configureRegion(Regions.AP_SOUTHEAST_2);
        // Read the table name from ddb.properties; try-with-resources closes the file.
        try (InputStream input = new FileInputStream("ddb.properties"))
        {
            Properties prop = new Properties();
            prop.load(input);
            tableName = prop.getProperty("tableName");
        } catch (Exception e)
        {
            System.out.println(e.getMessage());
            e.printStackTrace();
        }
    }

    public void put(String hash, int sort, String value)
    {
        // Build the item: hash (S), sort (N, sent as a string), val (S).
        HashMap<String, AttributeValue> item = new HashMap<String, AttributeValue>();
        item.put("hash", new AttributeValue(hash));
        item.put("sort", new AttributeValue().withN(Integer.toString(sort)));
        item.put("val", new AttributeValue(value));
        PutItemRequest putItemRequest = new PutItemRequest().withTableName(tableName).withItem(item);
        try
        {
            client.putItem(putItemRequest);
        } catch (Exception e)
        {
            // Errors left after the SDK's built-in retries are logged; the loop continues.
            System.out.println(e.getMessage());
            e.printStackTrace();
        }
    }

    public void run()
    {
        // Write forever: random UUID partition key, increasing sort key.
        int start = 10000000;
        while (true)
        {
            try
            {
                String hash = UUID.randomUUID().toString();
                int sort = start;
                String value = hash + "-" + sort;
                put(hash, sort, value);
                start++;
            } catch (ConditionalCheckFailedException e)
            {
                System.out.println(e.getMessage());
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args)
    {
        try
        {
            TestDDB test = new TestDDB();
            test.start();
            test.join();
        } catch (Exception e)
        {
            System.out.println(e.getMessage());
            e.printStackTrace();
        }
    }
}
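Run this Java application for a couple of seconds (it reads the table name from the tableName property in a ddb.properties file in the working directory, e.g. tableName=training), and you will see items like the following in your DynamoDB table: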
{
"hash": "bb11b860-06e0-4ed6-a207-6ec1af600da8",
"sort": 10000051,
"val": "bb11b860-06e0-4ed6-a207-6ec1af600da8-10000051"
}
Now we modify our test program to run with multiple threads, so that it can write to your DynamoDB table faster:
import java.io.*;
import java.util.*;
import com.amazonaws.*;
import com.amazonaws.auth.*;
import com.amazonaws.auth.profile.*;
import com.amazonaws.regions.*;
import com.amazonaws.services.dynamodbv2.*;
import com.amazonaws.services.dynamodbv2.model.*;
import com.amazonaws.services.dynamodbv2.document.*;
import com.amazonaws.services.dynamodbv2.datamodeling.*;

public class TestDDB extends Thread
{
    public AmazonDynamoDBClient client;
    public String tableName;
    public int threadId;

    public TestDDB()
    {
        // Each thread creates its own client, using the default credential provider chain.
        client = new AmazonDynamoDBClient();
        client.configureRegion(Regions.AP_SOUTHEAST_2);
        // Read the table name from ddb.properties; try-with-resources closes the file.
        try (InputStream input = new FileInputStream("ddb.properties"))
        {
            Properties prop = new Properties();
            prop.load(input);
            tableName = prop.getProperty("tableName");
        } catch (Exception e)
        {
            System.out.println(e.getMessage());
            e.printStackTrace();
        }
    }

    public void setThreadId(int id)
    {
        threadId = id;
    }

    public void put(String hash, int sort, String value)
    {
        // Build the item: hash (S), sort (N, sent as a string), val (S).
        HashMap<String, AttributeValue> item = new HashMap<String, AttributeValue>();
        item.put("hash", new AttributeValue(hash));
        item.put("sort", new AttributeValue().withN(Integer.toString(sort)));
        item.put("val", new AttributeValue(value));
        PutItemRequest putItemRequest = new PutItemRequest().withTableName(tableName).withItem(item);
        try
        {
            client.putItem(putItemRequest);
        } catch (Exception e)
        {
            // Errors left after the SDK's built-in retries are logged; the loop continues.
            System.out.println(e.getMessage());
            e.printStackTrace();
        }
    }

    public void run()
    {
        // Write forever; the value is prefixed with the thread ID so you can
        // tell which thread wrote each item.
        int start = 10000000;
        while (true)
        {
            try
            {
                String hash = UUID.randomUUID().toString();
                int sort = start;
                String value = threadId + "-" + hash + "-" + sort;
                put(hash, sort, value);
                start++;
            } catch (ConditionalCheckFailedException e)
            {
                System.out.println(e.getMessage());
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args)
    {
        // Usage: java TestDDB <number-of-threads>
        try
        {
            int threads = Integer.parseInt(args[0]);
            TestDDB[] tests = new TestDDB[threads];
            for (int i = 0; i < threads; i++)
            {
                tests[i] = new TestDDB();
                tests[i].setThreadId(i);
                tests[i].start();
            }
            for (int j = 0; j < threads; j++)
            {
                tests[j].join();
            }
        } catch (Exception e)
        {
            System.out.println(e.getMessage());
            e.printStackTrace();
        }
    }
}
Run this code with 3 threads (pass 3 as the command-line argument), and you will see items like the following in your DynamoDB table:
{
"hash": "a3d32123-476d-4d23-a568-d2c9f1fffd99",
"sort": 10000230,
"val": "0-a3d32123-476d-4d23-a568-d2c9f1fffd99-10000230"
}
{
"hash": "4c9a95b9-0af6-4965-af87-9285790ce0f2",
"sort": 10000186,
"val": "1-4c9a95b9-0af6-4965-af87-9285790ce0f2-10000186"
}
{
"hash": "56874dc3-1fbc-489e-a224-1a2813a750b3",
"sort": 10000101,
"val": "2-56874dc3-1fbc-489e-a224-1a2813a750b3-10000101"
}
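For comparison, the same multi-threaded write pattern can be sketched in Python with a thread pool. This is an illustration, not a port of the author's program. `put_fn` is a hypothetical callable you supply, for example one wrapping `client.put_item` on a single `boto3.client("dynamodb")`, which boto3 documents as safe to share between threads. Unlike the Java version, each writer stops after a fixed number of items.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


def writer(put_fn, thread_id, count, start=10000000):
    """Mirror TestDDB.run(): random UUID partition key, increasing sort key,
    value '<threadId>-<hash>-<sort>'. Stops after `count` items."""
    for sort in range(start, start + count):
        hash_value = str(uuid.uuid4())
        put_fn(hash_value, sort, f"{thread_id}-{hash_value}-{sort}")


def run_writers(put_fn, threads, count_per_thread):
    """Start one writer per thread and wait for all of them, like join()."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(writer, put_fn, i, count_per_thread)
                   for i in range(threads)]
        for future in futures:
            future.result()  # re-raises any exception from a worker
```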
(3) Performance Considerations
Now that you can write data to your DynamoDB table using the AWS CLI, the AWS Console, and the AWS SDK for Java, let's look at the performance of DynamoDB API calls.
It is a common practice to use a bash script to measure the time needed to perform an API call. For example:
#!/bin/bash
date +%s.%3N
aws dynamodb put-item --table-name training --item '{"hash":{"S":"xxxx-xxxx-xxxx-xxxx"}, "sort":{"N":"12345678"}, "val":{"S":"ABCDEFG"}}'
date +%s.%3N
When you run this script, you will see output similar to the following, indicating that the PutItem API call took approximately 370 ms. You might then ask: why is DynamoDB so slow?
$ ./test.sh
1491199859.988
1491199860.361
To understand this behavior, add the "--debug" option to your AWS CLI command to see what happens when you make the API call. Below is the output from the same PutItem API call with the debug option turned on. Let's see what information we can find in the debug output.
2017-04-03 06:15:12,403 - MainThread - awscli.clidriver - DEBUG - CLI version: aws-cli/1.11.55 Python/2.7.12 Linux/4.4.0-64-generic botocore/1.5.18
2017-04-03 06:15:12,403 - MainThread - awscli.clidriver - DEBUG - Arguments entered to CLI: ['dynamodb', 'put-item', '--table-name', 'training', '--item', '{"hash":{"S":"xxxx-xxxx-xxxx-xxxx"}, "sort":{"N":"12345678"}, "val":{"S":"ABCDEFG"}}', '--debug']
2017-04-03 06:15:12,403 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function add_scalar_parsers at 0x7f445c749230>
2017-04-03 06:15:12,403 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function inject_assume_role_provider_cache at 0x7f445cc32c80>
2017-04-03 06:15:12,405 - MainThread - botocore.loaders - DEBUG - Loading JSON file: /usr/local/lib/python2.7/dist-packages/botocore/data/dynamodb/2012-08-10/service-2.json
2017-04-03 06:15:12,419 - MainThread - botocore.hooks - DEBUG - Event service-data-loaded.dynamodb: calling handler <function register_retries_for_service at 0x7f445d07e488>
2017-04-03 06:15:12,419 - MainThread - botocore.handlers - DEBUG - Registering retry handlers for service: dynamodb
2017-04-03 06:15:12,420 - MainThread - botocore.hooks - DEBUG - Event building-command-table.dynamodb: calling handler <function add_waiters at 0x7f445c7545f0>
2017-04-03 06:15:12,423 - MainThread - botocore.loaders - DEBUG - Loading JSON file: /usr/local/lib/python2.7/dist-packages/botocore/data/dynamodb/2012-08-10/waiters-2.json
2017-04-03 06:15:12,425 - MainThread - awscli.clidriver - DEBUG - OrderedDict([(u'table-name', <awscli.arguments.CLIArgument object at 0x7f445c2cce90>), (u'item', <awscli.arguments.CLIArgument object at 0x7f445c2cced0>), (u'expected', <awscli.arguments.CLIArgument object at 0x7f445c2ccf10>), (u'return-values', <awscli.arguments.CLIArgument object at 0x7f445c2ccf50>), (u'return-consumed-capacity', <awscli.arguments.CLIArgument object at 0x7f445c2ccf90>), (u'return-item-collection-metrics', <awscli.arguments.CLIArgument object at 0x7f445c2ccfd0>), (u'conditional-operator', <awscli.arguments.CLIArgument object at 0x7f445c436050>), (u'condition-expression', <awscli.arguments.CLIArgument object at 0x7f445c436090>), (u'expression-attribute-names', <awscli.arguments.CLIArgument object at 0x7f445c4360d0>), (u'expression-attribute-values', <awscli.arguments.CLIArgument object at 0x7f445c436110>)])
2017-04-03 06:15:12,425 - MainThread - botocore.hooks - DEBUG - Event building-argument-table.dynamodb.put-item: calling handler <function add_streaming_output_arg at 0x7f445c749848>
2017-04-03 06:15:12,425 - MainThread - botocore.hooks - DEBUG - Event building-argument-table.dynamodb.put-item: calling handler <function add_cli_input_json at 0x7f445cc36b90>
2017-04-03 06:15:12,426 - MainThread - botocore.hooks - DEBUG - Event building-argument-table.dynamodb.put-item: calling handler <function unify_paging_params at 0x7f445c7d4230>
2017-04-03 06:15:12,428 - MainThread - botocore.loaders - DEBUG - Loading JSON file: /usr/local/lib/python2.7/dist-packages/botocore/data/dynamodb/2012-08-10/paginators-1.json
2017-04-03 06:15:12,429 - MainThread - botocore.hooks - DEBUG - Event building-argument-table.dynamodb.put-item: calling handler <function add_generate_skeleton at 0x7f445c7c45f0>
2017-04-03 06:15:12,429 - MainThread - botocore.hooks - DEBUG - Event before-building-argument-table-parser.dynamodb.put-item: calling handler <bound method CliInputJSONArgument.override_required_args of <awscli.customizations.cliinputjson.CliInputJSONArgument object at 0x7f445c436150>>
2017-04-03 06:15:12,429 - MainThread - botocore.hooks - DEBUG - Event before-building-argument-table-parser.dynamodb.put-item: calling handler <bound method GenerateCliSkeletonArgument.override_required_args of <awscli.customizations.generatecliskeleton.GenerateCliSkeletonArgument object at 0x7f445c436290>>
2017-04-03 06:15:12,431 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.dynamodb.put-item.table-name: calling handler <function uri_param at 0x7f445cc50aa0>
2017-04-03 06:15:12,431 - MainThread - botocore.hooks - DEBUG - Event process-cli-arg.dynamodb.put-item: calling handler <awscli.argprocess.ParamShorthandParser object at 0x7f445cc107d0>
2017-04-03 06:15:12,431 - MainThread - awscli.arguments - DEBUG - Unpacked value of u'training' for parameter "table_name": u'training'
2017-04-03 06:15:12,431 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.dynamodb.put-item.item: calling handler <function uri_param at 0x7f445cc50aa0>
2017-04-03 06:15:12,431 - MainThread - botocore.hooks - DEBUG - Event process-cli-arg.dynamodb.put-item: calling handler <awscli.argprocess.ParamShorthandParser object at 0x7f445cc107d0>
2017-04-03 06:15:12,432 - MainThread - awscli.argprocess - DEBUG - Param item looks like JSON, not considered for param shorthand.
2017-04-03 06:15:12,432 - MainThread - awscli.arguments - DEBUG - Unpacked value of u'{"hash":{"S":"xxxx-xxxx-xxxx-xxxx"}, "sort":{"N":"12345678"}, "val":{"S":"ABCDEFG"}}' for parameter "item": OrderedDict([(u'hash', OrderedDict([(u'S', u'xxxx-xxxx-xxxx-xxxx')])), (u'sort', OrderedDict([(u'N', u'12345678')])), (u'val', OrderedDict([(u'S', u'ABCDEFG')]))])
2017-04-03 06:15:12,432 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.dynamodb.put-item.expected: calling handler <function uri_param at 0x7f445cc50aa0>
2017-04-03 06:15:12,432 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.dynamodb.put-item.return-values: calling handler <function uri_param at 0x7f445cc50aa0>
2017-04-03 06:15:12,432 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.dynamodb.put-item.return-consumed-capacity: calling handler <function uri_param at 0x7f445cc50aa0>
2017-04-03 06:15:12,433 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.dynamodb.put-item.return-item-collection-metrics: calling handler <function uri_param at 0x7f445cc50aa0>
2017-04-03 06:15:12,433 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.dynamodb.put-item.conditional-operator: calling handler <function uri_param at 0x7f445cc50aa0>
2017-04-03 06:15:12,433 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.dynamodb.put-item.condition-expression: calling handler <function uri_param at 0x7f445cc50aa0>
2017-04-03 06:15:12,433 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.dynamodb.put-item.expression-attribute-names: calling handler <function uri_param at 0x7f445cc50aa0>
2017-04-03 06:15:12,433 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.dynamodb.put-item.expression-attribute-values: calling handler <function uri_param at 0x7f445cc50aa0>
2017-04-03 06:15:12,433 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.dynamodb.put-item.cli-input-json: calling handler <function uri_param at 0x7f445cc50aa0>
2017-04-03 06:15:12,434 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.dynamodb.put-item.generate-cli-skeleton: calling handler <function uri_param at 0x7f445cc50aa0>
2017-04-03 06:15:12,434 - MainThread - botocore.hooks - DEBUG - Event calling-command.dynamodb.put-item: calling handler <bound method GenerateCliSkeletonArgument.generate_json_skeleton of <awscli.customizations.generatecliskeleton.GenerateCliSkeletonArgument object at 0x7f445c436290>>
2017-04-03 06:15:12,434 - MainThread - botocore.hooks - DEBUG - Event calling-command.dynamodb.put-item: calling handler <bound method CliInputJSONArgument.add_to_call_parameters of <awscli.customizations.cliinputjson.CliInputJSONArgument object at 0x7f445c436150>>
2017-04-03 06:15:12,434 - MainThread - botocore.credentials - DEBUG - Looking for credentials via: env
2017-04-03 06:15:12,434 - MainThread - botocore.credentials - DEBUG - Looking for credentials via: assume-role
2017-04-03 06:15:12,434 - MainThread - botocore.credentials - DEBUG - Looking for credentials via: shared-credentials-file
2017-04-03 06:15:12,435 - MainThread - botocore.credentials - DEBUG - Looking for credentials via: config-file
2017-04-03 06:15:12,435 - MainThread - botocore.credentials - DEBUG - Looking for credentials via: ec2-credentials-file
2017-04-03 06:15:12,435 - MainThread - botocore.credentials - DEBUG - Looking for credentials via: boto-config
2017-04-03 06:15:12,435 - MainThread - botocore.credentials - DEBUG - Looking for credentials via: container-role
2017-04-03 06:15:12,435 - MainThread - botocore.credentials - DEBUG - Looking for credentials via: iam-role
2017-04-03 06:15:12,438 - MainThread - botocore.vendored.requests.packages.urllib3.connectionpool - INFO - Starting new HTTP connection (1): 169.254.169.254
2017-04-03 06:15:12,440 - MainThread - botocore.vendored.requests.packages.urllib3.connectionpool - DEBUG - "GET /latest/meta-data/iam/security-credentials/ HTTP/1.1" 200 10
2017-04-03 06:15:12,441 - MainThread - botocore.vendored.requests.packages.urllib3.connectionpool - INFO - Starting new HTTP connection (1): 169.254.169.254
2017-04-03 06:15:12,442 - MainThread - botocore.vendored.requests.packages.urllib3.connectionpool - DEBUG - "GET /latest/meta-data/iam/security-credentials/Admin-Role HTTP/1.1" 200 906
2017-04-03 06:15:12,443 - MainThread - botocore.credentials - DEBUG - Found credentials from IAM Role: Admin-Role
2017-04-03 06:15:12,444 - MainThread - botocore.loaders - DEBUG - Loading JSON file: /usr/local/lib/python2.7/dist-packages/botocore/data/endpoints.json
2017-04-03 06:15:12,462 - MainThread - botocore.client - DEBUG - Registering retry handlers for service: dynamodb
2017-04-03 06:15:12,463 - MainThread - botocore.hooks - DEBUG - Event creating-client-class.dynamodb: calling handler <function add_generate_presigned_url at 0x7f445d04e500>
2017-04-03 06:15:12,463 - MainThread - botocore.args - DEBUG - The s3 config key is not a dictionary type, ignoring its value of: None
2017-04-03 06:15:12,467 - MainThread - botocore.endpoint - DEBUG - Setting dynamodb timeout as (60, 60)
2017-04-03 06:15:12,468 - MainThread - botocore.hooks - DEBUG - Event before-parameter-build.dynamodb.PutItem: calling handler <function generate_idempotent_uuid at 0x7f445d07fe60>
2017-04-03 06:15:12,469 - MainThread - botocore.endpoint - DEBUG - Making request for OperationModel(name=PutItem) (verify_ssl=True) with params: {'body': '{"Item": {"sort": {"N": "12345678"}, "hash": {"S": "xxxx-xxxx-xxxx-xxxx"}, "val": {"S": "ABCDEFG"}}, "TableName": "training"}', 'url': u'https://dynamodb.ap-southeast-2.amazonaws.com/', 'headers': {'User-Agent': 'aws-cli/1.11.55 Python/2.7.12 Linux/4.4.0-64-generic botocore/1.5.18', 'Content-Type': u'application/x-amz-json-1.0', 'X-Amz-Target': u'DynamoDB_20120810.PutItem'}, 'context': {'client_region': 'ap-southeast-2', 'has_streaming_input': False, 'client_config': <botocore.config.Config object at 0x7f445c1b50d0>}, 'query_string': '', 'url_path': '/', 'method': u'POST'}
2017-04-03 06:15:12,469 - MainThread - botocore.hooks - DEBUG - Event request-created.dynamodb.PutItem: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7f445c1e6f90>>
2017-04-03 06:15:12,470 - MainThread - botocore.auth - DEBUG - Calculating signature using v4 auth.
2017-04-03 06:15:12,470 - MainThread - botocore.auth - DEBUG - CanonicalRequest:
POST
/
content-type:application/x-amz-json-1.0
host:dynamodb.ap-southeast-2.amazonaws.com
x-amz-date:20170403T061512Z
x-amz-security-token:FQoDYXdzELf//////////wEaDAjYqBkc5xU4hQnx4SLBA4ICD3MPvGwVhaMPLefVwBsmzAiCkQMzOV9TEysh5UpGlc0Bt8FteCVQY65q0EHeE6R18tXYldoDQbs8DW7V+smaLX4SeNETDGLbcoyu9iP3QN6UsgnLaNk6SRDIR+xJqhNV7BN/j+wZBUMSzCcVr2KE0jLOxns8ighG/3RgnA01RxJvLbBkj3uKZbW4oO1AISjNFXeURbFatHXbCWOg5RLRZrH5mVP3V8KaRTA+B6VU3OBlSKCiSsnAkmEzDjzBqnJqhfOTTE5Zrd1i5hLH9uRybcCgYPSEgrWqqDdF2OtVevpi2Bza3URiLkHBxxR9YylwF2Ct/m/cL4dsNknGwEdktw56ld8b4EpqXrhrL/7Q5gVnRW9g72ZjdoXQbBjkTxqWFLDYzvPvakt6iwdyEQxQ3pbn++WI8/XW+ioVcJqRtWv7Y4/Nx8ml66sytLqy9Iy/TFXIzOt4wg3BWJYTO+ngxmW/YcYO3tehY2qRFT7R2+QjDwK9q84VkOIDbRceGTAFJnhWFGBtCyKQp0I2+BuPLB1YyicySBUTdAbvMaf8QbfZhgF3JKpdyoDT9PnBZqDMf7W8MJASAVW98J4Jc3W3KKLMh8cF
x-amz-target:DynamoDB_20120810.PutItem
content-type;host;x-amz-date;x-amz-security-token;x-amz-target
5f11e0c260253e2724d4f2cf055cd9725c2b4d9453d54b577fd3995132ded9ce
2017-04-03 06:15:12,471 - MainThread - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20170403T061512Z
20170403/ap-southeast-2/dynamodb/aws4_request
ecfde21b28eec10fae33c09ceaa3ae7e0c4fb558440b2cd6ae53ade802c3d123
2017-04-03 06:15:12,471 - MainThread - botocore.auth - DEBUG - Signature:
2fc122ace7b12f2c84260b73f5d4ffcf184e447a97ca48bac535e7ea2aed20b5
2017-04-03 06:15:12,472 - MainThread - botocore.endpoint - DEBUG - Sending http request: <PreparedRequest [POST]>
2017-04-03 06:15:12,472 - MainThread - botocore.vendored.requests.packages.urllib3.connectionpool - INFO - Starting new HTTPS connection (1): dynamodb.ap-southeast-2.amazonaws.com
2017-04-03 06:15:12,517 - MainThread - botocore.vendored.requests.packages.urllib3.connectionpool - DEBUG - "POST / HTTP/1.1" 200 2
2017-04-03 06:15:12,517 - MainThread - botocore.parsers - DEBUG - Response headers: {'x-amzn-requestid': 'EN3PCNIESS9KU0K8569AAJAPCJVV4KQNSO5AEMVJF66Q9ASUAAJG', 'content-length': '2', 'server': 'Server', 'connection': 'keep-alive', 'x-amz-crc32': '2745614147', 'date': 'Mon, 03 Apr 2017 06:15:12 GMT', 'content-type': 'application/x-amz-json-1.0'}
2017-04-03 06:15:12,517 - MainThread - botocore.parsers - DEBUG - Response body:
{}
2017-04-03 06:15:12,518 - MainThread - botocore.hooks - DEBUG - Event needs-retry.dynamodb.PutItem: calling handler <botocore.retryhandler.RetryHandler object at 0x7f445c64bc10>
2017-04-03 06:15:12,518 - MainThread - botocore.retryhandler - DEBUG - No retry needed.
2017-04-03 06:15:12,518 - MainThread - awscli.formatter - DEBUG - RequestId: EN3PCNIESS9KU0K8569AAJAPCJVV4KQNSO5AEMVJF66Q9ASUAAJG
As you can see near the end of the debug output, the HTTPS POST was sent at 06:15:12,472 and the HTTPS response was received at 06:15:12,517, so the round-trip latency was 45 ms. Note that this call also had to open a new HTTPS connection to dynamodb.ap-southeast-2.amazonaws.com (see the "Starting new HTTPS connection" line), so those 45 ms include connection setup as well as the request itself.
You might also have noticed that there was approximately 250 ms between the first "date" command and the first line of AWS CLI output (WHY?), as shown below. There was also a gap between the end of the AWS CLI command and the second "date" command. As such, it is not a good practice to use a bash script like the one above to measure DynamoDB latencies.
1491200112.159
2017-04-03 06:15:12,403 - MainThread - awscli.clidriver - DEBUG - CLI version: aws-cli/1.11.55 Python/2.7.12 Linux/4.4.0-64-generic botocore/1.5.18
......
2017-04-03 06:15:12,518 - MainThread - awscli.formatter - DEBUG - RequestId: EN3PCNIESS9KU0K8569AAJAPCJVV4KQNSO5AEMVJF66Q9ASUAAJG
1491200112.545
A better way to measure round-trip latency is to enable logging for the AWS SDK. You should refer to our 101. AWS EC2 01 training if you do not know how to do this yet. Please use this option to observe the round-trip latency for your DynamoDB API calls. You might want to add a sleep(n) to your code to slow down your requests during this process. Compare the round-trip latency you observed with the server-side latency reported by CloudWatch metrics.
Below is an example of the logs we obtained from the AWS SDK for Java. In this example, the round-trip latency for the API call is 6 ms (request sent at 22:37:56,327, response received at 22:37:56,333).
2017-04-03 22:37:56,327 [Thread-0] DEBUG com.amazonaws.request - Sending Request: POST https://dynamodb.ap-southeast-2.amazonaws.com / Headers: (User-Agent: aws-sdk-java/1.11.98 Linux/4.4.0-64-generic Java_HotSpot(TM)_64-Bit_Server_VM/25.121-b13/1.8.0_121, amz-sdk-invocation-id: 335506f1-145e-bf0a-3020-ff79bb9e39a6, Content-Length: 171, X-Amz-Target: DynamoDB_20120810.PutItem, Content-Type: application/x-amz-json-1.0, )
2017-04-03 22:37:56,333 [Thread-0] DEBUG com.amazonaws.request - Received successful response: 200, AWS Request ID: O2VDQ3PEEDJEDJQ12PE7ME9B9VVV4KQNSO5AEMVJF66Q9ASUAAJG
2017-04-03 22:37:56,333 [Thread-0] DEBUG com.amazonaws.request - x-amzn-RequestId: O2VDQ3PEEDJEDJQ12PE7ME9B9VVV4KQNSO5AEMVJF66Q9ASUAAJG
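Another option is to time the calls inside a long-running process, so that interpreter start-up and client creation are excluded. In boto3, `boto3.set_stream_logger("botocore", logging.DEBUG)` prints request/response debug lines similar to the AWS CLI `--debug` output. The sketch below is an assumption-laden illustration: `fn` is any zero-argument callable you pass in (for example a lambda wrapping `client.put_item` on a client created once), and only the first call should pay for opening the HTTPS connection. `server_latency_query` builds the arguments for CloudWatch `get_metric_statistics` on the `SuccessfulRequestLatency` metric, which reports server-side latency per table and operation, so you can compare both numbers.

```python
import statistics
import time


def time_calls(fn, n=20, pause=0.0):
    """Call fn() n times and return each call's latency in milliseconds.
    Create the client once, outside fn, so only the first sample includes
    HTTPS connection setup."""
    latencies = []
    for _ in range(n):
        t0 = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - t0) * 1000.0)
        if pause:
            time.sleep(pause)  # optional sleep to slow the requests down
    return latencies


def summarize(latencies):
    ordered = sorted(latencies)
    return {"min": ordered[0], "median": statistics.median(ordered), "max": ordered[-1]}


def server_latency_query(table_name, operation, start, end, period=60):
    """Keyword arguments for cloudwatch.get_metric_statistics()."""
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": "SuccessfulRequestLatency",
        "Dimensions": [
            {"Name": "TableName", "Value": table_name},
            {"Name": "Operation", "Value": operation},
        ],
        "StartTime": start,
        "EndTime": end,
        "Period": period,
        "Statistics": ["Average", "Maximum"],
    }
```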
It should be noted that the DynamoDB service does NOT have a service level agreement (SLA) on latency. DynamoDB is a massive-scale distributed system, and numerous components on the data path might contribute to the observed latency (both round-trip latency and server-side latency).
(4) Understand DynamoDB Partitions
DynamoDB stores data in partitions. A partition is an allocation of storage for a table, backed by solid-state drives (SSDs) and automatically replicated across multiple Availability Zones within an AWS region. Partition management is handled entirely by DynamoDB; customers never need to manage partitions themselves.
Read the following AWS documentation to understand the DynamoDB partitioning behavior, as well as how to design partition keys.
- Partitions and Data Distribution
- Guidelines for Working with Tables
- Choosing the Right DynamoDB Partition Key
It should be noted that DynamoDB does not provide a direct way to obtain the number of partitions in a table. However, if you enable DynamoDB Streams for the table, you can count the number of open shards in the stream; each open shard in the stream corresponds to a partition in the table. The DescribeStream API call lists the shards in the stream, including closed ones, so count only the shards that are still open (those without an EndingSequenceNumber) to obtain the number of partitions in the table.
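Counting open shards can be scripted. This boto3 sketch assumes `streams_client` is a `boto3.client("dynamodbstreams")` and that `stream_arn` is the table's `LatestStreamArn` from `describe_table`. It follows the DescribeStream pagination (`LastEvaluatedShardId` / `ExclusiveStartShardId`). The shard-to-partition correspondence is the observation this tutorial relies on, not a documented guarantee, so treat the result as an estimate.

```python
def count_open_shards(streams_client, stream_arn):
    """Count shards that are still open, i.e. whose SequenceNumberRange has
    no EndingSequenceNumber. Closed (parent) shards are skipped."""
    open_shards = 0
    params = {"StreamArn": stream_arn}
    while True:
        desc = streams_client.describe_stream(**params)["StreamDescription"]
        for shard in desc.get("Shards", []):
            if "EndingSequenceNumber" not in shard.get("SequenceNumberRange", {}):
                open_shards += 1
        last = desc.get("LastEvaluatedShardId")
        if not last:
            return open_shards
        params["ExclusiveStartShardId"] = last  # fetch the next page of shards
```

The CLI equivalent is `aws dynamodbstreams describe-stream --stream-arn <arn>`, filtering the shards by hand.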
Create a DynamoDB table with one partition only. Load a large amount of data into the table to cause a partition split. Use the AWS CLI to observe the number of partitions in the table, and take notes on the time needed (and the amount of data) to cause a partition split. Increase the provisioned WCU and RCU to cause a further partition split. Then load a large amount of data to one of the partitions (HOW?) to see whether you are able to cause a further partition split for this particular partition. Use the AWS CLI to demonstrate that now you have an additional partition in the table.
Now you have created a DynamoDB table with multiple partitions, and one of the partitions contains significantly more data than the other partitions. A partition that receives a disproportionate share of the traffic is called a "hot" partition. It should be noted that with DynamoDB, the provisioned WCU and RCU for the table are "assigned" to each partition, with no "sharing" between partitions. For example, if you provision 1000 WCU to a table with 2 partitions, each partition gets 500 WCU. This means you can only consume 500 WCU per second from any one partition. If you attempt to write to a particular partition faster than the WCU "assigned" to that partition, your API calls will be throttled.
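The arithmetic above can be written down as a rough model. Older DynamoDB documentation gave a rule of thumb that one partition handled up to 3,000 RCU or 1,000 WCU and held up to 10 GB of data. The sketch below encodes that legacy rule and the even split described above. It is an estimate only: DynamoDB does not report partition counts, and it has since introduced adaptive capacity, which lets a busy partition use throughput left unused by other partitions, so treat the even split as a simplified model.

```python
import math

GB = 1024 ** 3


def estimate_partitions(rcu, wcu, table_bytes):
    """Legacy rule of thumb: partitions needed for throughput
    (RCU/3000 + WCU/1000) versus for size (10 GB each); take the larger."""
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
    by_size = math.ceil(table_bytes / (10 * GB))
    return max(by_throughput, by_size, 1)


def per_partition_capacity(rcu, wcu, partitions):
    """Even split of provisioned capacity across partitions (no sharing)."""
    return rcu / partitions, wcu / partitions
```

With the example above, `per_partition_capacity(0, 1000, 2)` gives 500 WCU per partition.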
(5) LSI and GSI
A local secondary index maintains an alternate sort key for a given partition key value. A local secondary index also contains a copy of some or all of the attributes from its base table; you specify which attributes are projected into the local secondary index when you create the table. The data in a local secondary index is organized by the same partition key as the base table, but with a different sort key. This lets you access data items efficiently across this different dimension. For greater query or scan flexibility, you can create up to five local secondary indexes per table.
A global secondary index contains a selection of attributes from the base table, but they are organized by a primary key that is different from that of the table. The index key does not need to have any of the key attributes from the table; it doesn't even need to have the same key schema as the table.
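The difference shows up directly in the `create_table` arguments. The sketch below builds them for a hypothetical table shaped like "training": an LSI named `by_val` keeps the table's partition key `hash` but sorts by `val`, and a GSI named `val_index` is partitioned by `val` alone. The table name, index names, projections, and capacity values are all illustrative assumptions. LSIs can only be defined when the table is created, and in provisioned mode each GSI has its own throughput setting.

```python
def table_with_indexes(table_name="training_idx"):
    """create_table keyword arguments with one LSI and one GSI."""
    throughput = {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}  # placeholders
    return {
        "TableName": table_name,
        # Every attribute used in any key schema must be declared here.
        "AttributeDefinitions": [
            {"AttributeName": "hash", "AttributeType": "S"},
            {"AttributeName": "sort", "AttributeType": "N"},
            {"AttributeName": "val", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "hash", "KeyType": "HASH"},
            {"AttributeName": "sort", "KeyType": "RANGE"},
        ],
        "LocalSecondaryIndexes": [{
            "IndexName": "by_val",
            "KeySchema": [
                {"AttributeName": "hash", "KeyType": "HASH"},  # same partition key as the table
                {"AttributeName": "val", "KeyType": "RANGE"},  # alternate sort key
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }],
        "GlobalSecondaryIndexes": [{
            "IndexName": "val_index",
            "KeySchema": [{"AttributeName": "val", "KeyType": "HASH"}],  # a different partition key
            "Projection": {"ProjectionType": "ALL"},
            "ProvisionedThroughput": dict(throughput),  # GSI capacity is separate
        }],
        "ProvisionedThroughput": dict(throughput),
    }
```

To read from an index, pass `IndexName` to `query` or `scan` together with a key condition on that index's key attributes.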
(6) Custom CloudWatch Metrics
When you have a large number of DynamoDB tables, it is quite difficult to keep track of your aggregated provisioned WCU/RCU over time. You can certainly do so using the "Stacked Data" graph in the CloudWatch dashboard, but that becomes very challenging when you have hundreds of tables.
The following Python code publishes custom CloudWatch metrics, using the AWS SDK for Python.
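import boto3

# Sum the provisioned RCU/WCU of every table in the current region.
# list_tables returns at most 100 table names per call, so use the paginator.
# Note: GSIs have their own provisioned throughput, which is not included here.
ddb_client = boto3.client('dynamodb')
rcu = 0
wcu = 0
for page in ddb_client.get_paginator('list_tables').paginate():
    for table in page['TableNames']:
        response = ddb_client.describe_table(TableName=table)
        throughput = response['Table']['ProvisionedThroughput']
        rcu += throughput['ReadCapacityUnits']
        wcu += throughput['WriteCapacityUnits']
print(rcu)
print(wcu)

# Publish the aggregated values as custom CloudWatch metrics.
cw_client = boto3.client('cloudwatch')
cw_client.put_metric_data(
    Namespace='DynamoDB_Aggregated',
    MetricData=[
        {
            'MetricName': 'WCU',
            'Value': wcu
        },
        {
            'MetricName': 'RCU',
            'Value': rcu
        }
    ]
)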
(7) Others
With a traditional relational database, you can dump the content of your database to a dump file as a backup (export), then load the content from the dump file back into the database when needed (import). With DynamoDB, this is usually achieved using the AWS Data Pipeline service.
When your program sends a request, DynamoDB attempts to process it. If the request is successful, DynamoDB returns an HTTP success status code (200 OK), along with the results from the requested operation. The AWS SDKs take care of propagating errors to your application, so that you can take appropriate action. For example, in a Java program, you can write try-catch logic to handle a ResourceNotFoundException.
If the request is unsuccessful, DynamoDB returns an error. Each error has three components:
- An HTTP status code (such as 400).
- An exception name (such as ResourceNotFoundException).
- An error message (such as Requested resource not found: Table: tablename not found).
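An HTTP 400 status code indicates a problem with your request, such as authentication failure, missing required parameters, or exceeding a table's provisioned throughput. For most 400 errors you will have to fix the issue in your application before submitting the request again; throttling errors such as ProvisionedThroughputExceededException are the exception, and can be retried after a short delay (the AWS SDKs retry them automatically with exponential backoff).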
An HTTP 5xx status code indicates a problem that must be resolved by Amazon Web Services. This might be a transient error in which case you can retry your request until it succeeds.
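Retrying "until it succeeds" is normally done with exponential backoff and a cap on attempts. The AWS SDKs already do this a limited number of times for throttling and 5xx errors; the sketch below only illustrates the idea for an outer retry loop of your own. It assumes errors look like botocore's `ClientError` (the parsed error available as `exc.response`), and the set of retryable codes is an assumption to check against the DynamoDB error-handling documentation.

```python
import random
import time

# Assumed retryable codes: throttling plus server-side errors.
RETRYABLE_CODES = {
    "ProvisionedThroughputExceededException",
    "ThrottlingException",
    "RequestLimitExceeded",
    "InternalServerError",
    "ServiceUnavailable",
}


def is_retryable(exc):
    """Retry throttling codes and any HTTP 5xx; everything else is final."""
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    code = response.get("Error", {}).get("Code")
    status = response.get("ResponseMetadata", {}).get("HTTPStatusCode", 0)
    return code in RETRYABLE_CODES or status >= 500


def call_with_backoff(fn, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call fn(); after a retryable error, sleep a random time up to
    base_delay * 2**attempt ("full jitter") and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            sleep(random.uniform(0, base_delay * (2 ** attempt)))
```

Usage would look like `call_with_backoff(lambda: client.put_item(TableName="training", Item=item))`.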
From time to time you might want to clean up all the data in a DynamoDB table (like a TRUNCATE query in MySQL). Due to the design of DynamoDB, it is much better to delete the table first and then create it again. Don't try to delete all the items in the table one by one: that requires a Scan plus a DeleteItem call for every item, consumes read and write capacity, and takes much longer.
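A delete-and-recreate "truncate" can be scripted from the table's own description. This boto3 sketch assumes `client` is a `boto3.client("dynamodb")`. It copies only the key schema and capacity mode; secondary indexes, streams, TTL, auto scaling, and tags are NOT recreated, so add those back yourself if the table uses them. It is also destructive, so point it only at a test table.

```python
def recreate_params(table):
    """Build create_table arguments from describe_table()['Table']."""
    key_names = {k["AttributeName"] for k in table["KeySchema"]}
    params = {
        "TableName": table["TableName"],
        "KeySchema": table["KeySchema"],
        # create_table rejects attribute definitions not used by a key schema,
        # so drop definitions that only existed for indexes.
        "AttributeDefinitions": [
            d for d in table["AttributeDefinitions"] if d["AttributeName"] in key_names
        ],
    }
    mode = table.get("BillingModeSummary", {}).get("BillingMode", "PROVISIONED")
    if mode == "PAY_PER_REQUEST":
        params["BillingMode"] = "PAY_PER_REQUEST"
    else:
        pt = table["ProvisionedThroughput"]
        # describe_table adds read-only fields (e.g. NumberOfDecreasesToday),
        # so copy just the two capacity values.
        params["ProvisionedThroughput"] = {
            "ReadCapacityUnits": pt["ReadCapacityUnits"],
            "WriteCapacityUnits": pt["WriteCapacityUnits"],
        }
    return params


def truncate_by_recreate(client, table_name):
    """Delete the table, wait until it is gone, then create it again empty."""
    table = client.describe_table(TableName=table_name)["Table"]
    params = recreate_params(table)
    client.delete_table(TableName=table_name)
    client.get_waiter("table_not_exists").wait(TableName=table_name)
    client.create_table(**params)
    client.get_waiter("table_exists").wait(TableName=table_name)
```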