AWS S3 01
This section covers the basic usage of S3. You will interact with S3 using the AWS console, the AWS CLI, and the AWS SDKs for Java and PHP. You will build a simple web application allowing users to upload files to your S3 bucket. You will use Elastic Load Balancing (ELB) to provide load balancing for multiple EC2 instances running the same web application.
(1) S3 Basics
S3 is a massively scalable object storage service. In short, S3 stores data as "objects" within resources called "buckets". You can store as many objects as you want within a bucket, and write, read, and delete objects in your bucket. An object can be up to 5 terabytes in size.
Let's get started by using the AWS console to create an S3 bucket and upload a couple of files to it. For the time being, don't worry about other settings such as permissions. When creating your S3 bucket, pay attention to the bucket naming restrictions and limitations described in the S3 documentation.
Then we use the AWS CLI to work with your S3 bucket. Here we create a new S3 bucket named "331982-training". It should be noted that S3 bucket names are globally unique, so you will need to pick a different name for your exercise. Also, from time to time you should add the "--debug" option to your AWS CLI commands to observe what happens behind the scenes.
$ aws s3 mb s3://331982-training/
make_bucket: 331982-training
$ aws s3 cp index.html s3://331982-training/
upload: ./index.html to s3://331982-training/index.html
$ aws s3 ls s3://331982-training/
2017-03-15 13:52:19 42 index.html
$ aws s3 cp s3://331982-training/index.html index.0000
download: s3://331982-training/index.html to ./index.0000
$ ls
index.html index.0000
$ aws s3 rm s3://331982-training/index.html
delete: s3://331982-training/index.html
$ aws s3 ls s3://331982-training/
We can use the AWS console and the AWS CLI to easily create buckets and copy files (objects) between your local file system and S3. This is quite convenient when you have only a handful of files (objects) to process. What if you have thousands (or more) of files to handle?
We use the following commands to create a test folder "many_files" with 2000 test files (plus a copy of index.html):
$ mkdir many_files
$ cp index.html many_files/
$ cd many_files
$ for f in {0000..1999}; do cp index.html test.$f; done
$ ls
You can certainly use a for loop in your bash commands to upload these files one by one. However, remember that we have another file, index.html, in the same folder. The for loop approach below will obviously miss this particular file.
$ for f in {0000..1999}; do aws s3 cp test.$f s3://331982-training/folder001/; done
upload: ./test.0000 to s3://331982-training/folder001/test.0000
upload: ./test.0001 to s3://331982-training/folder001/test.0001
upload: ./test.0002 to s3://331982-training/folder001/test.0002
upload: ./test.0003 to s3://331982-training/folder001/test.0003
upload: ./test.0004 to s3://331982-training/folder001/test.0004
upload: ./test.0005 to s3://331982-training/folder001/test.0005
upload: ./test.0006 to s3://331982-training/folder001/test.0006
......
$ aws s3 ls s3://331982-training/folder001/
The AWS CLI provides a sync command. You can use the sync command to recursively copy new and updated files from the source directory to the destination. It should be noted that this command only creates folders in the destination if they contain one or more files.
$ cd ..
$ aws s3 sync many_files s3://331982-training/folder002/
(2) Simple Bash Scripting
You should compare the time taken by the two approaches mentioned above, and try to understand the difference you observe with the help of the "--debug" option. This can be done with simple bash scripts like these:
#!/bin/bash
#
# test1.sh
# Copy files to S3 one by one
#
date
for f in {0000..1999}
do
aws s3 cp many_files/test.$f s3://331982-training/folder003/
done
date
#!/bin/bash
#
# test2.sh
# Copy files to S3 using sync
#
date
aws s3 sync many_files s3://331982-training/folder004/
date
Save these bash scripts as test1.sh and test2.sh and run them. The difference between the two timestamps is the time needed to upload all 2000 files to S3.
$ chmod +x test1.sh
$ chmod +x test2.sh
$ ./test1.sh
Fri Mar 17 08:10:18 AEDT 2017
upload: many_files/test.0000 to s3://331982-training/folder003/test.0000
upload: many_files/test.0001 to s3://331982-training/folder003/test.0001
......
upload: many_files/test.1998 to s3://331982-training/folder003/test.1998
upload: many_files/test.1999 to s3://331982-training/folder003/test.1999
Fri Mar 17 08:15:28 AEDT 2017
$ ./test2.sh
Fri Mar 17 08:58:10 AEDT 2017
upload: many_files/test.0000 to s3://331982-training/folder004/test.0000
upload: many_files/test.0002 to s3://331982-training/folder004/test.0002
upload: many_files/test.0004 to s3://331982-training/folder004/test.0004
upload: many_files/test.0001 to s3://331982-training/folder004/test.0001
upload: many_files/test.0003 to s3://331982-training/folder004/test.0003
upload: many_files/test.0015 to s3://331982-training/folder004/test.0015
upload: many_files/test.0009 to s3://331982-training/folder004/test.0009
upload: many_files/test.0005 to s3://331982-training/folder004/test.0005
upload: many_files/index.html to s3://331982-training/folder004/index.html
upload: many_files/test.0010 to s3://331982-training/folder004/test.0010
upload: many_files/test.0016 to s3://331982-training/folder004/test.0016
......
upload: many_files/test.0098 to s3://331982-training/folder004/test.0098
upload: many_files/test.0072 to s3://331982-training/folder004/test.0072
Fri Mar 17 08:58:29 AEDT 2017
By now you have learned how to write a simple bash script and make it executable (using "chmod +x"). You have also learned how to write a for loop in a bash script. This is very useful when you need to do many similar things in one go. You should also know how to pass variables (parameters) to your bash script, like this:
#!/bin/bash
#
# test3.sh
# Simple bash script with parameters
# $1 is your first parameter and $2 is your second parameter
#
for i in $(seq $1 $2)
do
echo $i
done
$ chmod +x test3.sh
$ ./test3.sh 5 15
5
6
7
8
9
10
11
12
13
14
15
As you can see, bash scripts can be very useful for automating some of your daily routines, especially testing. We recommend that you pick up some scripting skills; there are many good bash tutorials online to get you started.
(3) Working with S3 using Java
By now you should know where to find the Java API docs when you need to use Java to interact with a particular AWS service. Once you have access to the Java API docs, writing a Java application to interact with AWS is actually quite easy - all you need to do is identify the right API call (method) to use, provide the expected input parameters to the API call, and parse the returned results according to the documentation.
We will start with a simple application that lists all the buckets in your AWS account, and then lists the objects in a particular S3 bucket.
import java.util.*;
import com.amazonaws.*;
import com.amazonaws.auth.*;
import com.amazonaws.auth.profile.*;
import com.amazonaws.regions.*;
import com.amazonaws.services.s3.*;
import com.amazonaws.services.s3.model.*;
public class TestS3
{
public AmazonS3Client client;
public TestS3()
{
client = new AmazonS3Client();
client.configureRegion(Regions.AP_SOUTHEAST_2);
}
public void listBuckets()
{
try
{
List<Bucket> buckets = client.listBuckets();
for (Bucket bucket : buckets)
{
System.out.println(bucket.getName());
}
} catch (Exception e)
{
System.out.println(e.getMessage());
e.printStackTrace();
}
}
public void listObjects(String bucket)
{
try
{
ObjectListing listing = client.listObjects(bucket);
List<S3ObjectSummary> objects = listing.getObjectSummaries();
for (S3ObjectSummary object : objects)
{
System.out.println(object.getKey());
}
if (listing.isTruncated())
{
System.out.println("This is a truncated list.");
}
else
{
System.out.println("This is a complete list.");
}
} catch (Exception e)
{
System.out.println(e.getMessage());
e.printStackTrace();
}
}
public static void main(String[] args)
{
TestS3 test = new TestS3();
test.listBuckets();
test.listObjects(args[0]);
}
}
This Java application expects one input parameter, which is the S3 bucket name. This is denoted as args[0] in our code. If you run a Java application with this command line "java MyProgram AAAA BBBB CCCC" then the value for args[0] is "AAAA", the value for args[1] is "BBBB", and the value for args[2] is "CCCC".
Now we compile and run this Java application. In the example below, I use my own S3 bucket "331982-training" as the input parameter. You should use your own S3 bucket name for your tests. If you have more than 1000 objects in the test bucket, you will see "This is a truncated list." in the output. This is because the listObjects() method only returns up to 1000 objects, as described in the AWS documentation.
$ javac TestS3.java
$ java TestS3 331982-training
......
This is a truncated list.
Now we modify the listObjects() method in our code to print out a full list of the objects in the bucket. If you have log4j enabled (see AWS EC2 101 if you do not know how to do this), you will see in the output that multiple API calls are made to retrieve the full object list.
public void listObjects(String bucket)
{
try
{
ObjectListing listing = client.listObjects(bucket);
List<S3ObjectSummary> objects = listing.getObjectSummaries();
for (S3ObjectSummary object : objects)
{
System.out.println(object.getKey());
}
while (listing.isTruncated())
{
ListNextBatchOfObjectsRequest request = new ListNextBatchOfObjectsRequest(listing);
listing = client.listNextBatchOfObjects(request);
objects = listing.getObjectSummaries();
for (S3ObjectSummary object : objects)
{
System.out.println(object.getKey());
}
}
} catch (Exception e)
{
System.out.println(e.getMessage());
e.printStackTrace();
}
}
Now we add a method to write to S3. For example, I want to upload a file on my local file system to a folder in my S3 bucket, and I want to preserve the original file name. Below is the putFileToS3() method to achieve this. You will need to modify the main() method for your tests.
public void putFileToS3(String filename, String bucket, String folder)
{
try
{
File file = new File(filename);
String key = folder + "/" + file.getName();
client.putObject(bucket, key, file);
ObjectListing listing = client.listObjects(bucket, folder);
List<S3ObjectSummary> objects = listing.getObjectSummaries();
for (S3ObjectSummary object : objects)
{
System.out.println(object.getKey());
}
} catch (Exception e)
{
System.out.println(e.getMessage());
e.printStackTrace();
}
}
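For reference, one way to adapt the main() method to exercise putFileToS3() is sketched below. The command-line usage (file name, bucket name, and folder name as the three arguments) is an assumption made here for illustration, not part of the original code; adjust it to suit your own tests.
public static void main(String[] args)
{
// Assumed usage: java TestS3 <filename> <bucket> <folder>
// Example:       java TestS3 index.html 331982-training folder005
TestS3 test = new TestS3();
test.putFileToS3(args[0], args[1], args[2]);
}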
Now we will see how to create a very large number of objects in your S3 bucket. In this example, we randomly generate a UUID and use the UUID as the key for the S3 object. The content of each object consists of two lines: the first line is the sequence number, and the second line is the UUID. Again, you will need to modify the main() method for your tests.
public void putManyObjectsToS3(String bucket, String folder, int count)
{
try
{
for (int i=0; i<count; i++)
{
String uuid = UUID.randomUUID().toString();
String key = folder + "/" + uuid;
String content = i + "\n" + uuid;
client.putObject(bucket, key, content);
}
} catch (Exception e)
{
System.out.println(e.getMessage());
e.printStackTrace();
}
}
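Likewise, a minimal main() for trying out putManyObjectsToS3() might look like the sketch below. The bucket name, folder name, and object count are placeholders; substitute your own values.
public static void main(String[] args)
{
// Bucket name ("331982-training"), folder name ("folder006"), and count (100) are placeholders
TestS3 test = new TestS3();
test.putManyObjectsToS3("331982-training", "folder006", 100);
}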
(4) Working with S3 using PHP
Now we will learn some simple PHP. It is OK if you have never worked with PHP before; we will pick it up slowly. First of all, let's install Apache and make sure that it works. After executing the following commands, use your browser to visit http://ec2-ip-address/ and verify that you see the Apache welcome page.
$ sudo apt-get update
$ sudo apt-get install apache2
$ sudo service apache2 start
Then we install PHP, along with some commonly used PHP extensions such as the MySQL connector.
$ sudo apt-get install php libapache2-mod-php php-mcrypt php-mysql php-curl php-xml
$ sudo service apache2 restart
By default, the Apache web server serves content from the /var/www/html folder. To make our life easier, we will change the ownership of this particular folder to the ubuntu user.
$ cd /var/www
$ sudo chown -R ubuntu:ubuntu html
$ cd html
$ ls
index.html
Under the /var/www/html folder, create a file test.php, with the following content. Use your browser to visit http://ec2-ip-address/test.php to make sure that PHP works.
<?php
phpinfo();
?>
Now we download the AWS SDK for PHP to the /var/www/html folder. We recommend that you take a quick look at the AWS SDK for PHP site first.
$ cd /var/www/html
$ wget http://docs.aws.amazon.com/aws-sdk-php/v3/download/aws.phar
Then we create a simple PHP page to display all the S3 buckets in your AWS account. Under the /var/www/html folder, create a file test2.php with the following content. Use your browser to visit http://ec2-ip-address/test2.php; you will see a messy block of information, which contains the names of your S3 buckets along with a lot of other information.
<?php
require 'aws.phar';
$s3 = new Aws\S3\S3Client(['version' => 'latest','region' => 'us-east-1']);
$result = $s3->listBuckets();
print($result);
?>
Such messy output is of course not pretty. We add some HTML formatting to the output to make it look better. To understand the data structure in the response, you will need to refer to the API Reference of the AWS SDK for PHP.
<?php
require 'aws.phar';
$s3 = new Aws\S3\S3Client(['version' => 'latest','region' => 'us-east-1']);
$result = $s3->listBuckets();
foreach ($result['Buckets'] as $bucket)
{
echo $bucket['Name'] . "<br>";
}
?>
Similarly, you can list the objects in an S3 bucket using the following PHP code. Please note that you need to use the correct AWS region for your S3 bucket.
<?php
require 'aws.phar';
$s3 = new Aws\S3\S3Client(['version' => 'latest','region' => 'us-east-1']);
$bucket = '331982-training';
$result = $s3->listObjects(array('Bucket' => $bucket));
foreach ($result['Contents'] as $object)
{
echo $object['Key'] . "<br>";
}
?>
If you happen to use the wrong AWS region in the code, you will see something similar to the following in your /var/log/apache2/error.log. We recommend that you reproduce this behavior by intentionally putting the wrong information into your PHP code and seeing what happens.
[Fri Mar 17 04:29:50.315476 2017] [:error] [pid 20783] [client 54.240.193.1:25905] PHP Fatal error: Uncaught Aws\\S3\\Exception\\PermanentRedirectException: Encountered a permanent redirect while requesting https://s3.amazonaws.com/331982-training?encoding-type=url. Are you sure you are using the correct region for this bucket? in phar:///var/www/html/aws.phar/Aws/S3/PermanentRedirectMiddleware.php:49\nStack trace:\n#0 phar:///var/www/html/aws.phar/GuzzleHttp/Promise/Promise.php(203): Aws\\S3\\PermanentRedirectMiddleware->Aws\\S3\\{closure}(Object(Aws\\Result))\n#1 phar:///var/www/html/aws.phar/GuzzleHttp/Promise/Promise.php(156): GuzzleHttp\\Promise\\Promise::callHandler(1, Object(Aws\\Result), Array)\n#2 phar:///var/www/html/aws.phar/GuzzleHttp/Promise/TaskQueue.php(47): GuzzleHttp\\Promise\\Promise::GuzzleHttp\\Promise\\{closure}()\n#3 phar:///var/www/html/aws.phar/GuzzleHttp/Handler/CurlMultiHandler.php(96): GuzzleHttp\\Promise\\TaskQueue->run()\n#4 phar:///var/www/html/aws.phar/GuzzleHttp/Handler/CurlMultiHandler.php(123): GuzzleHttp\\Handler\\CurlMultiHandler->tick()\n#5 phar:///var/www/html/aws.phar/GuzzleHttp/Promise/Promise in phar:///var/www/html/aws.phar/Aws/S3/PermanentRedirectMiddleware.php on line 49
Now we create a simple web page (upload.php) to upload a file to your S3 bucket. As shown in the following example, you can mix HTML code and PHP code in one PHP file; the only thing you need to do is enclose your PHP code in "<?php" and "?>" tags. Save this code as upload.php and test it out. After you upload a file, check your S3 bucket to see if it is there. If it is not there, check /var/log/apache2/error.log to see what the issue is.
<?php
$server = $_SERVER['SERVER_ADDR'];
session_start();
$session_id = session_id();
if (!isset($_SESSION['marker']))
{
$_SESSION['marker'] = $server . ' - ' . time();
}
require 'aws.phar';
$s3 = new Aws\S3\S3Client(['version' => 'latest','region' => 'ap-southeast-2']);
$bucket = '331982-training';
?>
<HTML>
<Head>
<title>Simple S3 Demo</title>
</Head>
<body>
<H1><?php echo $server;?></H1>
<H3><?php echo 'Session ID: ' . $session_id;?></H3>
<H3><?php echo 'Session Marker: ' . $_SESSION['marker'];?></H3>
<HR>
<form action='upload.php' method='post' enctype='multipart/form-data'>
<input type='file' name='fileToUpload' id='fileToUpload'>
<input type='submit' value='Upload' id='submit_button' name='submit_button'>
</form>
<?php
if (isset($_FILES["fileToUpload"]))
{
save_upload_to_s3($s3, $_FILES["fileToUpload"], $bucket);
}
echo "<p>";
$result = $s3->listObjects(array('Bucket' => $bucket));
foreach ($result['Contents'] as $object)
{
echo $object['Key'] . "<br>";
}
function save_upload_to_s3($s3_client, $uploadedFile, $s3_bucket)
{
try
{
// Upload the uploaded file to S3 bucket
$key = $uploadedFile["name"];
$s3_client->putObject(array(
'Bucket' => $s3_bucket,
'Key' => $key,
'SourceFile' => $uploadedFile["tmp_name"],
'ACL' => 'public-read'));
echo "Upload successful<br>";
} catch (Aws\S3\Exception\S3Exception $e)
{
echo "There was an error uploading the file.<br>";
return false;
}
}
?>
<HR>
<footer>
<p align='right'>AWS Tutorials prepared by Qingye Jiang (John).</p>
</footer>
</body>
</HTML>
When you read this code, ask yourself the following questions:
- What is a session? How do we create, retrieve, and destroy session information? You might want to read PHP 5 Sessions for some quick introduction. Also, look into the /var/lib/php/sessions folder to see what you have there.
- What are the basic elements in a web page? You might want to read Basic HTML Elements for some quick introduction.
- How do you upload a file to your web server? You might want to read PHP 5 File Upload for some quick introduction.
- How do you use functions in your PHP code? You might want to read PHP 5 Functions for some quick introduction.
- How do you catch exceptions in your PHP code? You might want to read PHP: Exceptions for some quick introduction.
- When you click on the "Refresh" icon in your browser (after a successful upload), what happens (and why)? What is the difference between hitting ENTER in the address bar and clicking the "Refresh" icon? Check your access log and error log to see if you can find the answer.
(5) Elastic Load Balancing (ELB) Basics
Now you have a working website! This simple website allows you to upload files to your S3 bucket, and lists all the objects in your S3 bucket. Assume that a lot of users are visiting your website, and a single EC2 instance is not capable of handling that workload. Naturally, we think of a horizontal scaling solution: deploy multiple EC2 instances running the same application, and use an ELB to distribute the workload across the EC2 instances. Work through the following steps:
- Create an AMI from the EC2 instance.
- Launch two more EC2 instances using the newly created AMI.
- Create an ELB (using a Classic Load Balancer) and add all three EC2 instances to the ELB. What is the purpose of specifying a security group for your ELB?
- Access your ELB via web browser. Test the upload web page multiple times and observe the IP address and session information on the web page. Is the session information the same as the session information you obtained from IP-based tests? Again, look into the /var/lib/php/sessions folder (on all three EC2 instances) to see what you have there. Do you see any problem with the session information?
- Enable ELB Sticky Sessions. Test the upload web page multiple times and observe the behavior. What are the pros and cons of enabling sticky sessions?
- Enable ELB Access Logs. Download and read the ELB access log to understand the log entries. What are the benefits of enabling ELB access logs?
- Add the following sleep() call at the beginning of your PHP code. Observe the behavior of your web page using the ELB CloudWatch metrics, latency in particular. Change the number of seconds in the sleep() call from 0 to 100, then observe how this affects your HTTPCode_Backend_4XX and HTTPCode_Backend_5XX metrics. Refer to the List of HTTP status codes to understand the meaning of the HTTP status codes you observe.
sleep(10);
- When you make changes to your PHP code, how do you propagate the changes to all the EC2 instances behind the ELB?
- Generate a very large volume of traffic to your ELB (over 5000 requests per second for over 10 minutes). You should try different approaches, including curl, wget, ab, and other tools that you can think of. Write a short report on how you used these tools and your experience with them. Look into your ELB CloudWatch metrics to see how fast you can go.
- Observe the CPU, disk I/O, and memory consumption on your EC2 instances while performing the load tests. You might need some monitoring scripts for this.
(6) Summary and Homework
In this session, we interacted with S3 using the AWS console, the AWS CLI, and the AWS SDKs for Java and PHP. We also started using ELB to perform load balancing for a simple PHP website.
- Write a bash script to list all the objects in your S3 bucket with over 1000 objects.
- Write a Java program to delete all the test objects in your S3 bucket (without deleting the bucket itself).
- In our PHP upload page, we allow uploads of all file types. Modify the code so that end users can only upload image files. Also, instead of showing "Upload successful", show the image that was uploaded.
- How do you resolve the session synchronization issue across different EC2 instances (if sticky sessions are not enabled on the ELB)?