Section 3: Simple Storage Service (S3) and the AWS CLI - Green-Biome-Institute/AWS GitHub Wiki
Go back to Section 2: Elastic Cloud Computing (EC2) and Elastic Block Store (EBS)
Learning Points for Section 3: Simple Storage Service (S3) and the AWS CLI
- The main differences between EBS, S3, and S3 Glacier storage services are their latency, relationship with other AWS services, and cost
- The AWS CLI is a set of commands that you can use (much like `ls` or `cp`) that allow you to interact with AWS services from the command line (within an EC2 instance!)
- You can upload information (data, pictures, plots, files, etc.) to S3 using either the AWS website or through the AWS CLI.
- To use the AWS CLI tool, you use the command `aws` followed by the service you’d like to use, like `s3`, and then the command you want to use with that service, like `ls` or `cp`
Simple Storage Service (S3)
The one service I expect that some of you will already have heard of is S3, even if just in passing. This is because this service is extremely widely used! If you haven’t figured it out by now, S3 stands for Simple Storage Service!
S3 is one of AWS’s most prominent storage services. It is a very “simple” method for storing information in Amazon’s cloud. Remember, just like an EC2 instance, when we say “storing in the cloud”, all we are talking about is renting storage space on a hard drive in one of Amazon’s data centers. You can think of it as unlimited storage capacity - you can store as much information / data as you want! Of course, every byte of information you store takes up a bit of physical space in an Amazon data center, and therefore you must pay for it. Before we look at the pricing, let’s look at the basics of using S3.
The Structure of S3
First, let’s navigate to the S3 dashboard by typing “S3” into the search bar at the top of the AWS homepage. Select the first option “S3: Scalable Storage in the Cloud”.
The basic building block of S3 is called an “S3 bucket”, which is the container where we store information. By default there is a limit on the number of buckets per account (100 when this tutorial was written, though this number can be increased if we need), and you can store as much information in each bucket as you want.
For organization within each bucket, we have what are called “folders” and “prefixes”. Folders are very similar to what you are probably already thinking: they are used to group objects and organize files. The difference between S3 folders and the folders on your desktop, however, is that S3 folders don’t “contain” the files you have grouped with them. Instead, they simply “point” to those files. You can think of an S3 bucket as one big directory with all the files you put into it. Say you have 10 files, where files 1 through 5 are related to project A and files 6 through 10 are related to project B. You can create 2 folders called `projectA` and `projectB`, and have them point to files 1-5 and 6-10, respectively. However, you do not have to “enter” `projectA` or `projectB` to access the files within them, like you would for a folder on your computer desktop.
Next we have what are called “prefixes”. Prefixes are just like the PATH that we learned in the command line tutorial. Say we want to work with the file `file1` grouped by the folder `projectA` within our bucket named `GBIbucket`. The prefix is `projectA/`, and the full path to the file would be: `GBIbucket/projectA/file1`!
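To make the pieces concrete, here is a minimal shell sketch that splits the example path into its parts and builds the `s3://` address the AWS CLI expects later in this section. It only manipulates text and does not contact AWS. The variable names (`bucket`, `key`, `prefix`, `filename`, `uri`) are my own choices for illustration; in S3's terminology the full name of an object inside a bucket is its "key", and the prefix is the leading part of that key.

```shell
# Names taken from the example above; nothing here talks to AWS.
bucket="GBIbucket"
key="projectA/file1"        # the object's full name (its "key") inside the bucket

# The prefix is everything up to and including the last "/" in the key.
prefix="${key%/*}/"
# The file name is everything after the last "/".
filename="${key##*/}"

# The address form used by the AWS CLI for this object.
uri="s3://${bucket}/${key}"

echo "prefix:   ${prefix}"
echo "filename: ${filename}"
echo "uri:      ${uri}"
```

Because the “folder” is just part of the object's name, two objects are in the same folder simply because their keys start with the same prefix.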
Lastly, when we store files in a bucket (for example a fastq or fast5 file!) it is called an “object”. Objects can be as large as 5TB (5,000 gigabytes). You can store as many objects as you want in an S3 bucket.
Building an S3 Bucket
Let’s put pen to paper and try this out. First we’ll each create a bucket.
- Select the orange button with white words that says “Create bucket”. This takes you to a new page.
- For the bucket name, enter `[your-name]sbucket` - for example for me I would type `flintsbucket`
- For the AWS Region, select `US West (Oregon) us-west-2`
- Leave “Block all public access” selected.
- In the Tags section create the 2 following tags:
  - Key `user` : Value `[your-name]` (for example for me: `user` : `flint`)
  - Key `bucket-purpose` : Value `my first bucket`
- Now select `Create bucket` at the end!
That’s it! Going back to the S3 dashboard, you will see your bucket listed under “Buckets”.
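For reference, the same bucket could in principle be created from the command line with the AWS CLI (introduced later in this section). The sketch below is only an illustration: it assumes the CLI is running with permission to create buckets, which the read-only role used in this training does not grant. `DRY_RUN` and `run` are hypothetical helpers I added so that, by default, the script only prints the commands instead of executing them.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

name="flint"                  # replace with your own name
bucket="${name}sbucket"       # same naming scheme as the steps above
region="us-west-2"            # US West (Oregon)

# JSON form of the two tags from the console steps.
tags='{"TagSet":[{"Key":"user","Value":"'"${name}"'"},{"Key":"bucket-purpose","Value":"my first bucket"}]}'

# "mb" stands for "make bucket".
run aws s3 mb "s3://${bucket}" --region "$region"
# Tags are set through the lower-level s3api command group.
run aws s3api put-bucket-tagging --bucket "$bucket" --tagging "$tags"
```

For this training, stick with the web page steps above; this is just to show that the two routes line up.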
Storing items in an S3 Bucket
Now that we have created our own bucket, let’s store something in it. To do this:
- Click on the bucket with your name.
- Click the orange “upload” button.
- Click “Add files”.
- This will open up a window to select the file you want to upload. Go to the file that you were sent before this training module and select it. Once selected, press “open”.
- Now you can press the orange “Upload” button at the bottom of the screen.
- You will see a progress bar at the top of the screen. When it turns from blue to green, the upload is successful.
- Press “close” at the top right and you will be redirected back to your bucket.
Now, you can see that within your bucket, there is one object. That object is the file that you just uploaded! In order to delete it (don’t do this yet), you check the box next to the object and press the “Delete” button.
These are the basics of S3. However, what we’ve done here only allows you to upload and download from AWS’s webpage. The next tool we’re going to learn will allow us to interact with these S3 buckets from an EC2 instance.
The AWS CLI
The AWS CLI is a tool provided by AWS that allows you to interact with AWS services from the command line (instead of from the AWS webpage). This means many of the things we’ve learned here today (including creating EC2 instances and building / storing things in S3 buckets) can be done entirely from the command line! The only use I think is directly practical to learn here is the ability to interact with S3 buckets from an EC2 instance. To start, let’s go back to our CLI where we are logged into the EC2 instance we created earlier.
If you closed this window, just open up another one and use the same command you used earlier to log in.
As we saw earlier, there’s nothing here! Pretty useless to have a computer with nothing on it, I’d say. Let’s change that by using the AWS CLI.
First, since this is a new EC2 instance, it does not have all of the software necessary to use the AWS CLI tool. Enter the following command:
$ sudo apt update && sudo apt upgrade -y && sudo apt install awscli
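Once the install finishes, it is worth confirming that the `aws` command is actually available. Here is a small sketch of one way to check; the variable name `aws_status` is just something I made up for the example.

```shell
# Check whether the aws command is on the PATH before trying to use it.
if command -v aws >/dev/null 2>&1; then
    aws_status="installed"
    aws --version      # prints the installed version string
else
    aws_status="missing"
    echo "aws not found: try re-running the install command above"
fi
echo "AWS CLI status: ${aws_status}"
```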
Something I should note before we move forward is that, just like you need permission to use the AWS services like EC2 and S3 (you get this permission by logging into the webpage), the EC2 instance itself also needs permission. When we created our EC2 instances, we gave them the IAM role `GBI-EC2-S3-ReadOnly`. I only bring this up so that you can be aware that if you aren’t being allowed to do something on AWS and an error message tells you you don’t have “credentials”, it is likely that the service you are using needs a role like the one we gave our EC2 instance in order to complete that task. There is some magic happening behind the scenes here, but don’t worry too much about it.
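If you are ever unsure whether the instance has credentials at all, one way to check is to ask AWS who the CLI is currently acting as. The sketch below uses the `aws sts get-caller-identity` command for this; the variable name `cred_status` is my own, and I am assuming that on an instance with an IAM role attached the reported identity will mention that role.

```shell
# Ask AWS which identity the CLI is currently using.
if ! command -v aws >/dev/null 2>&1; then
    cred_status="no-cli"             # the AWS CLI is not installed
elif aws sts get-caller-identity >/dev/null 2>&1; then
    cred_status="ok"
    # Show just the identity's ARN as plain text.
    aws sts get-caller-identity --query Arn --output text
else
    cred_status="no-credentials"     # installed, but no usable credentials
fi
echo "credential check: ${cred_status}"
```

Note that this only tells you that credentials exist, not what they are allowed to do; a read-only role will still pass this check.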
Now that we have the AWS CLI downloaded and we know that our EC2 instance has permission to read from S3 because of its IAM role, let’s use the AWS CLI!
The AWS CLI command is:
$ aws
You’ll see that entering this gives you an error message, because the command needs more information to know what to do. The syntax, in general, is to use the `aws` command followed by the name of the AWS service you want to use, like `s3`, and then the command to execute with that AWS service. For example, let’s list our S3 buckets:
$ aws s3 ls
Breaking this command down, we are:
- Using the `aws` command
- Telling the `aws` command to interact with the `s3` service
- Telling the `aws` command to use the `ls` command with the `s3` service
The output of this shows all the buckets currently in the GBI account that you have access to.
Using the AWS CLI to read from S3 Buckets
Now, let’s go a step further and list the contents of our specific bucket:
$ aws s3 ls [your-name]sbucket
For example for me: $ aws s3 ls flintsbucket
Now, you will see the file that we uploaded just a couple minutes ago!
Next, let’s copy that file onto our EC2 instance using the `cp` command. To do this, you must add “s3://” before the bucket name, to signify that you are referring to a location in S3 rather than a file on the EC2 instance:
$ aws s3 cp s3://[your-name]sbucket/[filename] .
Remember, the `.` character signifies your current location on the EC2 instance, which right now is your home directory.
Now, list the contents of your home directory:
$ ls
And you’ll see that the file from our S3 bucket has been downloaded to our EC2 instance!
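The same `cp` command also works in the other direction: put the local file first and the `s3://` location second, and the file is uploaded instead of downloaded. The sketch below is an illustration only, since uploading needs write permission that the read-only role from this training does not have. `DRY_RUN` and `run` are hypothetical helpers that make the script print the commands rather than execute them, and `results.txt` is a placeholder file name.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

bucket="flintsbucket"     # replace with your own bucket name
file="results.txt"        # placeholder for a file on the EC2 instance

# Upload the file to the top level of the bucket.
run aws s3 cp "$file" "s3://${bucket}/"
# Upload the same file under the projectA/ prefix ("folder").
run aws s3 cp "$file" "s3://${bucket}/projectA/${file}"
```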
Deleting S3 Objects and Buckets
Lastly, let’s go in and delete that bucket so that it doesn’t accrue any more costs. Remember, each second that information is stored will be billed! We could do this step from the CLI if we had given the EC2 instance both read and write privileges. Instead, we’ll do it by navigating back to the S3 dashboard on the AWS web page. Before we delete a bucket, we must delete the contents inside of it. So click on your bucket, then on the box next to the file, and press “Delete” to delete the file. This will take you to a new window where you must enter “permanently delete” before you can select the “Delete” option. Now, go back to the S3 dashboard and select the circle next to your bucket and press “Delete”. This will take you to another new window where you must type the name of your bucket before deleting it. Enter the name of your bucket and then press “Delete bucket”.
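For the curious, here is roughly what the same cleanup would look like from the CLI if the instance did have write privileges. This is a sketch under that assumption, not something to run in this training. As before, `DRY_RUN` and `run` are hypothetical helpers that only print the commands by default, and `results.txt` is a placeholder for the name of your uploaded file.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

bucket="flintsbucket"     # replace with your own bucket name
file="results.txt"        # placeholder for the object you uploaded

# Delete the object first, because a bucket must be empty before removal.
run aws s3 rm "s3://${bucket}/${file}"
# "rb" stands for "remove bucket".
run aws s3 rb "s3://${bucket}"
```

The order mirrors the web page steps: empty the bucket, then delete the bucket itself.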
Good work!
Review Questions
Which type of storage do we attach to EC2 instances?
- EBS volumes.
Which types of storage are good for long-term storage?
- S3 and S3 Glacier
What does the AWS CLI do?
- It allows us to interact with AWS services like our S3 buckets and EC2 instances from the command line.
From where can you upload data to S3 buckets: the command line or the S3 dashboard on the AWS website?
- Both!
What command do you use to interact with AWS from the CLI?
- `aws`
Move on to Section 4: Amazon Machine Images (AMI), the GBI AMI, and the GBI AWS Github as a Resource