Section 1: Intro to Amazon Web Services, Identity & Access Management (IAM), and Billing - Green-Biome-Institute/AWS GitHub Wiki


Now that we have completed the command line training modules and have the basics of the command line interface down, we are going to take a look at Amazon Web Services (AWS).

Learning Points for Section 1: What is Amazon Web Services? (Identity and Access Management and Finances)

From this section, you should take away:

  1. Cloud computing services are computation and storage resources that can be used pay-as-you-go through the internet.
  2. Amazon Web Services (AWS) is Amazon’s cloud computing platform.
  3. Amazon Web Services offers many different types of services, such as storage (S3) and computational power (EC2).
  4. Identity and Access Management (IAM) is the AWS service used to manage access to services and resources on AWS.
  5. Within one AWS account, there can be many users.
  6. Each user has specific permissions that determine which services they can use.
  7. Costs associated with used resources are monitored.
  8. Resources can be tagged with labels and information about those labels for organization and cost tracking.
  9. Resources are paid for as you go, meaning all usage of every service is billed, often down to the second.
  10. You can find the costs associated with any given resource through links on the GBI AWS Github page. (You can also find this information by searching online.)

Let’s start with the most basic concept in this training module.

What is cloud computing?

(Also sometimes just referred to reverently as “The Cloud”)

Cloud computing can be thought of as a fancy name for a bunch of computer components (think hard drive storage or Central Processing Units [CPUs]) that you are physically separated from but can connect to and use via the internet.

Why is this important? What does having a bunch of computer systems off in the middle of nowhere help us with?

As you can appreciate, working with genetic information inherently means you will be handling and analyzing large data files. Sometimes the software you use will need to run for hours (or days), or your computer will not have enough computing power to run these programs efficiently at all. You might also be without access to your university’s computer cluster (also a group of computers, but generally one that is local and owned by your organization, as opposed to cloud computing resources, which are remote and owned by someone else).

What this means is that if you use a cloud computing service (like Amazon, as we’ll get to next), you can get access to these computer components without having to manage them yourself. Instead of buying another computer (or a hundred of them!) you can simply “rent” that computing power when you need it. Therefore you need neither the upfront capital to buy the hardware (the physical computers) nor someone to manage that hardware for you (making sure things are connected and running without errors). The company that hosts the cloud computing services you are using manages the physical computer systems itself.

Not only that, you can do it on-demand! Have a deadline coming up that requires 10 programs to run at once? Rent (and pay for) 10 virtual computers! It’s 1AM and you can’t sleep because all you want to do is run data analysis, but you can’t use the lab computer until the morning? Rent (and pay for) computing power in the cloud and log into it from your personal laptop! The point is that you use the services when you need them, and when you stop, those physical computers can be rented out by other people who need them.

What is Amazon Web Services?

Amazon Web Services is Amazon’s cloud computing service.

Amazon Web Services is a pay-as-you-go cloud computing platform hosted by Amazon. It offers a large variety of services that can be grouped broadly into storage, computing, and networking. With data centers around the world, it provides very large on-demand compute and memory capacity. Each AWS service has a dedicated purpose, and to build a platform on AWS for your own work you will generally combine several of them. The following services (IAM, EBS, EC2, etc.) are the ones most applicable to our purposes.

How does AWS work for us? Within Amazon Web Services, people or organizations can create accounts. We have an account that is created and managed by Melis at the GBI. This account can be thought of as a container that gives access to, and holds information about, the services we want to use. By entering the container, you gain access to all the premade services within it (like S3) and all the resources that we create within it. If you found an important personal use for AWS, you could create a personal AWS account and pay for your own services, which can be as broad as hosting a website, data analysis, or cloud storage (instead of using an external hard drive).

Before moving on, let’s log in to the GBI AWS account with the username and password provided to you.

Let’s look at a couple of things before moving on. On the AWS homepage we have:

  • a search bar at the top for finding the services and information we want to use,
  • the services we have recently used, and
  • information regarding our account in the top right.

Something important to note on the top right is the “region”. Ours is “Oregon” (region code us-west-2). Click on this (don’t select anything) and you will see a dropdown menu with lots of other names. These are the locations of the physical data centers that host Amazon Web Services. The region affects the speed at which you can connect to those data centers (the farther away, the slower). Why is this important to us? Most AWS resources you create are hosted in one specific region. So if you start using a service in the “Oregon” region and then accidentally switch to another region, you won’t be able to see that resource anymore. It will still be active (and therefore still being paid for!), but you won’t see it until you switch back to the region it is in. For now, all of our services will be hosted in the Oregon region, so make sure you have that region selected when using AWS.
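To make the “still running, just not visible” point concrete, here is a small Python sketch. It does not call AWS: the resource inventory, IDs, and prices are hypothetical, and the only real detail is that “us-west-2” is the region code for Oregon. It picks out resources living outside our expected region and totals what they keep costing per hour.

```python
# Hypothetical inventory: (resource_id, region_code, hourly_cost_usd).
# "us-west-2" is the AWS region code for US West (Oregon).
EXPECTED_REGION = "us-west-2"

inventory = [
    ("i-analysis-01", "us-west-2", 0.0255),
    ("i-forgotten-02", "us-east-1", 6.048),
    ("vol-data-03", "us-west-2", 0.011),
]


def resources_outside_region(resources, expected=EXPECTED_REGION):
    """Return resources outside the expected region.

    These do not show up while the console is set to the expected region,
    but they keep running and accruing charges."""
    return [r for r in resources if r[1] != expected]


def hourly_cost(resources):
    """Total hourly cost of the given resources, in USD."""
    return sum(r[2] for r in resources)


stray = resources_outside_region(inventory)
for resource_id, region, cost in stray:
    print(f"{resource_id} is running in {region} at ${cost}/hour")
print(f"Hidden charges: ${hourly_cost(stray):.3f}/hour")
```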

Identity and Access Management (IAM)

The first AWS service we need to know about is Identity and Access Management (IAM). This service is what allows for the creation of users and groups of users within one main account. It can also coordinate multiple different accounts, but for our purposes one is sufficient.

Why is this necessary? Why can’t everyone just log on to one main account? We need to organize each individual using the account because it gives the administrators of the GBI AWS account (your professors and the admin at the university) the ability to control which services you can use and how much money you can spend. As previously mentioned, AWS is a pay-as-you-go environment. In other words, you pay for everything you use! The services you will be using may cost just cents per hour, but they may also cost as much as $5 or $10 per hour. As you can appreciate, these kinds of services can create massive bills rather quickly, even if you aren’t actively using them. For example, if you turn on a $10/hour service, forget about it, and come back 5 days later, the account will be charged $1,200 (5 days × 24 hours × $10)! (And you may no longer have the funds to run your experiment analysis.)
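The runaway-cost arithmetic above is just hourly rate × hours. A minimal Python sketch of it (the function name is our own, not part of any AWS tool):

```python
def accrued_cost(hourly_rate_usd, days_left_running):
    """Cost of leaving a pay-as-you-go resource running for a number of days."""
    hours = days_left_running * 24
    return hourly_rate_usd * hours


# The example from the text: a $10/hour service forgotten for 5 days.
print(f"${accrued_cost(10, 5):,.2f}")  # $1,200.00
```

Swapping in the hourly price of any service you are about to start gives a quick sense of what forgetting about it would cost.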

So, when you are given user credentials, the professor will be able to specify the permissions that determine which types of services you can use, which will be specific to what you need to do. They can also monitor those costs as we will see later.
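As a rough illustration of how permissions decide what a user can do, here is a simplified Python sketch of the core IAM rule: an explicit Deny always wins, an Allow grants access, and anything not allowed is denied by default. The policy is hypothetical but follows the IAM JSON policy layout (“Version”, “Statement”, “Effect”, “Action”); the checking function is our own simplification and ignores resources, conditions, and the other policy types AWS evaluates.

```python
from fnmatch import fnmatchcase

# Hypothetical policy for a student user, in the IAM JSON policy layout.
student_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["ec2:*", "s3:GetObject", "s3:PutObject"], "Resource": "*"},
        {"Effect": "Deny", "Action": ["ec2:PurchaseReservedInstancesOffering"], "Resource": "*"},
    ],
}


def _matches(pattern, action):
    # IAM action names are case-insensitive and may use * and ? wildcards.
    return fnmatchcase(action.lower(), pattern.lower())


def is_allowed(policy, action):
    """Simplified decision: explicit Deny wins, then Allow, else implicit deny."""
    allowed = False
    for statement in policy["Statement"]:
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        if any(_matches(pattern, action) for pattern in actions):
            if statement["Effect"] == "Deny":
                return False
            allowed = True
    return allowed


print(is_allowed(student_policy, "ec2:RunInstances"))  # True
print(is_allowed(student_policy, "ec2:PurchaseReservedInstancesOffering"))  # False
print(is_allowed(student_policy, "iam:CreateUser"))  # False (never allowed)
```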

Let’s look at some examples of this. Click on the search bar at the top of the AWS website and type “IAM”. Select the first option under “Services”, called “IAM: Manage access to AWS resources”.

We can see right off the bat some information about the GBI account, including how many user groups and users there currently are. “User Groups” are groups of users that are all given the same permissions. For example, everyone in a specific group might be given permissions to create and use EC2 instances and to store data in S3. In IAM, a set of permissions like this is written as a “policy”, which can be attached to a group or directly to a user. “Users” are like sub-accounts. They have their own login information that logs them into the account with the permissions granted by the policies the administrator has attached to that user or to its groups.
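The relationship between groups, users, and permissions can be sketched in a few lines of Python. The group name, user names, and policy are hypothetical; the policy uses the IAM JSON policy format, and the rule that a user’s effective permissions include those of every group it belongs to mirrors how IAM groups work. This models the idea only and does not talk to AWS.

```python
import json

# Hypothetical group policy: members may use EC2 and store data in S3.
ec2_s3_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["ec2:*", "s3:*"], "Resource": "*"}
    ],
}

# Hypothetical groups and users within one account.
groups = {"students": [ec2_s3_policy]}
users = {
    "student-a": {"groups": ["students"], "policies": []},
    "student-b": {"groups": ["students"], "policies": []},
}


def effective_policies(user_name):
    """Policies that apply to a user: their own plus those of every group they are in."""
    user = users[user_name]
    inherited = [policy for group in user["groups"] for policy in groups[group]]
    return user["policies"] + inherited


# Show the policy document as JSON, the format IAM uses for policies.
print(json.dumps(effective_policies("student-a")[0], indent=2))
```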

The main thing to check here is the list of users. Under “Access Management”, click on “Users”. You’ll see a small list of users, one of which is the user that you have all logged in with. All of these users operate within the one GBI account, and therefore the resources created by every one of these users are paid for by this one account.

The Finances of AWS

We won’t dive too deep into this. There are just two main points you need to be aware of and understand.

First, a practice relevant to the finances of AWS is called “resource tagging.” This is really important. Most AWS resources can be “tagged”. A tag is a label (a key) paired with a value, and it can be used to identify and track that resource. For our purposes, tags will be used to track very specific costs of the GBI account. We will have specific tags that you must use when you create certain resources in order to make sure we can track them correctly.
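Since a tag is just a key and a value, checking a resource for the tags it must carry is a simple set comparison. This Python sketch uses hypothetical required keys (“Project”, “Owner”) as placeholders for the tags the GBI administrators actually specify; the Key/Value list shape matches how AWS APIs commonly represent tags.

```python
# Hypothetical required tag keys; use the tags your administrators give you.
REQUIRED_TAG_KEYS = {"Project", "Owner"}


def missing_tags(tags):
    """Return the required tag keys that are absent from a resource's tags.

    `tags` is a list of {"Key": ..., "Value": ...} dictionaries."""
    present = {tag["Key"] for tag in tags}
    return sorted(REQUIRED_TAG_KEYS - present)


new_volume_tags = [{"Key": "Owner", "Value": "student-a"}]
print(missing_tags(new_volume_tags))  # ['Project']
```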

When you are given user credentials, that user (which is also a resource) has already been tagged by the administrator. Therefore all costs associated with your user can be recorded, monitored, and stopped if something goes wrong.
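Once costs carry tags, attributing them to people or projects is a group-and-sum. AWS can do this for you (Cost Explorer can group costs by cost allocation tags), so treat this Python sketch over hypothetical cost records as an illustration of the idea rather than of any AWS API. Note how an untagged resource ends up unattributed.

```python
from collections import defaultdict

# Hypothetical cost records, each carrying the tags of the resource that produced it.
cost_records = [
    {"cost": 1.20, "tags": {"Owner": "student-a"}},
    {"cost": 0.45, "tags": {"Owner": "student-b"}},
    {"cost": 3.10, "tags": {"Owner": "student-a"}},
    {"cost": 0.75, "tags": {}},  # untagged: cannot be attributed to anyone
]


def cost_by_tag(records, key):
    """Sum costs per value of one tag key; untagged costs go under '(untagged)'."""
    totals = defaultdict(float)
    for record in records:
        totals[record["tags"].get(key, "(untagged)")] += record["cost"]
    return dict(totals)


for owner, total in cost_by_tag(cost_records, "Owner").items():
    print(f"{owner}: ${total:.2f}")
```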

Second, and building off of this, is AWS’s ability to track the costs associated with an account. Because the service is pay-as-you-go, costs are reported in great detail (down to the second or minute of usage for many types of resources). While this is more relevant to the administrators of the AWS account, it is important for you to realize because you are responsible for your own usage of the account. There is an allocated budget for the account as a whole, so by overspending (even by accident), you will be using money dedicated to the entire group.
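To see why fine-grained billing matters, here is a Python sketch comparing per-second billing with rounding up to whole hours. It assumes per-second billing with a 60-second minimum, which is how AWS bills, for example, On-Demand Linux EC2 instances; other services bill differently, and the whole-hour function is only a hypothetical point of comparison.

```python
import math


def per_second_charge(hourly_rate_usd, seconds_used, minimum_seconds=60):
    """Charge under per-second billing with a minimum billable duration."""
    billable_seconds = max(seconds_used, minimum_seconds)
    return hourly_rate_usd * billable_seconds / 3600


def per_hour_charge(hourly_rate_usd, seconds_used):
    """Charge if usage were instead rounded up to whole hours (for comparison)."""
    return hourly_rate_usd * math.ceil(seconds_used / 3600)


# A $6.048/hour instance used for 10 minutes 30 seconds.
used = 10 * 60 + 30
print(f"per-second: ${per_second_charge(6.048, used):.4f}")
print(f"per-hour:   ${per_hour_charge(6.048, used):.4f}")
```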

Before we dive into the services themselves, I want to make you aware of the costs associated with them. As I’ve mentioned repeatedly, all AWS resources have costs associated with their usage. There is a “Free Tier” for many types of services (like up to 5 GB of storage on S3 or the most basic EC2 instances); however, most of our usage will fall outside of it, within the paid services. Let’s look at the costs associated with some of the main services we’ll be using:

For EC2 instances, click the following link: https://aws.amazon.com/ec2/pricing/on-demand/

You don’t need to know the specifics here yet. Just scroll down a bit and you can see that there are several different categories and types of information regarding the price. The price depends on the region, operating system, “instance type”, “Memory”, and “vCPU”. We’ll go further into these terms later, but you can see here that the larger the instance gets (the higher the numbers under “vCPU” and “Memory”), the more it costs. Basically: larger and more powerful instances cost more (makes sense, right?). For example, look at the very first EC2 instance option, called “a1.medium”. At the time of writing it costs $0.0255 per hour. That’s about 2.5 cents per hour (about 61 cents for a full day)! Doesn’t seem too bad. Now type “r5” into the search bar right above that instance, and EC2 instances starting with “r5” will appear. The largest one here costs $6.048 per hour, so running it for a full day would cost about $145.15.
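The per-day figures above come from multiplying the hourly price by 24. A minimal Python sketch using the two hourly prices quoted in the text (these depend on region, operating system, and date, so check the pricing page before relying on them):

```python
# Hourly On-Demand prices quoted in the text.
HOURLY_PRICE_USD = {
    "a1.medium": 0.0255,
    "r5 (largest listed)": 6.048,
}


def daily_cost(hourly_price_usd, hours_per_day=24):
    """Cost of running one instance around the clock for a day."""
    return hourly_price_usd * hours_per_day


for name, price in HOURLY_PRICE_USD.items():
    print(f"{name}: ${price}/hour -> ${daily_cost(price):.2f}/day")
```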

Next we’ll look at storage costs:

Using Simple Storage Service (S3), which can be thought of as an external hard drive, the cost for storage is $0.023 per gigabyte-month for the first 50 TB used by one account (that means collectively within the GBI AWS account). One gigabyte-month is the usage of one gigabyte of storage for one month. So a 100 GB data file would cost 100 × $0.023 = $2.30 per month. There are other, cheaper forms of storage that might be considered for long-term storage of our data.
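The gigabyte-month calculation can be written out as a small Python function. It uses the $0.023 rate quoted above and assumes the account as a whole stays within the first 50 TB tier; rates beyond that tier are not modeled.

```python
# USD per gigabyte-month, the first-50-TB rate quoted in the text.
S3_RATE_PER_GB_MONTH = 0.023


def s3_storage_cost(gigabytes, months=1):
    """Storage cost = gigabyte-months x rate (assumes the first 50 TB tier)."""
    gigabyte_months = gigabytes * months
    return gigabyte_months * S3_RATE_PER_GB_MONTH


print(f"${s3_storage_cost(100):.2f} per month")  # $2.30 per month
print(f"${s3_storage_cost(100, months=6):.2f} for six months")  # $13.80 for six months
```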

For Elastic Block Store (EBS), which can be thought of as the hard drive of your computer (directly accessible by your computer), the type we will use (“gp2”) costs $0.08 per gigabyte-month. That means that one GB of storage used for 30 days costs $0.08. Some basic calculations based on this, assuming a 30-day month:

100 GB: $8.00 per month = $0.27 per day = $0.011 per hour

500 GB: $40.00 per month = $1.33 per day = $0.056 per hour

1 TB (1,000 GB): $80.00 per month = $2.67 per day = $0.111 per hour
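The figures above can be reproduced with a short Python sketch, assuming the $0.08 per gigabyte-month figure quoted in the text and a 30-day month; change the constant if the price for your volume type differs.

```python
EBS_RATE_PER_GB_MONTH = 0.08  # USD, the figure quoted in the text
DAYS_PER_MONTH = 30           # the text's convention: one month = 30 days


def ebs_costs(gigabytes):
    """Return (per month, per day, per hour) cost in USD for a volume of this size."""
    monthly = gigabytes * EBS_RATE_PER_GB_MONTH
    daily = monthly / DAYS_PER_MONTH
    hourly = daily / 24
    return monthly, daily, hourly


for size_gb in (100, 500, 1000):
    monthly, daily, hourly = ebs_costs(size_gb)
    print(f"{size_gb:>5} GB: ${monthly:.2f}/month, ${daily:.2f}/day, ${hourly:.3f}/hour")
```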

Review Questions:

What are cloud computing services?

  • They are computation and storage resources that can be used pay-as-you-go through the internet.

What is Amazon Web Services?

  • Amazon Web Services (AWS) is Amazon’s cloud computing platform.

What are AWS services?

  • AWS services allow users to interact with and use different aspects of Amazon’s cloud platform. For example, S3 for storage and EC2 for computational work.

What is Identity and Access Management?

  • Identity and Access Management (IAM) is the AWS service used to manage access to services and resources on AWS.

What is the relationship between AWS account and AWS user?

  • Within one AWS account, there can be many users.

What are user permissions?

  • Each user has specific permissions that determine which services they can use.

How are costs monitored?

  • Costs associated with used resources are monitored. Resources are paid for as you go, meaning all usage of every service is billed, often down to the second.

What is resource tagging?

  • Resources can be tagged with labels and information about those labels for organization and cost tracking.

Where can you find information regarding the costs of any resource before you use it?

  • You can find the costs associated with any given resource through links on the GBI AWS Github page. (You can also find this information by searching online.)

Move on to Section 2: Elastic Compute Cloud (EC2) and Elastic Block Store (EBS)
