Section 2: Elastic Compute Cloud (EC2) and Elastic Block Store (EBS) - Green-Biome-Institute/AWS GitHub Wiki

Go back to Section 1: Intro to Amazon Web Services, Identity & Access Management (IAM), and Billing

Go back to tutorial overview

Learning Points for Section 2 - Elastic Compute Cloud (EC2) and Elastic Block Store (EBS):

  1. EC2 instances are virtual computers that you can create in the AWS cloud with different sizes, efficiencies, and computational power.
  2. vCPUs are the virtual computer cores of an EC2 instance.
  3. Memory is the RAM of an EC2 instance, and large amounts of it are required for certain assembly software.
  4. The cost of an EC2 instance scales with its memory size and vCPU number.
  5. An EBS volume is like a hard drive attached to an EC2 instance with however much storage you want it to have.
  6. When you stop an EC2 instance, it stops accumulating costs, but the EBS volume attached to it does not.
  7. An EBS volume only stops accruing costs when it is deleted.
  8. You will know how to create your own EC2 instance using the EC2 dashboard on the AWS website.
  9. You will know how to find the information relevant to that EC2 instance to log into it.

Elastic Compute Cloud (EC2)

Components of an EC2 instance

All of this content is a waste of time unless we can do interesting things with it! In order to “do things” we need computing power, and that is where Elastic Compute Cloud (EC2) instances come into play. EC2 instances are effectively private virtual computers whose CPU, memory, storage, networking, and security settings you can (and should) customize.

Let’s define the terms that are immediately important to us. A “vCPU” is a virtual Central Processing Unit (usually referred to as your computer’s “processor”). This is the component of a computer that retrieves instructions and executes them. All you really have to know now is that more vCPUs means “more” computational power, which in turn means that some software will run faster.

“Memory” (which can be thought of as your computer's RAM or “Random Access Memory”) is immediately accessible storage space that programs running on your computer can use without interacting with your hard drive. More memory means more programs can be run simultaneously. Certain programs, as is the case for us with genome assemblers, require large amounts of memory in order to run.

Instance Types and Families

There are pre-made instances that are combinations of these customizations, which fall into 1 of 5 categories:

  • General Purpose
  • Compute Optimized
  • Memory Optimized
  • Accelerated Computing
  • Storage Optimized

General Purpose has a balanced number of vCPUs (virtual computer cores), amount of RAM, hard drive storage, and network performance, whereas Memory Optimized has more memory relative to the others.

Within these categories, there are then different “families” of instances. For example, one general-purpose instance type is called t2.micro. In this name, t stands for the family of the instance, which in this case is a general-purpose family. The 2 corresponds to the generation of that instance: the higher the number, the newer the generation (which can also mean increased cost if it is “better” than the older generations). Finally, micro stands for the size of the instance.
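The naming scheme described above can be sketched as a small Python parser. This is only an illustration: the `InstanceType` tuple and `parse_instance_type` function are names made up for this example, and the pattern only covers the common `family + generation + optional letters + "." + size` shape, so some unusual instance type names will be rejected.

```python
import re
from typing import NamedTuple


class InstanceType(NamedTuple):
    family: str      # e.g. "t" or "r"
    generation: int  # e.g. 2 or 5
    attributes: str  # optional extra letters after the generation (often empty)
    size: str        # e.g. "micro" or "4xlarge"


# family letters, generation digits, optional attribute letters, ".", size
_PATTERN = re.compile(r"([a-z]+)(\d+)([a-z-]*)\.([a-z0-9]+)")


def parse_instance_type(name: str) -> InstanceType:
    """Split a name such as 't2.micro' into its parts (hypothetical helper)."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized instance type name: {name!r}")
    family, generation, attributes, size = match.groups()
    return InstanceType(family, int(generation), attributes, size)


print(parse_instance_type("t2.micro"))
print(parse_instance_type("r5.4xlarge"))
```

For `t2.micro` the parts are family `t`, generation `2`, and size `micro`, matching the breakdown in the text.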

Costs of EC2 instances

Once you have picked a family, the next decision is what size of instance to use. Now we can look at the costs associated with EC2 instances in a bit more detail. Let’s click on this link again: https://aws.amazon.com/ec2/pricing/on-demand/.

We can see here that “Memory”, which as I mentioned is important for the programs we will be using, is measured in GiB, which is similar to a GB (a GiB is about 7% larger, but the difference does not matter for our purposes).
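For anyone curious about the GiB/GB difference, here is a minimal Python sketch of the conversion. It relies only on the standard definitions (1 GiB = 2^30 bytes, 1 GB = 10^9 bytes); the function name `gib_to_gb` is just an illustrative choice.

```python
GIB_IN_BYTES = 2**30   # 1 GiB (gibibyte) = 1,073,741,824 bytes
GB_IN_BYTES = 10**9    # 1 GB (gigabyte)  = 1,000,000,000 bytes


def gib_to_gb(gib: float) -> float:
    """Convert a size in GiB to the equivalent size in GB."""
    return gib * GIB_IN_BYTES / GB_IN_BYTES


# An instance with 8 GiB of memory has about 8.59 GB.
print(f"{gib_to_gb(8):.2f} GB")
```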

“vCPU” is measured by a number starting at 1 and going as high as 96 or more on the largest instance types. This number represents the number of processing “cores” available to that EC2 instance. Most software you will use has many programs that run within it. Some of these internal programs can be run simultaneously, and with more cores the software can run them in parallel, which shortens the time the software needs to finish. On most instance types, each vCPU is one thread of a physical processor core, so one physical core provides 2 vCPUs (2 threads).

I have personally chosen and been using the r5 family for assembly computations because it is a Memory Optimized instance, meaning it has more memory allocated to the instance than other families.
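To make “more memory allocated” concrete, the sketch below compares memory per vCPU for the `.large` size of three families. The vCPU and memory figures are the ones I understand AWS to list for these types; treat them as illustrative assumptions and check the pricing page linked above before relying on them. The `SPECS` table and `gib_per_vcpu` helper are made up for this example.

```python
# (vCPUs, memory in GiB) for the ".large" size of three families.
# Assumed figures for illustration; confirm them on the AWS pricing page.
SPECS = {
    "c5.large": (2, 4.0),   # Compute Optimized
    "m5.large": (2, 8.0),   # General Purpose
    "r5.large": (2, 16.0),  # Memory Optimized
}


def gib_per_vcpu(instance_type: str) -> float:
    """Return how many GiB of memory the instance has per vCPU."""
    vcpus, memory_gib = SPECS[instance_type]
    return memory_gib / vcpus


for name in SPECS:
    print(f"{name}: {gib_per_vcpu(name):.0f} GiB per vCPU")
```

With the same number of vCPUs, the r5 size carries the most memory, which is the property that matters for memory-hungry assemblers.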

Building an EC2 instance

With this information, let’s all build an individual EC2 instance ourselves! Start by typing EC2 into the search bar at the top of your AWS page, and then selecting the first choice under Services called EC2: Virtual Servers in the Cloud. This takes you to the EC2 dashboard. Let’s look around this for a moment.

First, we can see that there is a list of EC2 instances that have already been made. These instances have the following important identifying information:

  • Name: personalized name for each instance for organizational purposes

  • Instance ID: unique identifier for each instance that is used to interact with it from within AWS (for example tracking costs or terminating that instance) or from the CLI

  • Instance State: instances can be running or stopped. If they are stopped, the instance itself will not accrue any costs, because it is not being used (this is not true of the storage associated with it, but we will get to that in a bit).

  • Instance type: the family and size of the instance, which dictate the amount of memory, the number of vCPUs, and the other system resources allocated from the cloud for you to use.

  • Status checks: AWS checks the health of your EC2 instances, sometimes pointing out if there are errors or failures that have happened.

  • Availability zone: the specific data center location, within an AWS region, that your instance is in

  • Public IPv4 DNS: This is the public hostname that you will use to log into your instance or copy files to / from it

  • Security group name: This is the name of a security group you will either create yourself or choose from a list of pre-created ones that assign permissions for who can or cannot log into / interact with the EC2 instance.

  • Key name: this is the name of the keypair that is assigned to this EC2 instance, which is required like a password for logging into or interacting with the instance

  • Launch time: When that EC2 instance was first created

With this information, let’s build your own first EC2 instance!

  1. At the top right, click on the orange button labelled Launch instances.

  2. Scroll down and select Ubuntu Server 20.04 LTS (HVM), SSD Volume Type, using 64-bit (x86).

  3. We are going to use a cheap instance for this example. Select t2.micro (if it isn’t already selected). Then select Next: Configure Instance Details.

  4. We don’t have anything to configure in the instance details (and most likely won’t in the future either). Select Next: Add Storage.

  5. Here we will further explain a term I’ve been using: “EBS volumes”. These are like the hard drive of your computer. You will need enough storage to hold the operating system and any required software, the sequencing data you have to assemble, and any working space required by the assembler itself. For our purposes here, 8 GiB is fine, but in the future you can imagine this number being substantially higher (as are the costs associated with it). For example, imagine a repetitive plant genome that is ~1 GB long and requires 2 TB of storage for the assembler to run. 1 TB of EBS storage costs about $80/month (and therefore ~$2.67 per TB per day, or ~$5.33 per day for 2 TB). EBS is a different AWS service from EC2: in simple terms, you can think of EC2 as a virtual computer and EBS as its hard drive, but you pay separately for each. Stopping the instance therefore does not stop the storage charges. Because storage cannot be “stopped” (otherwise your data would be lost!), you keep paying for an EBS volume, even if you are not doing anything with it, until the volume itself has been deleted. You also pay for what you allocate, not what you use: if you allocate 1 TB of storage to your EC2 instance and only upload 50 GB of data to it, you will still pay for the full 1 TB. Now select Next: Add Tags.

  6. Here you add several tags (key-value pairs) to associate the instance with yourself and your project, and to track its costs. We are still adjusting these so they work best for us, but for the time being we will use the following five keys:

  • User - example User : your name (this might be another identifier in the future)

    • For me user : Flint
  • Instance purpose (description of what you will be doing on the instance)

    • For this example use instance-purpose : AWS tutorial
  • Instance Operating System (name of the instance operating system or AMI title you are using)

    • For this example use instance-OS: Ubuntu Server 20.04 LTS (HVM), SSD Volume Type (x86)
  • Instance size

    • For this example use instance-size : t2.micro
  • Instance cost (the hourly cost of the instance when stopped and when running. The mounted EBS volume costs $0.08 per GB per month, which works out to roughly $0.00011 per GB per hour, and is paid whether the instance is running or stopped. The running cost is the hourly price of your instance type (find the cost for your instance here) plus the mounted EBS cost.)

    • For this example use instance-cost : STOPPED: 8GB EBS = ~$0.0009/hr; RUNNING: t2.micro = $0.0116/hr + ~$0.0009/hr = ~$0.0125/hr

Now select Next: Configure Security Group

  7. For the security group, for now we will allow access from anywhere, so simply click Review and Launch. In the future we will set up a security group that only allows traffic from university-related resources/people.

  8. Now click Launch to launch the instance.

  9. This will open up a window to select key pairs. Click on the dropdown menu beneath Select a key pair and choose the key pair “GBI-AWS-keypair | RSA”. This is the key pair that you were given before this tutorial. You can also generate a new one in the future when you are building an EC2 instance. A key pair consists of a public key and a private key: the public key is stored by AWS and the private key is stored by the user. Key pairs are used as credentials to log into your instance. You MUST have the private key to log into an EC2 instance, and if you create a new key pair and don’t download it from this page, you will NOT be able to get it again.

  10. Your instance will now start up and you can go back to the EC2 Instances dashboard to check its status. It may take a few minutes to load.
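The storage cost arithmetic from the Add Storage and Add Tags steps can be sketched in Python. The $0.08 per GB per month EBS rate and the $0.0116/hr t2.micro price are the figures quoted in this tutorial; the 30-day month and the function names are assumptions made for this example, and current prices should be checked on the AWS pricing pages.

```python
EBS_RATE_PER_GB_MONTH = 0.08   # USD, rate quoted in this tutorial
T2_MICRO_PER_HOUR = 0.0116     # USD, price quoted in this tutorial
DAYS_PER_MONTH = 30            # assumption: a 30-day month
HOURS_PER_MONTH = DAYS_PER_MONTH * 24


def ebs_monthly_cost(size_gb: float) -> float:
    """Monthly cost of an EBS volume; you pay for the allocated size."""
    return size_gb * EBS_RATE_PER_GB_MONTH


def ebs_daily_cost(size_gb: float) -> float:
    return ebs_monthly_cost(size_gb) / DAYS_PER_MONTH


def ebs_hourly_cost(size_gb: float) -> float:
    return ebs_monthly_cost(size_gb) / HOURS_PER_MONTH


def running_hourly_cost(instance_per_hour: float, size_gb: float) -> float:
    """Hourly cost while running: instance price plus attached storage."""
    return instance_per_hour + ebs_hourly_cost(size_gb)


print(f"1 TB volume: ${ebs_monthly_cost(1000):.2f}/month, ${ebs_daily_cost(1000):.2f}/day")
print(f"2 TB volume: ${ebs_daily_cost(2000):.2f}/day")
print(f"STOPPED, 8 GB volume: ${ebs_hourly_cost(8):.4f}/hr")
print(f"RUNNING, t2.micro + 8 GB: ${running_hourly_cost(T2_MICRO_PER_HOUR, 8):.4f}/hr")
```

The stopped cost is never zero while the volume exists, which is why large volumes should be deleted once the data has been saved elsewhere.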

Logging into your first instance

Let's go back to the EC2 dashboard and check out the instance that was created. During the live training there will be a bunch of new ones. To find the instance you created, check the box next to each instance in turn and select the Tags tab. In this tab you will see the five tags we used to label each individual resource. When you find the one that you created, move your mouse to the Name column of that instance and a notepad icon will appear on the right side of it. Click on that notepad, write in [your-name-ec2] (for example, for me it would be flint-ec2), and then press enter. Now you can see that the name has been assigned to the instance, which makes it easier to identify. We always want to make our work as organized and documented as possible, because it makes all the work afterwards much easier.
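The same lookup can also be done from code instead of clicking through each instance. The sketch below uses the boto3 Python SDK, which this tutorial does not otherwise cover, so treat it as an optional aside. It assumes boto3 is installed, that AWS credentials and a default region are configured, and that the tag key is spelled exactly `user` (tag keys are case-sensitive). The helper names are made up for this example.

```python
def tag_filter(key: str, value: str) -> dict:
    """Build an EC2 filter that matches resources with the given tag."""
    return {"Name": f"tag:{key}", "Values": [value]}


def find_instance_ids(user: str) -> list:
    """Return the IDs of instances whose 'user' tag equals `user`.

    Only the first page of results is read, which is enough for a
    small training account.
    """
    import boto3  # imported here so the helper above works without boto3

    ec2 = boto3.client("ec2")
    response = ec2.describe_instances(Filters=[tag_filter("user", user)])
    return [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]


def name_instance(instance_id: str, name: str) -> None:
    """Set the Name tag, the same field edited with the notepad icon."""
    import boto3

    ec2 = boto3.client("ec2")
    ec2.create_tags(Resources=[instance_id], Tags=[{"Key": "Name", "Value": name}])


# Building the filter needs no AWS access:
print(tag_filter("user", "Flint"))
```

With credentials in place, `find_instance_ids("Flint")` followed by `name_instance(instance_id, "flint-ec2")` would mirror the manual steps above.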

Let’s log into your EC2 instance!

Open up the command line interface.

Let’s remember the components of the SSH command… the following example is from Section 6 of the Command Line Tutorial:

$ ssh -i ~/path/to/keypairs/GBI-training-module-keypair.pem [user]@[public-ipv4-dns]

First, find the key pair file GBI-AWS-keypair.pem that was sent to you and put it on your desktop. The path to it should be ~/Desktop/GBI-AWS-keypair.pem.

Scroll to the right on the EC2 dashboard and find the Public IPv4 DNS for your instance. The user you will log in with is “ubuntu” (all lowercase), because we created an instance running Ubuntu as the operating system. Put those components together just like in the above SSH example and press enter to execute the command and log in!

As you can see… there is nothing here! This instance is brand new and therefore doesn’t have any of the software that was available to you on the previous EC2 instance you worked on in the CLI tutorial. Every EC2 instance that each person in this training created is a completely separate virtual computer, with its own space allocated in a data center in Oregon.
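As a recap of how the pieces of the SSH command fit together, here is a small Python sketch that assembles it from the key path, user, and hostname. The hostname below is a made-up placeholder, not a real instance, and the helper names are hypothetical. The sketch also includes a helper that makes the key file readable only by you, since ssh refuses to use a private key that other users can read.

```python
import os
import shlex


def build_ssh_command(key_path: str, user: str, host: str) -> list:
    """Assemble the ssh command as a list of arguments."""
    return ["ssh", "-i", os.path.expanduser(key_path), f"{user}@{host}"]


def restrict_key_permissions(key_path: str) -> None:
    """Make the private key read-only for its owner (like `chmod 400`)."""
    os.chmod(os.path.expanduser(key_path), 0o400)


command = build_ssh_command(
    key_path="~/Desktop/GBI-AWS-keypair.pem",
    user="ubuntu",  # default login user on Ubuntu images is lowercase
    host="ec2-203-0-113-25.us-west-2.compute.amazonaws.com",  # placeholder
)

# Print the command as you would type it in the terminal.
print(" ".join(shlex.quote(part) for part in command))
```

Swap in the Public IPv4 DNS from your own instance's row before running the printed command.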

Review Questions

What is an EC2 instance?

  • EC2 instances are virtual computers that you can create in the AWS cloud with different sizes, efficiencies, and computational power.

What is a vCPU on an EC2 instance?

  • vCPUs are the virtual computer cores of an EC2 instance.

What is Memory on an EC2 instance?

  • Memory is the RAM of an EC2 instance, and large amounts of it are required for certain assembly software.

What does an EC2 instance’s cost scale with?

  • The cost of an EC2 instance scales with its memory size and vCPU number.

What is an EBS volume?

  • An EBS volume is like a hard drive attached to an EC2 instance with however much storage you want it to have.

When you stop an EC2 instance, what costs associated with it are still accruing until they are terminated?

  • When you stop an EC2 instance, it stops accumulating costs, but the EBS volume attached to it does not. An EBS volume only stops accruing costs when it is deleted.

Where do you go to create your own customized EC2 instance?

  • The EC2 dashboard on the AWS web page.

Where do you find the information about your EC2 instance that is used to SSH into and work on it?

  • The EC2 dashboard within that specific EC2 instance’s row.

Move on to Section 3: Simple Storage Service (S3) and the AWS CLI

Go back to tutorial overview