Section 5 : Setting up EC2 instance and Uploading Data - Green-Biome-Institute/AWS GitHub Wiki

Go back to Section 4: Amazon Machine Images (AMI), the GBI AMI, and the GBI AWS Github as a Resource

Go back to tutorial overview

Learning Points for Section 5: Setting up our EC2 instance and Uploading Data

  1. You'll know how to create an EC2 instance from the GBI AMI
  2. You'll be able to find further information about the software available on the GBI AMI
  3. You'll be able to add EBS storage to your EC2 instances and extend the filesystem to use it
  4. You'll be able to upload data to your EC2 instance in preparation for analyzing it.

Setting up the Instance

Let’s follow the same steps that we applied in the first module to create another EC2 instance, but this time we will build it using the GBI AMI.

  1. Navigate to the EC2 instance dashboard and click “Launch Instance”.
  2. On the left side bar, click “My AMIs”, then click on the blue “select” button next to the AMI called “GBI-AMI-11112021”.
  3. Click the top left drop down menu labeled “All instance families” and then select “r5”
  4. Select the top option for “r5.large”, then “Next: Configure Instance Details”.
  5. Select “Next: Add Storage”.
  6. You’ll see in the storage window that the instance already has 50 GB of storage allocated. This is the size of the AMI snapshot and therefore the minimum amount of storage you can use when building an instance from the GBI AMI. We are going to start with only 50 GB so that we can learn how to adjust the size of EBS volumes after we create EC2 instances. Select “Next: Add Tags”.
  7. Note: I understand that this step is a bit tedious, but it is very important. While the actual tags may change in the future for organization purposes, you must tag resources when you create them. Select “Add another tag” 4 times so that there are 5 tag boxes open. For each of these boxes add one of the following tags:
    • user : [Your Name]
    • instance-purpose : “AWS genome analysis and assembly walkthrough”
    • instance-cost : “STOPPED: 50GB EBS = $0.006/hr; RUNNING: r5.large = $0.126/hr + $0.006/hr = $0.132/hr”
    • instance-size : “r5.large”
    • instance-OS : “Ubuntu Server 20.04 LTS (HVM), SSD Volume Type (x86)”
  • Select “Next: Configure Security Group”
  8. For the purposes of this walkthrough, we will just click “Select an existing security group” and then use one of the “launch-wizard-#” options. Select “Launch”.
  9. This opens the “Select an existing key pair or create a new key pair” window. In the first drop-down menu, select “Create a new key pair” and leave the key pair type set to “RSA”. Enter a unique key pair name, press “Download key pair”, and save the file somewhere on your computer that you will remember. Finally, select “Launch Instances”.
  10. Navigate back to the EC2 instance dashboard and give your instance a unique identifier in the leftmost column by clicking on the notepad icon next to the empty space and typing the name in.
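
The instance-cost tag in step 7 is just the sum of two hourly rates. As a rough sketch, the arithmetic can be checked in the shell. The rates are the ones quoted in the tag; the hourly_cost helper is hypothetical and only exists for this illustration.

```shell
# Hourly rates quoted in the instance-cost tag (USD per hour).
EBS_RATE=0.006       # 50 GB EBS volume, billed even while the instance is stopped
R5_LARGE_RATE=0.126  # r5.large compute, billed only while the instance is running

# hourly_cost is a hypothetical helper: it sums its arguments
# and prints the total with three decimal places.
hourly_cost() {
  printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.3f\n", total }'
}

echo "STOPPED: \$$(hourly_cost "$EBS_RATE")/hr"
echo "RUNNING: \$$(hourly_cost "$R5_LARGE_RATE" "$EBS_RATE")/hr"
```

The second line should report $0.132/hr for a running r5.large with a 50 GB volume. Actual prices vary by region and change over time, so treat these numbers as the tutorial's values rather than current pricing.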

Great job! We’ll wait for a moment while our instances initialize and get ready for us to work with.

SSH into the Instance

Let’s open up our command line interface (or PowerShell on Windows) and SSH into the instance. Remember from the CLI workshop that the SSH command looks like this:

Using the Public IPv4 DNS:

$ ssh -i ~/PATH/TO/KEYPAIRS/keypair.pem ubuntu@PUBLIC_IPV4_DNS

Or using the Public IPv4 address:

$ ssh -i ~/PATH/TO/KEYPAIRS/keypair.pem ubuntu@PUBLIC_IPV4_ADDRESS
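
If ssh refuses the key with a warning about an unprotected private key file, the key's permissions are usually too open. The sketch below shows one way to handle that on macOS or Linux. Both helper functions are hypothetical, and the ubuntu username is an assumption based on the Ubuntu Server AMI used in this tutorial.

```shell
# Hypothetical helpers for illustration; neither is part of the GBI AMI.

# Restrict the key file so only you can read it. ssh rejects private keys
# that other users on your computer are able to read.
prepare_key() {
  chmod 400 "$1"
}

# Print (rather than run) the ssh command for a given key file and address.
# The "ubuntu" user is an assumption based on the Ubuntu Server AMI.
ssh_command() {
  printf 'ssh -i %s ubuntu@%s\n' "$1" "$2"
}

# Usage, with placeholders for your own key path and instance address:
#   prepare_key ~/PATH/TO/KEYPAIRS/keypair.pem
#   ssh_command ~/PATH/TO/KEYPAIRS/keypair.pem PUBLIC_IPV4_DNS
```

Printing the command first makes it easy to check the key path and address before connecting.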

README information for software on the GBI AMI

Now that we are logged into our instance, we can look around and see all the software available to us. It all lives in the GBI-software directory, but most of it can be run from anywhere on the instance. For example, if you type:

abyss

and then press Tab, you will see that abyss (and all of its individual modules) is available wherever you are on the EC2 instance. This means you don’t have to navigate into its directory to use it.
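
Tab completion is one way to check this. Another is to ask the shell directly whether a program is on your PATH. Here is a minimal sketch, assuming you want to check abyss-pe (the main ABySS driver script); the is_on_path helper is hypothetical.

```shell
# is_on_path is a hypothetical helper: it succeeds if the named program
# can be run from any directory, and fails otherwise.
is_on_path() {
  command -v "$1" >/dev/null 2>&1
}

# Add any other tool names you want to check to this list.
for tool in abyss-pe; do
  if is_on_path "$tool"; then
    echo "$tool: available at $(command -v "$tool")"
  else
    echo "$tool: not on PATH, check its README in ~/GBI-software/"
  fi
done
```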

Note: this is not true for ALL of the software. Some programs require (or work best in) their own environment, or consist of a chain of tools whose naming or organization makes them impractical to expose everywhere on the EC2 instance. To find out more about ANY of the software on the AMI, you can either:

  1. Go to the directory for the software you are curious about within ~/GBI-software/
    • For example, run cd ~/GBI-software/ABySS/ and then ls, and you will see a README file written specifically for this software on this AMI. Use cat GBI-README-ABYSS to view it. It lists the path where the software was installed, the installed version, how to install it yourself elsewhere, and notes on the software's usage and purpose, including relevant links. So if you can’t work out how to use a piece of software on this AMI, try the README in its directory inside ~/GBI-software/.
  2. Read the same README on the GitHub wiki at https://github.com/Green-Biome-Institute/AWS/wiki/AWS-GBI-AMI-Documentation

Adding more storage to our EC2 instance

As mentioned when setting up the EC2 instance, we didn’t add enough storage when creating it for us to do any important analysis. Let’s change that.

  1. First, on your EC2 instance in the CLI, use the command df -h. You can see your root volume has about 50GB of storage.
  2. Navigate to the EC2 instance dashboard on the AWS website.
  3. Select the EC2 instance that you just created earlier in this module.
  4. In the informational tab that opens up below the EC2 instance dashboard, select the “storage” tab.
  5. Scroll down and you will see a single EBS volume named vol- with a long chain of letters and numbers after it and 50GB under the “Volume Size (GiB)” column. Click the highlighted blue name of this volume in the first column.
  6. This will take you to the EBS dashboard, where you will select the check-box next to the volume (there should only be one viewable right now in this window, with the same name that you clicked on in step 5).
  7. On the top right, click the “Actions” drop down menu and select “Modify Volume”. This will take you to a new window called “Modify volume”
  8. Within this window, you will see a box with the current EBS volume capacity of your EC2 instance. In this box, replace “50” with “100” and then press “Modify”. When prompted for confirmation, press “Modify” again.
  9. This will take a moment to process. Next go to the CLI where you have logged into your EC2 instance. Once again, use the command df -h. Wait, why isn’t there more storage? Even though you have allocated more storage to the EC2 instance’s EBS volume through Amazon, the partition and filesystem on the instance still have their old size. Growing them to fill the enlarged volume is called “extending” the volume. To do this, first we need to know the name of the EBS volume to extend.
  10. Use the command lsblk, which lists all the block devices on your EC2 instance. At the bottom will be one that looks something like
nvme0n1     259:0    0   50G  0 disk 
└─nvme0n1p1 259:1    0   50G  0 part /
  11. Using the name of the top block device, nvme0n1, run the following command:
sudo growpart /dev/nvme0n1 1
  12. Followed by:
sudo resize2fs /dev/nvme0n1p1
  13. Now, using df -h, we can see that the available storage has increased to 100GB!
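
The extend steps can be collected into one small function, sketched below. Both helper names are hypothetical, and the sketch assumes an ext4 root filesystem (the kind resize2fs handles). On instance types where the disk shows up as xvda rather than nvme0n1, the partition is named xvda1 with no "p", which is what partition_path accounts for.

```shell
# partition_path is a hypothetical helper. Linux names partition N of a disk
# whose name ends in a digit as "<disk>pN" (nvme0n1 -> nvme0n1p1), and as
# "<disk>N" otherwise (xvda -> xvda1).
partition_path() {
  case "$1" in
    *[0-9]) printf '%sp%s\n' "$1" "$2" ;;
    *)      printf '%s%s\n' "$1" "$2" ;;
  esac
}

# extend_root_volume is also hypothetical; it runs growpart and resize2fs
# in order. Assumes an ext4 root filesystem.
extend_root_volume() {
  disk="$1"        # for example /dev/nvme0n1, as reported by lsblk
  part="${2:-1}"   # partition number, 1 by default
  sudo growpart "$disk" "$part"                       # grow the partition to fill the volume
  sudo resize2fs "$(partition_path "$disk" "$part")"  # grow the filesystem to fill the partition
  df -h /                                             # confirm the new size
}

# Usage on the instance:
#   extend_root_volume /dev/nvme0n1
```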

Uploading Data

Now we have enough room on our EC2 instance to upload data and work with it! You are welcome to upload your own data if you'd like, but if you don't have any handy, there is a dataset ready. The test data for this tutorial is held in the S3 location "gbi-batch1-data/athal_workshop_data". It is a subsampled version of a full dataset collected by the 1001 Genomes project, which can be found here: https://www.ncbi.nlm.nih.gov/sra/?term=SRR1946537.
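
If you do want to upload your own data, scp run from your local computer is the usual tool. The sketch below only prints the command so you can check it before running it. The helper name and the file name reads.fastq.gz are hypothetical, and the ubuntu username is an assumption based on the Ubuntu Server AMI.

```shell
# scp_upload_command is a hypothetical helper that prints (rather than runs)
# an scp command copying a local file into the instance's home directory.
# The "ubuntu" user is an assumption based on the Ubuntu Server AMI.
scp_upload_command() {
  key="$1"; file="$2"; host="$3"
  printf 'scp -i %s %s ubuntu@%s:~/\n' "$key" "$file" "$host"
}

# Placeholders: substitute your key path, data file, and instance address.
scp_upload_command ~/PATH/TO/KEYPAIRS/keypair.pem reads.fastq.gz PUBLIC_IPV4_DNS
```

Run the printed command in a terminal on your own computer, not on the EC2 instance.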

To download the test data, we first need to configure the AWS CLI in our terminal. To do this, we need the following information:

  1. AWS Access Key ID (this is in the .csv file you downloaded with your user account)
  2. AWS Secret Access Key (this is in the same .csv file, next to the Access Key ID. It is not the .pem key pair you used to SSH into the EC2 instance)
  3. Default region name (this will most likely be us-west-2, but if it is not you can find it in the top right-hand corner of the AWS webpage when you are logged into your account)

With this information handy, use

aws configure

Enter the information above as it is requested. For the AWS Secret Access Key, paste the key string itself from the .csv file, not a file path. You can press Enter to skip the default output format prompt.

Now you will be able to use commands like

aws s3 ls

To list the S3 buckets in your account and to copy data between S3 and your EC2 instance.
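
Before moving on, it can help to confirm that the credentials were accepted. A minimal sketch, assuming the AWS CLI is installed on the instance; check_aws_setup is a hypothetical wrapper around two standard AWS CLI commands.

```shell
# check_aws_setup is a hypothetical helper, not part of the AWS CLI.
check_aws_setup() {
  if ! command -v aws >/dev/null 2>&1; then
    echo "aws CLI not found on PATH"
    return 1
  fi
  aws configure list           # shows which key, secret, and region are in effect
  aws sts get-caller-identity  # fails if the credentials are rejected
}

# Usage on the instance, after running aws configure:
#   check_aws_setup
```

If the second command prints your account and user details, the keys were entered correctly.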

With this configured, we can download the data from our S3 bucket using the following command:

aws s3 cp s3://gbi-batch1-data/athal_workshop_data/ . --recursive
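
Copying everything under an S3 prefix is done with the --recursive flag, and --dryrun lets you preview what would be copied first. The sketch below wraps both in one function. The fetch_workshop_data name and its "preview" argument are hypothetical; the bucket path is the one from this tutorial.

```shell
# Bucket path from this tutorial.
SRC="s3://gbi-batch1-data/athal_workshop_data/"

# fetch_workshop_data is a hypothetical helper. With "preview" as the second
# argument it adds --dryrun, which lists what would be copied without copying.
fetch_workshop_data() {
  dest="${1:-.}"
  if [ "${2:-}" = "preview" ]; then
    aws s3 cp "$SRC" "$dest" --recursive --dryrun
  else
    aws s3 cp "$SRC" "$dest" --recursive
  fi
}

# Usage on the instance:
#   fetch_workshop_data . preview   # check the file list first
#   fetch_workshop_data .           # then download
#   ls -lh                          # confirm the files arrived
```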

Review Questions:

What is an Amazon Machine Image?

  • A snapshot of an EC2 instance that can be loaded with pre-installed software.

Where do you find the pre-made Amazon Machine Images?

  • In the AMI tab on the left side of the EC2 dashboard.

Where can you find further information about the software available on the GBI AMI?

  • In the README files within the GBI AMI or at the GBI documentation page in this GitHub wiki.

What commands will you need to use to add and extend more EBS storage to your existing EC2 instance?

  • lsblk and df -h for checking the current storage capacity, growpart and resize2fs for extending the EBS volume size.

What command can you use to upload data from your local computer to your EC2 instance?

  • the scp command.

Move on to Section 6: Pre assembly software

Go back to tutorial overview