Section 9: Wrapping It All Up (All Review Questions)

Section 8: Checking our assembly quality and understanding our results

Go back to tutorial overview

Conclusion

You've finished the Amazon Web Services Module for Scientists! Nice work! Let's see if you can answer all of the review questions without going back to the related wiki pages. This time the answers are not immediately underneath each question. They are at the bottom of this page, so refrain from scrolling down while you test yourself. Copy and paste the questions into a text document (for example, in Word) and try answering them!

Questions

  1. What are cloud computing services?

  2. What is Amazon Web Services?

  3. What are AWS services?

  4. What is Identity and Access Management?

  5. What is the relationship between AWS account and AWS user?

  6. What are user permissions?

  7. How are costs monitored?

  8. What is resource tagging?

  9. Where can you find information regarding the costs of any resource before you use it?

  10. What is an EC2 instance?

  11. What is a vCPU on an EC2 instance?

  12. What is Memory on an EC2 instance?

  13. What does an EC2 instance’s cost scale with?

  14. What is an EBS volume?

  15. When you stop an EC2 instance, which costs associated with it keep accruing, and when do they stop?

  16. Where do you go to create your own customized EC2 instance?

  17. Where do you find the information about your EC2 instance that is used to SSH into and work on it?

  18. Which type of storage do we attach to EC2 instances?

  19. Which types of storage are good for long-term storage?

  20. What does the AWS CLI do?

  21. From where can you upload data to S3 buckets: the command line or the S3 dashboard on the AWS website?

  22. What command do you use to interact with AWS from the CLI?

  23. What is an Amazon Machine Image?

  24. How do you build EC2 instances from an AMI?

  25. What is on the GBI AMI?

  26. What is on the GBI GitHub wiki?

  27. What is an Amazon Machine Image?

  28. Where do you find the pre-made Amazon Machine Images?

  29. Where can you find further information about the software available on the GBI AMI?

  30. What commands will you need to use to extend the EBS storage on your existing EC2 instance?

  31. What command can you use to upload data from your local computer to your EC2 instance?

  32. What do you use Jellyfish for?

  33. What is the BLAST command and what do you use BLAST for?

  34. What is MEGAN used for?

  35. What is FastQC used for?

  36. What does it mean to "trim" sequencing data?

  37. What command do you use to create a new virtual terminal and how do you log out of it?

  38. What is the command used to run the ABySS assembly program?

  39. What does SOAPdenovo2 require to run, which ABySS does not?

  40. How do you run a SOAPdenovo2 assembly?

  41. What is the command for running QUAST?

  42. Where are the QUAST result files and how do you read them?

  43. What is a BUSCO?

  44. How do you list all of the current BUSCO lineages?

  45. How do you use the BUSCO program?

  46. How do you download the result files to your local computer?

Answers

  1. They are computation and storage resources that can be used pay-as-you-go through the internet.

  2. Amazon Web Services (AWS) is Amazon’s cloud computing platform.

  3. AWS services allow users to interact with and use different aspects of Amazon's cloud service. For example, S3 is used for storage and EC2 for computational work.

  4. Identity and Access Management (IAM) is the AWS service used to manage access to services and resources on AWS.

  5. Within one AWS account, there can be many users.

  6. Each user has specific permissions that determine which services they can use.

  7. Costs are monitored for every resource you use. Resources are paid for as you go, meaning you are billed for every second that you use any service.

  8. Resources can be tagged with labels and information about those labels for organization and cost tracking.

  9. You can find the costs associated with any given resource through links on the GBI AWS GitHub page. (You can also find this information with a Google search.)

  10. EC2 instances are virtual computers that you can create in the AWS cloud with different sizes, efficiencies, and computational power.

  11. vCPUs are the virtual computer cores of an EC2 instance.

  12. Memory is the RAM of an EC2 instance, and large amounts of it are required for certain assembly software.

  13. The cost of an EC2 instance scales with its memory size and vCPU number.

  14. An EBS volume is like a hard drive attached to an EC2 instance with however much storage you want it to have.

  15. When you stop an EC2 instance, the instance itself stops accumulating costs, but the EBS volume attached to it does not. An EBS volume only stops accruing costs when it is deleted.

  16. The EC2 dashboard on the AWS web page.

  17. The EC2 dashboard within that specific EC2 instance’s row.

  18. EBS volumes.

  19. S3 and S3 Glacier.

  20. It allows us to interact with AWS services like our S3 buckets and EC2 instances from the command line.

  21. Both!

  22. aws

  23. A snapshot of an EC2 instance that contains an operating system and pre-installed software/data.

  24. When selecting the type of operating system for your EC2 instance during the first step of the normal "Launch EC2 Instance" process, instead of choosing an OS, you select the correct AMI you want from the "AMI" tab.

  25. Over 60 bioinformatics programs for managing and analyzing sequencing data!

  26. Information about the GBI AMI, plus documentation for the software most relevant to our work and for AWS itself!

  27. A snapshot of an EC2 instance that can be loaded with pre-installed software.

  28. In the AMI tab on the left side of the EC2 dashboard.

  29. In the README pages within the GBI AMI or on the GBI documentation page in this GitHub wiki.

  30. lsblk and df -h for checking the current storage capacity, then growpart to extend the partition and resize2fs to extend the file system into the enlarged EBS volume.

  31. The scp command.

  32. Fast, memory-efficient counting of k-mers in DNA.

  33. The command is blastn. BLAST is used to compare nucleotide or protein sequences against a sequence database, and blastn handles the nucleotide-to-nucleotide case.

  34. MEGAN is used to analyze the taxonomic and functional content of large microbiome datasets.

  35. FastQC is used to look at NGS data and get quality control metrics.

  36. Sequence trimming is used to remove contaminating sequences from your data, for example adapter sequences on the end of NGS data.

  37. tmux new -s [session-name]. To log out of (detach from) that terminal, press Control and B at the same time, release them, and then press D.

  38. abyss-pe name=[assembly-name] j=[num-threads] v=-v k=53 in="fastq_1.fastq fastq_2.fastq" | tee filename.log

  39. a configuration file

  40. soapdenovo2-127mer all -s /path/to/config-file -K [k-mer-value] -p [num-threads] -o /path/to/assembly/directory/assembly-output-filename 1>assembly.log 2>assembly.err

  41. quast.py -o [output-name] -e -k --k-mer-size [k-mer-size-input] [input-data]

  42. They are inside the output folder that was created with the name you gave as [output-name]. To read them, use the cat command on the report.txt file.

  43. A Benchmarking Universal Single-Copy Ortholog (BUSCO) is a highly conserved gene that is expected to be present as a single copy in the members of a specific clade.

  44. busco --list-datasets

  45. Using the BUSCO command: busco -i [input-sequence.file] -l [busco-lineage] -o [output-prefix] -m [busco-analysis-mode] -c [num-cores]

  46. Using the scp command: scp -ri /path/to/keypairs/keypair.pem [username]@[instance-public-DNS]:~/path/to/results_folder_name local/path/to/results_folder
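
As a way to review how answers 38, 41, and 45 fit together, here is a minimal dry-run sketch that only prints the ABySS, QUAST, and BUSCO commands with shared values filled in; it does not run any of the tools. Every value in it is a hypothetical placeholder (the assembly name, thread count, QUAST k-mer size, BUSCO lineage and mode, and output names), and the contig file name assumes ABySS writes its contigs to [assembly-name]-contigs.fa. Check each value against your own data before using the printed commands.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the commands from answers 38, 41, and 45
# without executing them. All values are hypothetical placeholders.
set -euo pipefail

assembly_name="my-assembly"             # hypothetical assembly name
threads=8                               # hypothetical thread/core count
kmer=53                                 # assembly k-mer size from answer 38
reads_1="fastq_1.fastq"                 # read file names as written in answer 38
reads_2="fastq_2.fastq"
contigs="${assembly_name}-contigs.fa"   # assumed ABySS contig output name
quast_out="quast-${assembly_name}"      # hypothetical QUAST output folder
quast_kmer=101                          # hypothetical value for --k-mer-size
busco_lineage="embryophyta_odb10"       # hypothetical; pick one from busco --list-datasets
busco_out="busco-${assembly_name}"      # hypothetical BUSCO output prefix
busco_mode="genome"                     # hypothetical analysis mode

# Answer 38: ABySS assembly, with the log copied to a file by tee.
abyss_cmd() {
  printf 'abyss-pe name=%s j=%s v=-v k=%s in="%s %s" | tee %s.log\n' \
    "$assembly_name" "$threads" "$kmer" "$reads_1" "$reads_2" "$assembly_name"
}

# Answer 41: QUAST quality report on the assembled contigs.
quast_cmd() {
  printf 'quast.py -o %s -e -k --k-mer-size %s %s\n' \
    "$quast_out" "$quast_kmer" "$contigs"
}

# Answer 45: BUSCO completeness check on the same contigs.
busco_cmd() {
  printf 'busco -i %s -l %s -o %s -m %s -c %s\n' \
    "$contigs" "$busco_lineage" "$busco_out" "$busco_mode" "$threads"
}

abyss_cmd
quast_cmd
busco_cmd
```

Running the script prints one command per line. Printing rather than executing lets you check the file names and options first, then paste each command into a tmux session (answer 37) on the EC2 instance so a long assembly keeps running after you detach.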
