Section 1: Introduction - Green-Biome-Institute/AWS GitHub Wiki

Learning Points for Section 1 - Introduction

From this section you should take away that:

The command line interface is another way of interacting with and using your computer, much like the graphical interface you move around on with your mouse.
Using the command line interface is a practical and important skill for furthering a career in many scientific fields.
Using the CLI, you can navigate around your computer and open, edit, and execute programs and files with written commands instead of clicking on them with your mouse.
You can make computations more customized and automated/efficient by using the CLI.
The up arrow can be used to review past commands.
It’s not hard! Anyone can do this with a bit of practice.

What is the command line?

The command line interface (also referred to as the CLI, command line, or command prompt) is a method for interfacing with your computer, much like you do with your mouse or touchpad. Instead of using clicks (like when you double click on a document or file to open it), you will type commands and enter them. When entered, they are read and acted on by a program called the shell. Just like a Finder or file manager window, the shell can navigate around the directories of your computer (directories are the file organizers; a folder is an example of a "directory", and it can contain files or more folders), move files around, copy and paste them, edit them, etc. So when you control the shell from the command line interface, you control the computer from it as well!

Why is it a practical skill that you want to pick up?

While many software packages have graphical user interfaces (a visual representation of the software with buttons, images, etc.), many others must be operated from the command line. Therefore, in order to actually use the breadth of open-source software that is out there, you must be familiar with using the command line.

Gaining familiarity with the command line will give you a better appreciation for how processes run on your computer and will allow you to look at science problems with a new perspective. Even a simple understanding of how to navigate around your computer using the command line (save, open, and edit files, connect with other computers to send and receive information from them, etc.) is fairly fundamental to most fields of science that require analysis of lots of data. On this topic I think it is important to note that many scientists have gone on to work in careers that use this kind of skill every day, some for basic tools and others as data scientists. Awareness of these subjects can open you up to the ideas of many other fields that are within your reach!

Not only will you be able to better visualize the pipeline from "I produced a bunch of data!" to "Look what my data says!", you will also be able to troubleshoot issues that arise on the way. That might mean finding out where a piece of software saved an important file or piece of information, or writing a simple piece of code to make your experiments more efficient (this is called writing a "shell script", and it can be just one line!).
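To give a feel for how small a shell script can be, here is a minimal sketch. The throwaway directory and the file names sample1.txt, sample2.txt, and sample3.txt are made up for illustration; the single line in the middle is the whole "script", and it counts how many .txt files a directory holds.

```shell
# Set up a throwaway directory with three hypothetical data files.
workdir=$(mktemp -d)
touch "$workdir/sample1.txt" "$workdir/sample2.txt" "$workdir/sample3.txt"

# The one-line "script": list the .txt files and count the lines of that listing.
file_count=$(ls "$workdir"/*.txt | wc -l)

echo "Number of text files: $file_count"
```

Counting a handful of files by eye is easy, but the same line works unchanged on a directory with thousands of files.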

It is not hard!

A final note before jumping into the CLI itself: it is not hard! The fundamentals of the command line and bash scripting are entirely accessible to anyone who wants to learn them. If you get stuck or don’t understand, ask the people around you or your professors. Even if they don’t know the answer, they will likely be able to help you find it. We will go through methods for finding further information about using and understanding CLI commands. Also, there is a ton of information on the internet that can be helpful (though, be warned, it can also introduce quite a bit of confusion).

The CLI vs. the graphical representation of your computer (your desktop)

In general, when you open your computer you are viewing the desktop, where there are folders and files or anything else that you've put there. You are able to drag, drop, and edit items like applications or files. This is the graphical representation of your computer. It is the way engineers have designed computers so that we can see what we are doing.

The command line gives you access to the same computer as this visual representation (where you can see the folders and files on your desktop), except it is presented in a window that only has text. What this means is that when we are using the command line interface, instead of clicking on or dragging and dropping items, we write lines of words to the computer, telling it what to do. When we submit these lines of words, called commands, the computer will process them and respond by trying to do the action it was told to do. This is just like clicking on a folder: you enter the folder and get to see the contents within it. On the command line we can use commands to enter that same folder on your desktop, view its contents, change its name, move it, or edit it in other ways. The command line is just another way to interact with the computer.
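As a rough sketch of what those folder actions look like when typed, the commands below create, enter, list, and rename a folder. The names my-folder, notes.txt, and renamed-folder are made up for illustration, and everything happens in a throwaway directory so nothing on your real desktop is touched. Do not worry about memorizing these yet.

```shell
# Work in a throwaway directory so your real files are left alone.
workdir=$(mktemp -d)
cd "$workdir"

mkdir my-folder               # create a folder (like "New Folder" in a file manager)
touch my-folder/notes.txt     # create an empty file inside it
cd my-folder                  # enter the folder (like double-clicking it)
ls                            # view its contents
cd ..                         # step back out of the folder
mv my-folder renamed-folder   # change the folder's name
ls                            # the listing now shows the new name
```

Each line does one thing you would normally do with a click or a drag, just written out as a command.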

This is important because many of the programs we will be using can only be used from the command line and not from the graphical user interface of your computer. Your desktop computer (whether it runs macOS, Linux, or Windows) has a program that will open up this command line interface. In Linux, it is most likely under Applications > Accessories > Terminal. By finding and clicking on it with your mouse, you will open up a CLI window. From here you can navigate around your desktop computer. Being able to do this is important for two reasons:

First, this is where we will access Amazon Web Services (AWS): through the CLI on our computers.

Second, once we’ve logged onto AWS, we will not have a graphical desktop to view. Instead we will only be able to navigate around the AWS EC2 instance using the CLI itself. This is why it is important to get a feel for navigating using the command line: it is how you will operate programs within the EC2 instances.

Let's jump in

Example 1

First, everyone needs to log into an EC2 instance. This means finding the command line interface on your computer and opening it up. On Windows it is called PowerShell. You will then copy the command given to you with the correct EC2 instance name. The command will look like this:

ssh -i ~/Desktop/GBI-Bioinformatics-Class.pem ubuntu@[your-IP-Address]

For the live teaching of this module, commands will be sent out at the beginning of the session. If you are attempting this on your own, you must contact your teacher / PI to set up an EC2 instance for you to work on. They will give you a command that will log you into the EC2 instance.
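It can help to see which part of the login command is which. The sketch below only builds and prints the command rather than running it. The address 203.0.113.10 is a placeholder from a range reserved for documentation, not a real instance, and the variable names are made up for illustration. The -i option tells ssh which private key file to use, and the part before the @ is the user name on the remote machine.

```shell
# Placeholder values for illustration; use the ones your teacher / PI gives you.
key_file="$HOME/Desktop/GBI-Bioinformatics-Class.pem"   # private key file passed to ssh with -i
remote_user="ubuntu"                                    # user name on the EC2 instance
remote_host="203.0.113.10"                              # placeholder address, not a real instance

# Assemble the command as text and print it instead of running it.
ssh_command="ssh -i $key_file $remote_user@$remote_host"
echo "$ssh_command"
```

If a login fails, checking each of these three pieces against what you were sent is a good first step.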

Next, we will compare navigating in the terminal to navigating your desktop folders. Do not worry about remembering these commands immediately; this example is simply to show how the terminal is similar to the computer you already use. Using the command:

$ cd ex1-dir

You can see that the terminal enters the folder named “ex1-dir.” Then by using:

$ ls

You will list the contents of ex1-dir. The contents will be the files separated by spaces that show up under the command you just entered. This folder has two files named count.sh and multi-variable-assembly.sh. Doing these two actions is the same as double-clicking a folder on your desktop computer, opening that folder up, and seeing the contents. Within the folder is a file named count.sh. We will use the following command to read that file:

$ cat count.sh

Which will output:

#!/bin/bash

for (( i=0; i<10; i++)); do
    echo $((i + 1))
done

This text is the contents of the file. This is similar to opening up a file in a folder on your desktop and reading what's inside! At the most basic level, we can see that the text of this file named count.sh has something to do with the numbers 0 and 10. This file is actually more than just text: it is an executable file, otherwise known as a “program.” To execute it we will use the command:

$ ./count.sh

You can see that the output prints out the numbers 1 through 10! This is all this file does, but it’s a good place to start! Executing this command is similar to double-clicking on a program you have on your computer. For example, if you open up the program Microsoft Word, that application opens up and allows you to create and type on documents. In this context the “creating and typing on documents” part of our simple program is just “counting from 1 to 10.”
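If you would like to try this away from the EC2 instance, the sketch below recreates count.sh in a throwaway directory, with comments added to explain the loop, and then runs it. The chmod +x step is an addition here: a file needs execute permission before ./count.sh will work, and the copy provided for the class presumably already has it.

```shell
# Work in a throwaway directory (an assumption for this sketch).
workdir=$(mktemp -d)
cd "$workdir"

# Recreate count.sh, with explanatory comments added to the loop.
cat > count.sh <<'EOF'
#!/bin/bash

# Start i at 0, keep going while i is less than 10, and add 1 to i after each pass.
for (( i=0; i<10; i++ )); do
    echo $((i + 1))   # print i + 1, so the output runs from 1 to 10
done
EOF

chmod +x count.sh      # mark the file as executable
output=$(./count.sh)   # run the program and capture what it prints
echo "$output"
```

The loop variable i runs from 0 to 9, and each pass prints i + 1, which is why the numbers 0 and 10 appear in the file but the output is 1 through 10.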

You saw before that there is another file in this directory named multi-variable-assembly.sh. Instead of reading its contents, let’s just run it:

$ ./multi-variable-assembly.sh

You can see the output here is similar to that of the “count.sh” program, except instead of just counting to 10, it shows a progress bar as it does a fake “assembly” with the ABySS genome assembler using different “k-mer” values (a program and variable we will get to later). You can immediately see that if you need to run a program multiple times or customize its inputs, this can be set up once at the very beginning, without having to keep coming back to your computer to change one small variable. For example, if you needed to run a program several times in a row and each run took 2 hours (instead of the 1 second in this example program), you could set it up to run overnight, and when you come back to the lab in the morning, you will have all your results!
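The idea of setting everything up once can be sketched as a short loop. This is not the contents of multi-variable-assembly.sh, which this section does not show; the k-mer values 21, 31, and 41 and the printed messages are assumptions chosen purely for illustration, and no assembler is actually called.

```shell
completed_runs=0

# Hypothetical k-mer values; a real experiment would choose its own.
for k in 21 31 41; do
    echo "Starting fake assembly with k=$k"
    # A real script would call the assembler here, passing in the value of k.
    completed_runs=$((completed_runs + 1))
    echo "Finished fake assembly with k=$k"
done

echo "Completed $completed_runs runs without any manual changes"
```

Changing the list after the word "in" is all it takes to add or remove runs, which is what makes this approach convenient for overnight jobs.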

Next we use:

$ cd ..

The terminal then exits the folder and goes back to the directory where we started this example. This example shows the similarity between using the graphical user interface of your computer and the terminal.

One final thing to note before we leave this section: you can press the “up” arrow on your keyboard to look at commands that you previously used! This is helpful for checking what you did before and for editing a command that you did not enter correctly the first time or need to change for a different purpose.

Review Questions:

How does the CLI compare to the graphical user interface of your computer?

You can do many of the same things on the command line as you can with your mouse. You can open and navigate around folders, create and edit files, and use programs.

Why is the CLI a practical and important skill?

It is an important skill for many fields of science that require analyzing large datasets, for having a larger set of tools to analyze data with (because many programs don't have GUIs), and for furthering your career as a scientist.

How can the command line make your experiments more efficient?

You can set up multiple data analyses to run one after another, with variables changed automatically between runs. This means you don't have to make those changes manually, which frees up your own time.

How do you review past commands?

You press the up arrow.

Move on to Section 2: The Basics

Go back to tutorial overview