Lab 01 - jfgout/AppliedGenomics GitHub Wiki

Applied Genomics - Lab 01 - Connecting to the supercomputer

1. Prerequisites:

1.1: VPN (Virtual Private Network)

To connect to the supercomputer, you need to be logged into the Mississippi State network. Unless you are connected with an ethernet cable on campus, you will first need to log into the msstate virtual private network (VPN). This requires installing a piece of software on your computer. Instructions on how to do this are available here:

For MacOS: https://servicedesk.msstate.edu/TDClient/KB/ArticleDet?ID=1547

For Windows: https://servicedesk.msstate.edu/TDClient/KB/ArticleDet?ID=1546

Follow the instructions on the msstate service desk (links provided above) to install Cisco AnyConnect VPN and log into the msstate VPN.

1.2: SSH Client

Connection to the supercomputer is done using SSH (Secure Shell). A shell is a user interface that can be used to send commands to a computer. In this first lab, you will learn the basic usage of a shell.

1.2.a) MacOS Users.

MacOS already contains all the tools needed. Launch the "Terminal" app and type: ssh -X [login]@lugh.biology.msstate.edu where [login] is to be substituted by your actual login name.

If you have difficulties locating the Terminal app, you can find some help here: https://macpaw.com/how-to/use-terminal-on-mac

1.2.b) Windows computers:

By default, Windows does not ship with an ssh/terminal app (although that might not be true of the very latest versions). Windows users should install the following software: MobaXterm (available for free here: https://mobaxterm.mobatek.net/)

After installing and launching MobaXterm, you can connect to the server with the following command line: ssh -X [login]@lugh.biology.msstate.edu where [login] is to be substituted by your actual login name.

Note that MobaXterm is not the only software available for this task. PuTTY and even recent versions of Google Chrome can also be used. However, MobaXterm is the one that I would recommend.

#------

IMPORTANT: when prompted for the password, it will look like nothing is happening as you type (there will be no stars/dots showing hidden password entry). This is normal. Simply type (or copy/paste) your password and press Enter.

You should then see the server's welcome screen.

To close the connection, simply type "exit" in the shell and press Enter.

Congratulations, you have successfully completed your first shell command line operations!

1.3: Connecting without typing this long password.

A long password (12+ characters) is important for security reasons, but it can be really annoying to type! You can log into the server using a key system to avoid having to type your password. For Windows users, this option was probably offered to you by MobaXterm. For MacOS users, you can follow the instructions from this website: https://www.ssh.com/ssh/keygen/ or https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys--2

2. First steps in the shell:

A good resource for learning the shell: http://linuxcommand.org/lc3_learning_the_shell.php

We will perform the first shell exercises locally (= on your computer, as opposed to on the supercomputer) so that you can see the results with your computer's graphical interface.

2.1: Hello world.

Let's start with the classic "Hello World" command. Type the following command in the terminal:

echo Hello World

What was the result of this command?

Now, try to type the following command in the terminal:

Hello World

What was the result of this command?

Now, press the up arrow on your keyboard, once, twice, ... This will recall the last commands typed in the terminal. Remember this shortcut, it will be extremely useful.

2.2: File system organization

Corresponding tutorial: http://linuxcommand.org/lc3_lts0020.php

Just like the file explorer (Finder on MacOS, Windows File explorer on Windows) allows you to navigate the file system, you can navigate the file system in the terminal using a few commands.

Let's start with pwd (Print Working Directory). This command will display the current working directory. Type pwd in your terminal and observe the result.

To list the files (and folders) present in the working directory, type the command ls and observe the result.

The third command we will learn in this section is cd (change directory). Odds are your current working directory is your home directory (default) which should contain a folder named "Desktop". To move into the Desktop folder, type: cd Desktop

After changing the working directory to the Desktop, type the following command to create a folder "test" on your Desktop:

mkdir test

You should see the folder test created on your Desktop. Now, move into the test folder and create a sub-folder "sub-test". Then create an empty file named "file1" with the command: touch file1. Use your computer's graphical interface to visualize the result of these commands.

To change the working directory back to the Desktop, you'll need to move up to the parent directory (you are currently in "test", which is a subfolder of Desktop). This can be done with the following command: cd ..

To remove the "test" folder and all its contents, type the following command (be very careful with it, there is no "undo" when deleting files/folders in the shell): rm -rf test
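The commands from this section can be chained into one short session. The sketch below is only an illustration: it works inside a throwaway directory created with mktemp (an assumption, so that nothing on your real Desktop is touched), and the variable names start_dir and scratch are made up for this example.

```shell
# Work in a throwaway directory instead of the real Desktop (assumption, for safety).
start_dir=$(pwd)
scratch=$(mktemp -d)
cd "$scratch"

mkdir test            # create the folder "test"
cd test               # move into it
mkdir sub-test        # create a sub-folder
touch file1           # create an empty file
ls                    # lists file1 and sub-test
pwd                   # the path now ends in /test

cd ..                 # back up to the parent directory
rm -rf test           # delete "test" and everything inside it (no undo!)
ls                    # prints nothing: the scratch directory is empty again

cd "$start_dir"       # return to where we started
rmdir "$scratch"      # remove the now-empty scratch directory
```

Running pwd and ls after each step, as done here, is a good habit: it confirms where you are before you create or delete anything.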

Exercise: use the commands mkdir and cd to create a tree-like pattern of folders for storing movies according to their type and year of release as shown below.

2.3: Basic shell scripts

We will now connect back to the server. Once connected to the server, find what your current working directory is. Which command do you use for this? What is the result of this command?

Create an empty file named "script1.sh" with the command: touch script1.sh

Type the command ls and then ls -lh. What do you observe? Make the file "script1.sh" executable with the command chmod +x script1.sh and then type ls -lh. What do you notice?
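To make the effect of chmod +x easier to spot, here is a small sketch that checks the executable permission before and after. It runs in a temporary directory, and the file name demo.sh is hypothetical. The exact permission string printed by ls -lh depends on your system's default settings, so only its general shape is described in the comments.

```shell
scratch=$(mktemp -d)
cd "$scratch"

touch demo.sh
ls -lh demo.sh        # the permission string typically starts with -rw-
if [ -x demo.sh ]; then echo "executable"; else echo "not executable"; fi

chmod +x demo.sh
ls -lh demo.sh        # "x" flags now appear in the permission string
if [ -x demo.sh ]; then echo "executable"; else echo "not executable"; fi
```

The test [ -x FILE ] succeeds only when the file can be executed, which is the property that chmod +x turns on.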

We will now edit the file "script1.sh". For this, you can use the command gedit script1.sh. This will start a basic text editor (it might take a few seconds to start). This command might not work right away: on some MacOS systems, you will need to install X11/XQuartz first (https://support.apple.com/en-us/HT201341 and https://www.xquartz.org/). Alternatively, you can use the non-graphical text editor nano with the following command: nano script1.sh

Using the text editor (gedit or nano), type the following text in your "script1.sh" file:

#!/bin/bash

echo "Hello World"

Save and exit. You can display the content of a text file in the terminal with the command cat. For example, here: cat script1.sh

Execute the script with the following command line: ./script1.sh What do you observe?

Working with variables: edit your "script1.sh" file to contain the following text:

#!/bin/bash

echo "$1"

"$1" is a variable, which will take the value of the first parameter passed on to the script. Let's illustrate this with an example by typing ./script1.sh [your name]

What do you observe?
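A script can receive more than one parameter: $1 is the first, $2 the second, and $# holds how many were passed. The sketch below illustrates this with a hypothetical script named greet.sh, written to a temporary directory with a here-document (a way of creating the file without opening an editor). None of these names come from the lab itself.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Write the script to disk; the quoted 'EOF' keeps $1, $2 and $# literal in the file.
cat > greet.sh <<'EOF'
#!/bin/bash
echo "First parameter: $1"
echo "Second parameter: $2"
echo "Number of parameters: $#"
EOF
chmod +x greet.sh

./greet.sh Rosalind Franklin
# First parameter: Rosalind
# Second parameter: Franklin
# Number of parameters: 2
```

Parameters are separated by spaces, so ./greet.sh "Rosalind Franklin" (with quotes) would pass the full name as a single parameter.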

The following example will illustrate what programming is great at: performing a repetitive task automatically! We will write a simple script that displays the numbers from 1 to 10. In real life, this type of algorithm can be very useful to perform tasks on lists of files, for example.

Create an empty file "script-loop.sh" and make it executable. Edit the file with the following text:

#!/bin/bash

for i in {1..10}
do
      echo $i
done

Execute the script. What do you observe?
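As a sketch of the "lists of files" use case mentioned earlier, the same for loop can iterate over file names instead of numbers. The file names below (sample_A.txt and so on) are hypothetical and are created in a temporary directory just for the demonstration.

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch sample_A.txt sample_B.txt sample_C.txt   # three empty, made-up files

# *.txt expands to every file name ending in .txt in the current directory
for f in *.txt
do
      echo "Processing $f"
done
# Processing sample_A.txt
# Processing sample_B.txt
# Processing sample_C.txt
```

In a real analysis, the echo line would be replaced by whatever command needs to be run on each file.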

Another way to generate a series of numbers from 1 to 10 would be to use the command "seq":

#!/bin/bash

for i in $(seq 1 10)
do
      echo $i
done

You can also try to run the command "seq" directly in the terminal and play with the options to generate different types of sequences. For example, try to generate the following sequences (see this tutorial on how to use the command "seq": https://www.geeksforgeeks.org/seq-command-in-linux-with-examples/):

0 20 40 60 80 100

How about this one?

5 4 3 2 1

And one last (more complex):

file001 file002 file003 ... file256
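As a starting point for these exercises, seq accepts one, two, or three numbers: seq LAST, seq FIRST LAST, and seq FIRST INCREMENT LAST. The sketch below deliberately uses different numbers from the exercises. The -w option (pad with leading zeros to equal width) and the -f option (printf-style format) are those of GNU and BSD seq, so check man seq if your system behaves differently.

```shell
seq 3                       # 1 2 3 (one number per line)
seq 4 6                     # 4 5 6
seq 10 5 25                 # 10 15 20 25: counts up in steps of 5
seq 9 -3 0                  # 9 6 3 0: a negative increment counts down
seq -w 8 10                 # 08 09 10: -w pads with zeros to equal width
seq -f "sample%02g" 1 3     # sample01 sample02 sample03
```

The output of seq has one value per line; inside $(...) in a for loop, each value becomes one iteration.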

Exercise: Modify the last script so that the maximum number (10 in this example) is now a variable passed as the first argument to the script. Make sure to use the command "seq" instead of the {START..END} syntax, which does not accept variables for "START" and "END" because bash performs brace expansion before it substitutes variables.
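This behavior of the {START..END} syntax can be observed directly. In the sketch below (the script name brace-demo.sh and the variable max are hypothetical), bash prints {1..$max} with the braces intact, while seq receives the variable's value as an ordinary argument.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# The quoted 'EOF' keeps $max literal so that it is evaluated when the script runs.
cat > brace-demo.sh <<'EOF'
#!/bin/bash
max=3
echo {1..3}
echo {1..$max}
echo $(seq 1 "$max")
EOF
chmod +x brace-demo.sh

./brace-demo.sh
# 1 2 3
# {1..3}
# 1 2 3
```

The second line of output shows the variable replaced by 3 but the braces left unexpanded, which is why a loop written that way would run only once.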

That's it for today. Next week, we'll apply what we learned today to look at DNA sequence files and start assembling a genome!