Basic introduction to linux - Pas-Kapli/bpp-tutorial GitHub Wiki

Tutorial

This tutorial covers basic Unix commands that are sufficient for running species delimitation methods from the command line. A more detailed introduction written by Tim Massingham is available here

To open a UNIX terminal window, click on the "Terminal" icon. A terminal window will appear with a "$" prompt, waiting for you to start entering commands:

lkaplipa:~$

pwd ls mkdir cd

To find out where you are, execute the pwd command (stands for "print working directory"):

lkaplipa:~$ pwd
/home/lkaplipa

To find out what files and folders are in your working directory execute the ls command (stands for "list" from "list directory contents"):

lkaplipa:~$ ls
Documents/ Music/ Analyses/ list.txt 

To create a new directory execute the mkdir command (stands for "make directory"):

lkaplipa:~$ mkdir linux_tutorial
lkaplipa:~$ ls 
Analyses/ Documents/ linux_tutorial/ Music/ list.txt 

To change a directory, execute the cd command (stands for "change directory"):

lkaplipa:~$ pwd
/home/lkaplipa
lkaplipa:~$ cd linux_tutorial
lkaplipa:~/linux_tutorial$ pwd
/home/lkaplipa/linux_tutorial
lkaplipa:~/linux_tutorial$ ls
lkaplipa:~/linux_tutorial$ cd ..             [`cd ..` gets you to the parent directory (i.e. one directory back)]
lkaplipa:~$ pwd
/home/lkaplipa
lkaplipa:~$ cd linux_tutorial/
lkaplipa:~/linux_tutorial$ cd                [`cd` takes you to the home directory]
lkaplipa:~$ pwd
/home/lkaplipa

The ~ sign stands for your home directory, so "~/linux_tutorial" is the same as "/home/lkaplipa/linux_tutorial/".
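
The navigation commands above can be combined in a short script. The following is a minimal sketch, assuming a POSIX shell; the directory names (project/data/raw) are hypothetical and everything happens inside a throwaway directory created with mktemp, so nothing in your real home directory is touched. The -p option of mkdir, which creates missing parent directories in one step, is not used elsewhere in this tutorial.

```shell
#!/bin/sh
# Work in a throwaway directory under the system temp location.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
top=$(pwd)

# -p creates the missing parent directories (project, project/data) as well.
mkdir -p project/data/raw

cd project/data/raw
deep=$(pwd)            # path ends with /project/data/raw

cd ..                  # one level up: project/data
cd ../..               # two more levels up: back where we started
back=$(pwd)

# "~" expands to the path stored in the HOME variable.
home_from_tilde=$(cd ~ && pwd)

echo "deepest directory: $deep"
echo "back at the top:   $back"
echo "home directory:    $home_from_tilde"
```

Each `..` in a path climbs one level, so `cd ../..` is equivalent to running `cd ..` twice.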

cp mv rm rmdir

Download the file BR_cob_57ind.fasta to your home directory, and then copy it with cp to your linux_tutorial directory.

lkaplipa:~/linux_tutorial$ cp ~/BR_cob_57ind.fasta ~/linux_tutorial

or

lkaplipa:~/linux_tutorial$ cp ../BR_cob_57ind.fasta .
lkaplipa:~/linux_tutorial$ ls
BR_cob_57ind.fasta

Rename the file to "Branchiomma.fasta"

lkaplipa:~/linux_tutorial$ mv BR_cob_57ind.fasta Branchiomma.fasta
lkaplipa:~/linux_tutorial$ ls
Branchiomma.fasta

Create a copy of the "Branchiomma.fasta" file called "Branchiomma2.fasta"

lkaplipa:~/linux_tutorial$ cp Branchiomma.fasta Branchiomma2.fasta
lkaplipa:~/linux_tutorial$ ls
Branchiomma.fasta Branchiomma2.fasta

Delete one of the files with rm (stands for "remove"):

lkaplipa:~/linux_tutorial$ rm Branchiomma2.fasta
lkaplipa:~/linux_tutorial$ ls
Branchiomma.fasta

Task1: Create a new directory called "test", enter the "test" directory, copy the "Branchiomma.fasta" file into "test", then return to the directory "linux_tutorial". Try to delete the "test" directory with rm or rmdir.

What is the problem? Try the following command to find the solution:

lkaplipa:~/linux_tutorial$ man rm 

cat less head tail sed grep

Display a text file in the terminal with cat:

lkaplipa:~/linux_tutorial$ cat Branchiomma.fasta
>BR_001
-------------CTTGGGGTCAAATAAGATTTTGGGGTGCCACAGTAATTACTAACCTACTATCAGCTATTCCTTATATTGGCAATTCAATTGTAGCCTGACTATGAGGCGGTTTCGCAGTAGATAACGCCACTCTTAATCGATTTTTCGTGTTCCACTTTATTTTACCATTTATTATTCTTCTCTTTACCCTAATTCACCTAATATTTTTACATAAAACAGGATCAAGAAACCCCCTTGGCCTCTCCTCTTATAATGCAACTATCCCCTTCCATCCTTATTACACTATAAAAGATCTTACAGGTGCTCTCATTAGTATCACCTTACTCTTAGTTCTAACACTAAATATCCCTAATATATTCCTAGAGCCTGACAATTTCATTCAAGCTAACCCACTAAGAACTCCCGCCCACATCAAACCA------------
>BR_002
-------------CTTGGGGTCAAATAAGATTTTGGGGTGCCACAGTAATTACTAACCTACTATCAGCTATTCCTTATATTGGCAATTCAATTGTAGCCTGACTATGAGGCGGTTTCGCAGTAGATAACGCCACTCTTAATCGATTTTTCGTGTTCCACTTTATTTTACCATTTATTATTCTTCTCTTTACCCTAATTCACCTAATATTTTTACATAAAACAGGATCAAGAAACCCCCTTGGCCTCTCCTCTTATAATGCAACTATCCCCTTCCATCCTTATTACACTATAAAAGATCTTACAGGTGCTCTCATTAGTATCACCTTACTCTTAGTTCTAACACTAAATATCCCTAATATATTCCTAGAGCCTGACAATTTCATTCAAGCTAACCCACTAAGAACTCCCGCCCACATCAAACCA------------
..............................
lkaplipa:~/linux_tutorial$

Use less to read the file, moving back and forth with the up and down arrow keys. Press q to exit.

lkaplipa:~/linux_tutorial$ less Branchiomma.fasta

To see only specific parts of the file, use the head, tail and sed commands. For example, to see the first 2 lines, execute:

lkaplipa:~/linux_tutorial$ head -n 2 Branchiomma.fasta
>BR_001
-------------CTTGGGGTCAAATAAGATTTTGGGGTGCCACAGTAATTACTAACCTACTATCAGCTATTCCTTATATTGGCAATTCAATTGTAGCCTGACTATGAGGCGGTTTCGCAGTAGATAACGCCACTCTTAATCGATTTTTCGTGTTCCACTTTATTTTACCATTTATTATTCTTCTCTTTACCCTAATTCACCTAATATTTTTACATAAAACAGGATCAAGAAACCCCCTTGGCCTCTCCTCTTATAATGCAACTATCCCCTTCCATCCTTATTACACTATAAAAGATCTTACAGGTGCTCTCATTAGTATCACCTTACTCTTAGTTCTAACACTAAATATCCCTAATATATTCCTAGAGCCTGACAATTTCATTCAAGCTAACCCACTAAGAACTCCCGCCCACATCAAACCA------------

To see the last 2 lines, execute:

lkaplipa:~/linux_tutorial$ tail -n 2 Branchiomma.fasta
----------------GAGGTCARATAAGATTTTGAGGTGCAACTGTTATTACTAATCTCCTTTCTGCCATCCCTTATATCGGCCAATCAATCGTAACTTGATTATGGGGGGGATTCGCAGTAGACAACGCTACCCTAAACCGATTTTTTATATTTCACTTCCTTCTTCCATTTATCCTAGCCTTCATATCCGGCCTACATCTTCTATTTCTTCATCAAACAGGCTCCAACAACCCATTAGGATTAAAGTCTACCTCCCTTATAATTCCCTTCCACCCCTACTACACAACCAAAGACCTTGTGGGAGCCCTCTTATTGATTTTCCTCCTCCTATTCCTTGCGCTCGCCTCCCCCTCGCTATTTCTTGACCCGGAAAATTTTATCCAGGCTAACCCCCTAGCTACCCCCACCCACATCAAAC--------------

To see the third line, execute:

lkaplipa:~/linux_tutorial$ sed -n "3p" Branchiomma.fasta
>BR_002
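
As a hedged illustration of the same three commands, the sketch below builds a tiny FASTA file with made-up names and sequences (not the tutorial data) and extracts parts of it. It assumes a POSIX shell; `sed -n '3,4p'` prints a range of lines and `sed -n '$p'` prints the last line.

```shell
#!/bin/sh
# A tiny, made-up FASTA file: three records, two lines each.
fasta=$(mktemp)
printf '%s\n' '>seq_A' 'ACGTACGT' '>seq_B' 'ACGTTTGT' '>seq_C' 'ACGAACGT' > "$fasta"

first_header=$(head -n 1 "$fasta")        # first line of the file
last_sequence=$(tail -n 1 "$fasta")       # last line of the file
second_record=$(sed -n '3,4p' "$fasta")   # lines 3 to 4, i.e. the second record
last_with_sed=$(sed -n '$p' "$fasta")     # "$" addresses the last line

echo "first header:  $first_header"
echo "last sequence: $last_sequence"
echo "second record:"
echo "$second_record"
```

In this file every record occupies exactly two lines, so record number k starts at line 2k-1; that shortcut does not hold for FASTA files whose sequences are wrapped over several lines.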

To search a file for a specific word/phrase/symbol, use the command grep:

lkaplipa:~/linux_tutorial$ grep ">BR_102" Branchiomma.fasta
>BR_102
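
Note that grep prints every line that contains the pattern, so a search for one name can also return longer names that start the same way. The sketch below shows this on a made-up FASTA file (hypothetical names seq_1, seq_10, seq_2), together with a few standard grep options: -c counts matching lines, -x requires the whole line to match, and -v inverts the match.

```shell
#!/bin/sh
# Made-up FASTA file in which one name is a prefix of another.
fasta=$(mktemp)
printf '%s\n' '>seq_1' 'ACGTACGT' '>seq_10' 'ACGTTTGT' '>seq_2' 'ACGAACGT' > "$fasta"

n_records=$(grep -c '^>' "$fasta")       # lines starting with ">": 3
loose=$(grep '>seq_1' "$fasta")          # matches both >seq_1 and >seq_10
exact=$(grep -x '>seq_1' "$fasta")       # whole line must match: only >seq_1
sequences=$(grep -v '^>' "$fasta")       # every line that is NOT a header

echo "number of records: $n_records"
echo "loose match:"
echo "$loose"
echo "exact match: $exact"
```

Quoting the pattern matters here: an unquoted > would be interpreted by the shell as output redirection.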

> >> |

To direct the output text of a program to a file, use the > symbol (creates the file, or overwrites it if it already exists) or the >> symbol (appends to the end of the file).

lkaplipa:~/linux_tutorial$ echo "Hello World"
Hello World
lkaplipa:~/linux_tutorial$ echo "Hello World" > test.txt
lkaplipa:~/linux_tutorial$ ls
Branchiomma.fasta test.txt
lkaplipa:~/linux_tutorial$ cat test.txt
Hello World
lkaplipa:~/linux_tutorial$ echo "Goodbye World" >> test.txt
lkaplipa:~/linux_tutorial$ cat test.txt
Hello World
Goodbye World
lkaplipa:~/linux_tutorial$ echo "Hello New World" > test.txt
lkaplipa:~/linux_tutorial$ cat test.txt
Hello New World

To direct the output of one command to another, use the pipe symbol |:

lkaplipa:~/linux_tutorial$ grep ">" Branchiomma.fasta | wc -l
57

wc is a command that counts the lines (with -l), the characters (with -m) or the words (with -w) of its input. What does the number printed by the above command represent?
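
To make the wc options concrete, here is a minimal sketch on a three-line sample file whose contents are made up for illustration. It assumes a POSIX shell; `tr -d ' '` only strips the padding spaces that some versions of wc print before the number.

```shell
#!/bin/sh
# Three-line sample file (contents made up for this example).
names=$(mktemp)
printf '%s\n' 'Carabus arboreus' 'Carabus koreanus' 'Carabus arboreus' > "$names"

n_lines=$(wc -l < "$names" | tr -d ' ')   # 3 lines
n_words=$(wc -w < "$names" | tr -d ' ')   # 6 words
n_chars=$(wc -m < "$names" | tr -d ' ')   # 51 characters, newlines included

# A pipe: grep selects the matching lines, wc -l counts them.
n_arboreus=$(grep 'arboreus' "$names" | wc -l | tr -d ' ')

echo "lines: $n_lines  words: $n_words  characters: $n_chars"
echo "lines containing 'arboreus': $n_arboreus"
```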

sort uniq

Download the file carabus_species.txt to your linux_tutorial directory.

Check the first 10 lines:

lkaplipa:~/linux_tutorial$ head -n 10 carabus_species.txt
Carabus jankowskii
Carabus jankowskii
Carabus smaragdinus
Carabus koreanus
Carabus seishinensis
Carabus semiopacus
Carabus arboreus
Carabus auronitens
Carabus taedatus
Carabus arboreus

Sort the names in the file alphabetically:

lkaplipa:~/linux_tutorial$ sort carabus_species.txt | head -n 10
Carabus abbreviatus
Carabus abbreviatus
Carabus albrechti
Carabus albrechti
Carabus albrechti
Carabus albrechti
Carabus albrechti
Carabus albrechti
Carabus albrechti
Carabus albrechti

Find the unique names:

lkaplipa:~/linux_tutorial$ uniq carabus_species.txt | head -n 10
Carabus jankowskii
Carabus smaragdinus
Carabus koreanus
Carabus seishinensis
Carabus semiopacus
Carabus arboreus
Carabus auronitens
Carabus taedatus
Carabus arboreus
Carabus kyushuensis

Is the output what we expect it to be?

Task2: sort the species names in the carabus_species.txt, find the unique ones and write them in a new file called carabus_species_sorted_uniq.txt

Hint: What is the difference between sort -u, sort | uniq and uniq?
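
A related pattern that is often useful with species lists is counting how many times each name occurs. The sketch below is one possible way to do it, run on a made-up four-line sample rather than on carabus_species.txt; `uniq -c` prefixes each line with the number of consecutive repeats, and `sort -rn` orders the result from the highest count to the lowest.

```shell
#!/bin/sh
# Made-up sample: one name appears twice, on non-adjacent lines.
names=$(mktemp)
printf '%s\n' 'Carabus arboreus' 'Carabus koreanus' 'Carabus arboreus' 'Carabus taedatus' > "$names"

# Sort first so that identical names end up on adjacent lines,
# then count the repeats and rank them by frequency.
counts=$(sort "$names" | uniq -c | sort -rn)
most_common=$(echo "$counts" | head -n 1)
n_unique=$(echo "$counts" | wc -l | tr -d ' ')

echo "$counts"
echo "number of distinct names: $n_unique"
```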

Download and compile software

It is very common to need to download the source code of a program from a git repository and compile it on your computer. Most programs provide specific instructions for doing that. As an example, we will download and compile "Newick tools", a program written in C that performs a multitude of operations on newick files and visualizes phylogenetic trees. Many of the operations were inspired by small tasks that are essential in species delimitation analyses.

Installation:

lkaplipa:~/linux_tutorial$ pwd
/home/lkaplipa/linux_tutorial
lkaplipa:~/linux_tutorial$ git clone https://github.com/xflouris/newick-tools.git
lkaplipa:~/linux_tutorial$ cd newick-tools
lkaplipa:~/linux_tutorial/newick-tools$ cd src
lkaplipa:~/linux_tutorial/newick-tools/src$ make
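
If git or make fails with "command not found", a required tool is probably not installed. The following sketch checks for a few tools before building. The list (git, make and a C compiler reachable as cc) is an assumption made for this example; the authoritative list of requirements is in the program's own installation instructions and may include further tools.

```shell
#!/bin/sh
# Hypothetical pre-build check: the tool list below is an assumption,
# not the official requirements of any particular program.
missing=""
for tool in git make cc; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
        missing="$missing $tool"
    fi
done

if [ -z "$missing" ]; then
    status="ready"
else
    status="incomplete"
fi
echo "build environment: $status"
```

`command -v` prints the location of a program if the shell can find it and returns a non-zero exit status otherwise, which is what the if statement tests.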

Usage: Return to your "linux_tutorial" directory (cd ~/linux_tutorial), download this newick tree into it, and print a summary of the tree:

lkaplipa:~/linux_tutorial$ newick-tools/src/newick-tools --tree_file RAxML_bestTree.Branchiomma --info

Extract all the tip names of the phylogeny

lkaplipa:~/linux_tutorial$ newick-tools/src/newick-tools --tree_file RAxML_bestTree.Branchiomma --extract_tips

Root tree

lkaplipa:~/linux_tutorial$ newick-tools/src/newick-tools --tree_file RAxML_bestTree.Branchiomma --output_file RAxML_bestTree.Branchiomma.rooted --root BR_076,BR_018

Make tree binary (fully bifurcating)

lkaplipa:~/linux_tutorial$ newick-tools/src/newick-tools --tree_file RAxML_bestTree.Branchiomma --output_file RAxML_bestTree.Branchiomma.binary --make_binary