linux - Pas-Kapli/tutorials GitHub Wiki

Introduction

Many species delimitation tools and auxiliary tools (alignment, phylogenetic inference) are available as easy-to-use web services. However, running the standalone versions of the software in a Unix environment additionally allows you to:

  1. Exploit the full spectrum of software options
  2. Handle large files
  3. Perform the analyses more efficiently

Tutorial

This tutorial covers basic Unix commands that are sufficient for running species delimitation methods from the command line.

To open a UNIX terminal window, click on the "Terminal" icon. A terminal window will appear with a "$" prompt, waiting for you to enter commands:

lkaplipa:~$

pwd ls mkdir cd

To find out where you are, execute the pwd command (stands for "print working directory"):

lkaplipa:~$ pwd
/home/lkaplipa

To find out what files and folders are in your working directory execute the ls command (stands for "list" from "list directory contents"):

lkaplipa:~$ ls
Documents/ Music/ Analyses/ list.txt 

To create a new directory execute the mkdir command (stands for "make directory"):

lkaplipa:~$ mkdir linux_tutorial
lkaplipa:~$ ls 
Analyses/ Documents/ linux_tutorial/ Music/ list.txt 

To change a directory, execute the cd command (stands for "change directory"):

lkaplipa:~$ pwd
/home/lkaplipa
lkaplipa:~$ cd linux_tutorial
lkaplipa:~/linux_tutorial$ pwd
/home/lkaplipa/linux_tutorial
lkaplipa:~/linux_tutorial$ ls
lkaplipa:~/linux_tutorial$ cd ..             [`cd ..` takes you to the parent directory (i.e. one directory up)]
lkaplipa:~$ pwd
/home/lkaplipa
lkaplipa:~$ cd linux_tutorial/
lkaplipa:~/linux_tutorial$ cd                [`cd` with no argument takes you to the home directory]
lkaplipa:~$ pwd
/home/lkaplipa

The ~ sign stands for your home directory, so "~/linux_tutorial" is equivalent to "/home/lkaplipa/linux_tutorial".
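The navigation commands above can be tried safely in a throwaway directory. The following is a minimal sketch, assuming the mktemp utility is available (it is on most Linux and macOS systems); the variable names scratch, inside and parent are illustrative only.

```shell
# Create a throwaway directory so nothing in your real home directory is touched.
scratch=$(mktemp -d)
cd "$scratch"

mkdir linux_tutorial      # make a new directory
cd linux_tutorial         # enter it
inside=$(pwd)             # remember where we are
cd ..                     # go back to the parent directory
parent=$(pwd)

echo "inside: $inside"
echo "parent: $parent"
```

The first printed path should be the second one with "/linux_tutorial" appended.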

cp mv rm rmdir

Download the file BR_cob_57ind.fasta to your home directory, and then copy it with cp (stands for "copy") to your linux_tutorial directory.

lkaplipa:~/linux_tutorial$ cp ~/BR_cob_57ind.fasta ~/linux_tutorial

or

lkaplipa:~/linux_tutorial$ cp ../BR_cob_57ind.fasta .
lkaplipa:~/linux_tutorial$ ls
BR_cob_57ind.fasta
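The two cp commands above do the same thing, once with absolute paths and once with relative paths. A small sketch of that equivalence, using a scratch directory in place of the real home directory; the file example.fasta and the names copy_abs.fasta and copy_rel.fasta are hypothetical.

```shell
# Hypothetical layout: a scratch "home" containing example.fasta and a linux_tutorial directory.
home_dir=$(mktemp -d)
mkdir "$home_dir/linux_tutorial"
printf '>seq1\nACGT\n' > "$home_dir/example.fasta"

cd "$home_dir/linux_tutorial"

# Absolute paths: work from any directory.
cp "$home_dir/example.fasta" "$home_dir/linux_tutorial/copy_abs.fasta"

# Relative paths: ".." is the parent directory, "." is the current directory.
cp ../example.fasta ./copy_rel.fasta

# cmp -s is silent and succeeds only if the two files are byte-for-byte identical.
cmp -s copy_abs.fasta copy_rel.fasta && echo "both copies are identical"
```

Relative paths are shorter to type, but they only work from the directory they were written for.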

Rename the file to "Branchiomma.fasta" with mv (stands for "move"):

lkaplipa:~/linux_tutorial$ mv BR_cob_57ind.fasta Branchiomma.fasta
lkaplipa:~/linux_tutorial$ ls
Branchiomma.fasta

Create a copy of the "Branchiomma.fasta" file called "Branchiomma2.fasta":

lkaplipa:~/linux_tutorial$ cp Branchiomma.fasta Branchiomma2.fasta
lkaplipa:~/linux_tutorial$ ls
Branchiomma.fasta Branchiomma2.fasta

Delete one of the files with rm (stands for "remove"):

lkaplipa:~/linux_tutorial$ rm Branchiomma2.fasta
lkaplipa:~/linux_tutorial$ ls
Branchiomma.fasta

Task1: Create a new directory called "test", enter the "test" directory, copy the "Branchiomma.fasta" file into "test", and return to the "linux_tutorial" directory. Then try to delete the "test" directory with rm or rmdir.

What is the problem? Try the following command to find the solution:

lkaplipa:~/linux_tutorial$ man rm 

Task2: In your home directory create the "workshop_exercises" folder and its sub-folder as follows:

[Figure: directory tree of the "workshop_exercises" folder]

Advanced tip: Check the -p option in the mkdir man page.
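As an illustration of the tip, here is a short sketch of what -p changes. The directory names project, data and raw are hypothetical and are not the layout asked for in Task2.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Without -p, mkdir fails when the parent directories do not exist yet.
mkdir project/data/raw 2>/dev/null || echo "plain mkdir failed: parent directories are missing"

# With -p, all missing parent directories are created in one step.
mkdir -p project/data/raw
mkdir -p project/data/raw   # repeating the command is not an error with -p

ls project/data
```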

cat less head tail sed grep

Display a text file in the terminal with cat:

lkaplipa:~/linux_tutorial$ cat Branchiomma.fasta
>BR_001
-------------CTTGGGGTCAAATAAGATTTTGGGGTGCCACAGTAATTACTAACCTACTATCAGCTATTCCTTATATTGGCAATTCAATTGTAGCCTGACTATGAGGCGGTTTCGCAGTAGATAACGCCACTCTTAATCGATTTTTCGTGTTCCACTTTATTTTACCATTTATTATTCTTCTCTTTACCCTAATTCACCTAATATTTTTACATAAAACAGGATCAAGAAACCCCCTTGGCCTCTCCTCTTATAATGCAACTATCCCCTTCCATCCTTATTACACTATAAAAGATCTTACAGGTGCTCTCATTAGTATCACCTTACTCTTAGTTCTAACACTAAATATCCCTAATATATTCCTAGAGCCTGACAATTTCATTCAAGCTAACCCACTAAGAACTCCCGCCCACATCAAACCA------------
>BR_002
-------------CTTGGGGTCAAATAAGATTTTGGGGTGCCACAGTAATTACTAACCTACTATCAGCTATTCCTTATATTGGCAATTCAATTGTAGCCTGACTATGAGGCGGTTTCGCAGTAGATAACGCCACTCTTAATCGATTTTTCGTGTTCCACTTTATTTTACCATTTATTATTCTTCTCTTTACCCTAATTCACCTAATATTTTTACATAAAACAGGATCAAGAAACCCCCTTGGCCTCTCCTCTTATAATGCAACTATCCCCTTCCATCCTTATTACACTATAAAAGATCTTACAGGTGCTCTCATTAGTATCACCTTACTCTTAGTTCTAACACTAAATATCCCTAATATATTCCTAGAGCCTGACAATTTCATTCAAGCTAACCCACTAAGAACTCCCGCCCACATCAAACCA------------
..............................
lkaplipa:~/linux_tutorial$

Use less to read the file. Move back and forth with the up and down arrow keys, and press q to exit:

lkaplipa:~/linux_tutorial$ less Branchiomma.fasta

To see only specific parts of the file, use the head, tail and sed commands. For example, to see the first 2 lines, execute:

lkaplipa:~/linux_tutorial$ head -n 2 Branchiomma.fasta
>BR_001
-------------CTTGGGGTCAAATAAGATTTTGGGGTGCCACAGTAATTACTAACCTACTATCAGCTATTCCTTATATTGGCAATTCAATTGTAGCCTGACTATGAGGCGGTTTCGCAGTAGATAACGCCACTCTTAATCGATTTTTCGTGTTCCACTTTATTTTACCATTTATTATTCTTCTCTTTACCCTAATTCACCTAATATTTTTACATAAAACAGGATCAAGAAACCCCCTTGGCCTCTCCTCTTATAATGCAACTATCCCCTTCCATCCTTATTACACTATAAAAGATCTTACAGGTGCTCTCATTAGTATCACCTTACTCTTAGTTCTAACACTAAATATCCCTAATATATTCCTAGAGCCTGACAATTTCATTCAAGCTAACCCACTAAGAACTCCCGCCCACATCAAACCA------------

To see the last 2 lines, execute:

lkaplipa:~/linux_tutorial$ tail -n 2 Branchiomma.fasta
----------------GAGGTCARATAAGATTTTGAGGTGCAACTGTTATTACTAATCTCCTTTCTGCCATCCCTTATATCGGCCAATCAATCGTAACTTGATTATGGGGGGGATTCGCAGTAGACAACGCTACCCTAAACCGATTTTTTATATTTCACTTCCTTCTTCCATTTATCCTAGCCTTCATATCCGGCCTACATCTTCTATTTCTTCATCAAACAGGCTCCAACAACCCATTAGGATTAAAGTCTACCTCCCTTATAATTCCCTTCCACCCCTACTACACAACCAAAGACCTTGTGGGAGCCCTCTTATTGATTTTCCTCCTCCTATTCCTTGCGCTCGCCTCCCCCTCGCTATTTCTTGACCCGGAAAATTTTATCCAGGCTAACCCCCTAGCTACCCCCACCCACATCAAAC--------------

To see the third line, execute:

lkaplipa:~/linux_tutorial$ sed -n "3p" Branchiomma.fasta
>BR_002

To search a file for a specific word, phrase or symbol, use the grep command:

lkaplipa:~/linux_tutorial$ grep ">BR_102" Branchiomma.fasta
>BR_102
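The same commands can be combined to pull single records out of a FASTA file. Below is a minimal sketch on a tiny made-up file (toy.fasta and its sequence names are hypothetical), assuming each record occupies exactly two lines as in the file above. The -A option of grep is not part of POSIX, but both GNU and BSD grep provide it.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# A tiny hypothetical FASTA file: three records, one header line and one sequence line each.
printf '>seq_A\nACGT\n>seq_B\nGGCC\n>seq_C\nTTAA\n' > toy.fasta

head -n 2 toy.fasta            # first record: header and sequence
tail -n 2 toy.fasta            # last record: header and sequence
sed -n '3p' toy.fasta          # third line only: the header of the second record
sed -n '3,4p' toy.fasta        # lines 3 to 4: the whole second record
grep -A 1 '>seq_B' toy.fasta   # the matching header plus the line after it
```

If sequences are wrapped over several lines, the line arithmetic no longer holds and the record boundaries have to be found from the ">" headers instead.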

> >> |

To direct the output of a program to a file, use the > symbol, which overwrites the file, or the >> symbol, which appends to it.

lkaplipa:~/linux_tutorial$ echo "Hello World"
Hello World
lkaplipa:~/linux_tutorial$ echo "Hello World" > test.txt
lkaplipa:~/linux_tutorial$ ls
Branchiomma.fasta test.txt
lkaplipa:~/linux_tutorial$ cat test.txt
Hello World
lkaplipa:~/linux_tutorial$ echo "Goodbye World" >> test.txt
lkaplipa:~/linux_tutorial$ cat test.txt
Hello World
Goodbye World
lkaplipa:~/linux_tutorial$ echo "Hello New World" > test.txt
lkaplipa:~/linux_tutorial$ cat test.txt
Hello New World

To direct the output of one command to another, use the pipe symbol |:

lkaplipa:~/linux_tutorial$ grep ">" Branchiomma.fasta | wc -l
57

wc (stands for "word count") is a command that returns the number of lines (with -l), characters (with -m) or words (with -w) in its input. What does the output of the above command tell you?
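Pipes and redirection are often used together. The sketch below uses a made-up file (toy.fasta, names.txt and the variable names are hypothetical) and shows the pipe from above next to grep -c, which counts matching lines without a pipe. Note that the ">" must be quoted, because an unquoted > would be interpreted by the shell as output redirection.

```shell
scratch=$(mktemp -d)
cd "$scratch"
printf '>seq_A\nACGT\n>seq_B\nGGCC\n>seq_C\nTTAA\n' > toy.fasta

# Header lines start with ">", so counting them counts the sequences.
n_pipe=$(grep '>' toy.fasta | wc -l)

# grep -c counts matching lines directly; ^ anchors the match to the start of the line.
n_count=$(grep -c '^>' toy.fasta)

# Save the header names (without the ">") to a file, then append a summary line.
grep '^>' toy.fasta | sed 's/^>//' > names.txt
echo "total: $n_count" >> names.txt
cat names.txt
```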

sort uniq

Download the file carabus_species.txt to your linux_tutorial directory.

Check the first 10 lines:

lkaplipa:~/linux_tutorial$ head -n 10 carabus_species.txt
Carabus jankowskii
Carabus jankowskii
Carabus smaragdinus
Carabus koreanus
Carabus seishinensis
Carabus semiopacus
Carabus arboreus
Carabus auronitens
Carabus taedatus
Carabus arboreus

Sort the names in the file alphabetically:

lkaplipa:~/linux_tutorial$ sort carabus_species.txt | head -n 10
Carabus abbreviatus
Carabus abbreviatus
Carabus albrechti
Carabus albrechti
Carabus albrechti
Carabus albrechti
Carabus albrechti
Carabus albrechti
Carabus albrechti
Carabus albrechti

Find the unique names:

lkaplipa:~/linux_tutorial$ uniq carabus_species.txt | head -n 10
Carabus jankowskii
Carabus smaragdinus
Carabus koreanus
Carabus seishinensis
Carabus semiopacus
Carabus arboreus
Carabus auronitens
Carabus taedatus
Carabus arboreus
Carabus kyushuensis

Is the output what we expect it to be?

Task3: Sort the species names in carabus_species.txt, find the unique ones and write them to a new file called carabus_species_sorted_uniq.txt

Hint: What is the difference between sort -u, sort | uniq and uniq?
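A related pattern that is often useful with lists like this one is counting how many times each name occurs. The following is a minimal sketch on a small made-up list (the names and the files toy_species.txt and counts.txt are hypothetical); uniq -c prefixes each line with the number of times it occurred.

```shell
scratch=$(mktemp -d)
cd "$scratch"
printf 'Genus beta\nGenus alpha\nGenus beta\nGenus gamma\nGenus beta\n' > toy_species.txt

# Count the occurrences of each name, then order the result by count, largest first.
sort toy_species.txt | uniq -c | sort -rn > counts.txt
cat counts.txt
```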

Download and compile software

It is very common to need to download the source code of a program from a git repository and compile it on your computer. Most programs provide specific instructions for doing so. As an example, we will download and compile "Newick tools", a program written in C that performs a multitude of operations on Newick files and visualizes phylogenetic trees. Many of the operations were inspired by small tasks that are essential in species delimitation analyses.

Installation:

lkaplipa:~/linux_tutorial$ pwd
/home/lkaplipa/linux_tutorial
lkaplipa:~/linux_tutorial$ git clone https://github.com/xflouris/newick-tools.git
lkaplipa:~/linux_tutorial$ cd newick-tools
lkaplipa:~/linux_tutorial/newick-tools$ cd src
lkaplipa:~/linux_tutorial/newick-tools/src$ make
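Building from source needs a few tools to be installed first. This tutorial does not list them, so the sketch below is only an assumption: it checks for git, make and a C compiler reachable as cc. The function name check_tools is a hypothetical helper, not part of any of these programs.

```shell
# Report which of the given commands can be found on the PATH.
# Returns 0 if all are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# git and make are used in the steps above; cc (a C compiler) is an assumption.
check_tools git make cc || echo "install the missing tools before building"
```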

Usage: Download this Newick tree to your "linux_tutorial" directory and run the following commands from that directory.

$ newick-tools/src/newick-tools --tree_file RAxML_bestTree.Branchiomma --info

Extract all the tip names of the phylogeny

$ newick-tools/src/newick-tools --tree_file RAxML_bestTree.Branchiomma --extract_tips

Root tree

$ newick-tools/src/newick-tools --tree_file RAxML_bestTree.Branchiomma --output_file RAxML_bestTree.Branchiomma.rooted --root BR_076,BR_018

Make tree binary (fully bifurcating)

$ newick-tools/src/newick-tools --tree_file RAxML_bestTree.Branchiomma --output_file RAxML_bestTree.Branchiomma.binary --make_binary