Unix IX: Attempt of murder?
Course: VT24 Unix applied to genomic data (SC00036)
In this exercise you will carry out a phylogenetic analysis to determine whether there is enough evidence to support a charge of attempted murder. The data and the whole story can be found in this article. You will also practice installing software.
Copy the rt.fa file from /home/courses/Unix/files. It contains HIV sequences from the RT (reverse transcriptase) gene. The name of each sequence reflects the origin of the clone:
- P = patient
- V = victim
- LA = Lafayette area (a suburb somewhere in the USA)
Q1. How many sequences of each clone origin do we have?
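As a starting point for Q1, the sketch below tallies FASTA headers by origin prefix. It runs on a small made-up file (toy_rt.fa, with placeholder accessions and sequences) so that it is self-contained. It assumes every header contains the word "clone" followed by a name such as P1.BCM.RT, and the V and LA names shown here are hypothetical. Check the real headers with grep '^>' rt.fa before relying on the pattern.

```shell
# Toy FASTA for illustration only: accessions, V/LA names and sequences are made up.
tmp=$(mktemp -d)
cat > "$tmp/toy_rt.fa" <<'EOF'
>gi|0|gb|TOY1| HIV-1 clone P1.BCM.RT from USA reverse transcriptase (pol) gene, partial cds
ACGTACGT
>gi|0|gb|TOY2| HIV-1 clone P2.BCM.RT from USA reverse transcriptase (pol) gene, partial cds
ACGTACGA
>gi|0|gb|TOY3| HIV-1 clone V1.BCM.RT from USA reverse transcriptase (pol) gene, partial cds
ACGTACGC
>gi|0|gb|TOY4| HIV-1 clone LA01.BCM.RT from USA reverse transcriptase (pol) gene, partial cds
ACGTTCGT
EOF

# Keep header lines, reduce each to the letters that follow "clone " (P, V or LA),
# then count how often each prefix occurs.
counts=$(grep '^>' "$tmp/toy_rt.fa" \
    | sed -E 's/.*clone ([A-Z]+)[0-9]+.*/\1/' \
    | sort \
    | uniq -c)
echo "$counts"
```

To use it on the course data, replace "$tmp/toy_rt.fa" with rt.fa.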
The first thing we need to do is install Clustal Omega (clustalo), a program that aligns the sequences so that we can identify any differences between them.
- Go to the webpage
- Download the Standalone 64-bit Linux binary (X.X.X) with wget
- Follow the instructions from the website, under the Precompiled binaries section
- Run clustalo --help to display the help menu and check that the program works
To run clustalo from the command line without typing its absolute path, open your ~/.bashrc with a text editor and add the following line (don't forget to replace the path with the correct one):
export PATH=/home/path_to_clustalo_installation/bin:$PATH
- Save the document and close it
- Run this command to execute the export command you just added:
source ~/.bashrc
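A quick way to check that the PATH change took effect is sketched below. The directory is the same placeholder used above, not a real location, so command -v will only find clustalo once the placeholder is replaced by your actual installation path.

```shell
# Placeholder directory, as in the export line above; replace with your real path.
demo_bin="/home/path_to_clustalo_installation/bin"

# Prepend it to PATH for the current shell session.
export PATH="$demo_bin:$PATH"

# The first PATH entry should now be the directory we just added.
first_entry="${PATH%%:*}"
echo "First PATH entry: $first_entry"

# command -v prints the location of clustalo if the shell can find it.
command -v clustalo || echo "clustalo not found on PATH yet"
```

Because the directory is prepended, a clustalo binary there takes precedence over any other copy found later in PATH.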
There are a lot of flags you can use to fine-tune your analysis. This is common for most bioinformatics tools.
Run clustalo with the following arguments:
* --infile multiple sequence input file
* --outfile alignment output file
* --outfmt alignment format, use clustal
* --verbose verbose output
* --guidetree-out tree output file
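Put together, the arguments listed above could be combined as in the following sketch. The output name rt.aln is an assumption made here for illustration, while rt.dnd matches the tree file used later in the exercise. The call is guarded so that nothing happens if clustalo or rt.fa is missing.

```shell
in_fasta="rt.fa"      # input sequences copied earlier
out_aln="rt.aln"      # alignment output (assumed file name)
out_tree="rt.dnd"     # guide tree output, opened later in njplot

if command -v clustalo >/dev/null 2>&1 && [ -f "$in_fasta" ]; then
    # Long option names as given in the list above
    clustalo --infile "$in_fasta" \
             --outfile "$out_aln" \
             --outfmt clustal \
             --verbose \
             --guidetree-out "$out_tree"
else
    echo "clustalo or $in_fasta not available; skipping the run"
fi
```

If the output files already exist, clustalo may refuse to overwrite them; check clustalo --help for the option that controls this.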
Have a look at the alignment output file.
Q2. What can you conclude from this result?
To make this easier, let's look at the data graphically. We will use njplot, which is already installed on the server, so load the module:
module load njplot/2.4
Open njplot in the background and load the dendrogram, rt.dnd. You now have a phylogenetic representation of the analyzed samples (how they are related genetically).
Q3. What can you conclude from this phylogenetic tree?
The tree is still a little difficult to interpret because of the long sample names. Modify the names of the sequences in the original file (rt.fa), from:
>gi|24209939|gb|AY156734.1| HIV-1 clone P1.BCM.RT from USA reverse transcriptase (pol) gene, partial cds
to:
>P1.BCM.RT
Save the modified sequences into a file and run clustalo again. Open the corresponding dendrogram in njplot.
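One possible way to do the renaming is with sed, as sketched below. The example works on a one-record toy file that reuses the header shown above with a made-up sequence, and it assumes every header in rt.fa follows the same "clone NAME from ..." layout. The file names toy.fa and toy_renamed.fa are hypothetical.

```shell
# Toy input: the header from the example above with a made-up sequence line.
tmp=$(mktemp -d)
cat > "$tmp/toy.fa" <<'EOF'
>gi|24209939|gb|AY156734.1| HIV-1 clone P1.BCM.RT from USA reverse transcriptase (pol) gene, partial cds
ACGTACGTACGT
EOF

# On header lines, keep only the word that follows "clone ".
# Sequence lines do not start with ">" and pass through unchanged.
sed -E 's/^>.*clone ([^ ]+) .*/>\1/' "$tmp/toy.fa" > "$tmp/toy_renamed.fa"

new_header=$(head -n 1 "$tmp/toy_renamed.fa")
echo "$new_header"
```

Writing to a new file rather than editing in place keeps the original rt.fa available in case the pattern does not match some headers.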
Q4. Is there any evidence that the defendant was guilty?
Developed by Marcela Dávila, 2018. Modified by Marcela Dávila, 2021. Updated by Marcela Dávila, 2024.