Unix IX: Attempt of murder? - bcfgothenburg/VT24 GitHub Wiki

Course: VT24 Unix applied to genomic data (SC00036)

In this exercise you will perform a phylogenetic analysis to determine whether there is enough evidence to support a charge of attempted murder. The data and the whole story can be found in this article. You will also practice installing software.


Copy the rt.fa file from /home/courses/Unix/files. It contains HIV sequences from the RT (reverse transcriptase) gene. The name of each sequence reflects the origin of the clone:

 - P = patient
 - V = victim
 - LA = Lafayette area (a suburb somewhere in the USA)
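
As a possible starting point for exploring the file, the sketch below counts FASTA records by counting header lines (every record starts with `>`). The function name `count_sequences` and the tiny demo file are made up for illustration. The commented lines at the end show how the same steps might look on the course server.

```shell
# Count FASTA records: every record starts with a '>' header line.
count_sequences() {
  grep -c '^>' "$1"
}

# Tiny made-up FASTA file, only so that this sketch runs on its own.
demo_fa=$(mktemp)
printf '>demo_P1\nACGT\n>demo_V1\nACGT\n>demo_LA1\nACGT\n' > "$demo_fa"
count_sequences "$demo_fa"   # prints 3

# On the course server (path taken from the exercise text):
# cp /home/courses/Unix/files/rt.fa .
# count_sequences rt.fa
# grep '^>' rt.fa | less     # inspect the headers to see how P, V and LA appear
```

Looking at the header lines first makes it easier to decide which pattern to search for when counting each origin separately.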

Q1. How many sequences of each clone origin do we have?

The first thing we need to do is install Clustal Omega (clustalo), a program that aligns the sequences so that we can identify any differences between them.

  • Go to the webpage
  • Download the Standalone 64-bit Linux binary (X.X.X) with wget
  • Follow the instructions on the website, under Precompiled binaries
  • Run clustalo --help to display the help menu and check that the program works
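
The installation steps above could be sketched roughly as follows. This assumes that the website's instructions amount to renaming the downloaded binary to clustalo and making it executable; check the website, since its instructions take precedence. The function name `install_clustalo`, the downloaded file name and the installation directory are hypothetical placeholders, and the download URL is deliberately left out.

```shell
# Move a downloaded binary into <install_dir>/bin as "clustalo" and make it
# executable. Both arguments are placeholders you choose yourself.
install_clustalo() {
  downloaded="$1"     # file saved by wget (name depends on the version)
  install_dir="$2"    # e.g. a directory under your home
  mkdir -p "$install_dir/bin"
  mv "$downloaded" "$install_dir/bin/clustalo"
  chmod u+x "$install_dir/bin/clustalo"
}

# Possible usage (not run here):
# wget <URL copied from the download page>
# install_clustalo <downloaded_file> "$HOME/path_to_clustalo_installation"
# "$HOME/path_to_clustalo_installation/bin/clustalo" --help
```

Placing the binary in a bin subdirectory is a choice made here only to match the PATH example used later in this exercise.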

To run clustalo from the command line without typing its absolute path, open your ~/.bashrc in a text editor and add the following line (don't forget to use the correct path):

export PATH=/home/path_to_clustalo_installation/bin:$PATH
  • Save the document and close it
  • Run this command so that the export line you just added takes effect in your current shell:
source ~/.bashrc
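
To check that the PATH change had the intended effect, something like the following sketch may help. It prepends the directory only if it is not already on PATH and then asks the shell where it finds clustalo. The variable name `clustalo_bin` is made up for illustration and its value is the same placeholder path as above, so replace it with your real path.

```shell
# Placeholder path from the exercise text; replace with your real directory.
clustalo_bin="/home/path_to_clustalo_installation/bin"

# Prepend the directory to PATH only if it is not already there.
case ":$PATH:" in
  *":$clustalo_bin:"*) ;;                              # already present
  *) PATH="$clustalo_bin:$PATH"; export PATH ;;
esac

# Ask the shell which clustalo (if any) it would run.
if command -v clustalo >/dev/null 2>&1; then
  echo "clustalo found at: $(command -v clustalo)"
else
  echo "clustalo not found on PATH; check the directory above"
fi
```

If the program is reported as not found, the usual causes are a wrong directory in the export line or a binary that has not been made executable.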

There are many flags you can use to fine-tune your analysis. This is common for most bioinformatics tools.

Run clustalo with the following options (on the command line, each long option is prefixed with --):

* infile          multiple sequence input file 
* outfile         alignment output file
* outfmt          alignment format, use clustal
* verbose         verbose output
* guidetree-out   tree output file
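
Put together, the options above might be combined as in the sketch below. The input file rt.fa and the tree file rt.dnd come from this exercise, while the alignment file name rt.aln is an assumption. The command is only run if clustalo and the input file are available.

```shell
infile="rt.fa"     # sequences copied from the course directory
outfile="rt.aln"   # assumed name for the alignment output
tree="rt.dnd"      # guide tree file, opened later in njplot

if command -v clustalo >/dev/null 2>&1 && [ -f "$infile" ]; then
  clustalo --infile="$infile" \
           --outfile="$outfile" \
           --outfmt=clustal \
           --verbose \
           --guidetree-out="$tree"
else
  echo "clustalo or $infile not available; nothing was run" >&2
fi
```

Clustal Omega normally refuses to overwrite existing output files, so when repeating a run either choose new output names or add the --force option.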

Have a look at the alignment output file.

Q2. What can you conclude from this result?

To make this easier, let's look at the data in a more graphical way. We will use njplot, which is already installed on the server, so load the module:

 module load njplot/2.4

Open njplot in the background and load the dendrogram, rt.dnd. You now have a phylogenetic representation of the analyzed samples, showing how they are related genetically.

Q3. What can you conclude from this phylogenetic tree?

It is still a little difficult to interpret because of the long sequence names. Shorten the sequence names in the original file (rt.fa), from:

>gi|24209939|gb|AY156734.1| HIV-1 clone P1.BCM.RT from USA reverse transcriptase (pol) gene, partial cds

to:

>P1.BCM.RT
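
One way to do this renaming without editing every header by hand is a sed substitution, sketched below. It assumes that every header follows the same pattern as the example above, with the short name appearing right after the word "clone". Check the result, since headers that deviate from this pattern are left unchanged. The function name `rename_headers` and the output file name in the usage comment are made up for illustration.

```shell
# Replace each header line with ">" plus the word that follows "clone".
# Sequence lines (not starting with ">") pass through untouched.
rename_headers() {
  sed -E 's/^>.* clone ([^ ]+) .*$/>\1/' "$1"
}

# Demo on the example header from the exercise, plus a made-up sequence line.
demo_fa=$(mktemp)
printf '%s\n' \
  '>gi|24209939|gb|AY156734.1| HIV-1 clone P1.BCM.RT from USA reverse transcriptase (pol) gene, partial cds' \
  'ACGTACGT' > "$demo_fa"
rename_headers "$demo_fa"
# prints:
# >P1.BCM.RT
# ACGTACGT

# Possible usage on the real data (output name is an assumption):
# rename_headers rt.fa > rt.renamed.fa
# grep '^>' rt.renamed.fa    # verify that all names were shortened
```

Writing the result to a new file keeps the original rt.fa intact in case the pattern does not behave as expected.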

Save the modified sequences to a new file and run clustalo again. Open the corresponding dendrogram in njplot.

Q4. Is there any evidence that the defendant was guilty?




Developed by Marcela Dávila, 2018. Modified by Marcela Dávila, 2021. Updated by Marcela Dávila, 2024.