Lab_2 - heelsplitter/Grootmyers_EPP_531_Applied_Genome_Analytics GitHub Wiki

Introduction to the Command Line (2)

  1. Download the Arabidopsis Genome. https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_chromosome_files/TAIR10_chr_all.fas.gz
wget https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_chromosome_files/TAIR10_chr_all.fas.gz
  1. Unzip/Decompress the file.
gunzip TAIR10_chr_all.fas.gz
  1. See what the genome looks like.
less TAIR10_chr_all.fas
  1. Count the number of chromosomes.
grep -o 'CHROMOSOME' TAIR10_chr_all.fas | wc -l
  1. Download the protein sequences, decompress them, and count the number of proteins. https://www.arabidopsis.org/download_files/Sequences/Araport11_blastsets/Araport11_pep_20220914.gz

wget https://www.arabidopsis.org/download_files/Sequences/Araport11_blastsets/Araport11_pep_20220914.gz
gunzip Araport11_pep_20220914.gz
grep -o '>' Araport11_pep_20220914 | wc -l
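Both counts above rely on FASTA headers: every record starts with a line beginning with `>`. A minimal sketch on a made-up toy file (the record names and sequences are hypothetical, not taken from TAIR10 or Araport11) showing that anchoring the pattern with `^>` counts records even when a description contains a stray `>`, and that the same count works on a gzip-compressed copy without unpacking it to disk:

```shell
# Build a tiny FASTA file to stand in for the real downloads (hypothetical content).
tmpdir=$(mktemp -d)
cat > "$tmpdir/toy.fasta" <<'EOF'
>seq1 toy record
ACGTACGT
>seq2 toy record | note: a > b
ACGT
>seq3 toy record
GGCC
EOF

# Counting every '>' character overcounts when a description contains '>'.
char_count=$(grep -o '>' "$tmpdir/toy.fasta" | wc -l | tr -d ' ')

# Counting lines that START with '>' counts records.
record_count=$(grep -c '^>' "$tmpdir/toy.fasta")

# The same count on a compressed copy, streamed through gzip -dc.
gzip -c "$tmpdir/toy.fasta" > "$tmpdir/toy.fasta.gz"
gz_count=$(gzip -dc "$tmpdir/toy.fasta.gz" | grep -c '^>')

echo "'>' characters: $char_count"
echo "records:        $record_count"
echo "records (gz):   $gz_count"

rm -rf "$tmpdir"
```

On the real files the two approaches agree only if no header description contains an extra `>`, so the anchored form is the safer habit.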

Introduction to Blast.

  1. Download the zebrafish protein sequences (the program "Blast" itself is loaded in the next step).
curl -o zebrafish.1.protein.faa.gz -L https://osf.io/68mgf/download
gunzip zebrafish.1.protein.faa.gz
  1. Load BLAST+ through Spack and make a protein database for BLAST.
export SPACK_ROOT=/pickett_shared/spack
PATH=$PATH:$HOME/bin:$SPACK_ROOT/bin
. $SPACK_ROOT/share/spack/setup-env.sh
spack list blast
spack load blast-plus
makeblastdb -in zebrafish.1.protein.faa -dbtype prot
  1. Run BLAST to compare the "mgProteome.fasta" peptide sequences against the zebrafish database.
blastp -query ../Commandline_Lab/Data/mgProteome.fasta -db zebrafish.1.protein.faa -out zebravsMG.txt
  1. Discuss the results.

Some query proteins had good BLAST hits against zebrafish subject proteins (>40% identity), but most hits had relatively high E-values, meaning matches of that quality could plausibly arise by chance.
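The observation above (decent identity, weak E-values) is easier to check on tabular output than on the default pairwise report. A sketch, assuming the search is rerun with `-outfmt 6`, which writes twelve tab-separated columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The rows below are made up for illustration, and the 40% and 1e-5 cutoffs are assumptions, not values from the lab:

```shell
# Toy stand-in for a blastp -outfmt 6 result file (hypothetical IDs and scores).
hits=$(mktemp)
printf 'mgP1\tzfA\t85.2\t200\t29\t1\t1\t200\t5\t204\t1e-80\t300\n'  >  "$hits"
printf 'mgP2\tzfB\t42.0\t50\t29\t0\t10\t59\t100\t149\t0.5\t30.1\n'  >> "$hits"
printf 'mgP3\tzfC\t35.0\t180\t110\t3\t1\t180\t1\t180\t2e-20\t90.0\n' >> "$hits"
printf 'mgP4\tzfD\t61.3\t150\t58\t0\t1\t150\t1\t150\t3e-40\t180\n'  >> "$hits"

# Column 3 is percent identity, column 11 is the E-value.
# Adding 0 forces awk to treat the E-value field as a number.
strong=$(awk -F'\t' '$3 > 40 && ($11 + 0) <= 1e-5' "$hits" | wc -l | tr -d ' ')
weak=$(awk -F'\t' '$3 > 40 && ($11 + 0) > 1e-5' "$hits" | wc -l | tr -d ' ')

echo "hits with >40% identity and E-value <= 1e-5: $strong"
echo "hits with >40% identity but E-value >  1e-5: $weak"

rm -f "$hits"
```

Short alignments are the usual reason a hit can show high identity and still have a high E-value, which is why the alignment length column is worth reading alongside the other two.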

  1. Run the same comparison using the program "Diamond".
spack load diamond@<version>%<compiler>@<version>
diamond makedb --in zebrafish.1.protein.faa -d diamonddb
diamond blastp -d diamonddb -q ../Commandline_Lab/Data/mgProteome.fasta -o matches.tsv
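To summarize the DIAMOND run, one option is to keep only the best-scoring hit per query. A sketch, assuming `matches.tsv` is in DIAMOND's default BLAST tabular layout (twelve tab-separated columns with the bit score in column 12). The rows below are hypothetical and only stand in for the real file:

```shell
# Toy stand-in for matches.tsv (hypothetical IDs and scores).
matches=$(mktemp)
printf 'mgP1\tzfA\t85.2\t200\t29\t1\t1\t200\t5\t204\t1e-80\t300\n'    >  "$matches"
printf 'mgP1\tzfB\t40.1\t180\t100\t4\t10\t189\t20\t199\t2e-30\t120\n' >> "$matches"
printf 'mgP2\tzfC\t38.0\t90\t55\t1\t3\t92\t7\t96\t1e-10\t90.5\n'      >> "$matches"
printf 'mgP2\tzfD\t39.5\t92\t55\t1\t3\t94\t7\t98\t5e-11\t95.1\n'      >> "$matches"

tab=$(printf '\t')

# Sort by query ID, then by bit score (column 12) from high to low,
# and keep the first row seen for each query.
best=$(sort -t "$tab" -k1,1 -k12,12nr "$matches" | awk -F'\t' '!seen[$1]++')

# Number of distinct queries that produced at least one hit.
n_queries=$(cut -f1 "$matches" | sort -u | wc -l | tr -d ' ')

# Subject IDs of the best hits, on one line.
best_subjects=$(printf '%s\n' "$best" | cut -f2 | paste -sd' ' -)

printf '%s\n' "$best"
echo "queries with hits: $n_queries"

rm -f "$matches"
```

The same pipeline should apply to BLAST results written with `-outfmt 6`, which makes the two tools' best hits directly comparable.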