Lab_2 - heelsplitter/Grootmyers_EPP_531_Applied_Genome_Analytics GitHub Wiki
Introduction to the Command Line (2)
- Download the Arabidopsis Genome. https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_chromosome_files/TAIR10_chr_all.fas.gz
wget https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_chromosome_files/TAIR10_chr_all.fas.gz
- Unzip/Decompress the file.
gunzip TAIR10_chr_all.fas.gz
- See what the genome looks like.
less TAIR10_chr_all.fas
- Count the number of chromosomes.
grep -o 'CHROMOSOME' TAIR10_chr_all.fas | wc -l
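As a cross-check on the count above, the header lines themselves can be counted and the sequence length under each header summed. This is only a sketch: it runs on a small toy FASTA (invented records, not TAIR10 data), and the variable names are hypothetical. Pointing the same commands at TAIR10_chr_all.fas should work the same way.

```shell
# Toy FASTA standing in for TAIR10_chr_all.fas (invented records, not real data).
toy_fasta=$(mktemp)
cat > "$toy_fasta" <<'EOF'
>1 CHROMOSOME toy
ACGTACGT
ACGT
>2 CHROMOSOME toy
GGCC
>chloroplast CHROMOSOME toy
TTTTAA
EOF

# Count records by counting header lines (lines that start with ">").
n_records=$(grep -c '^>' "$toy_fasta")
echo "records: $n_records"

# Sum the sequence length under each header; prints "name<TAB>length".
lengths=$(awk '
  /^>/ { if (name != "") print name "\t" len; name = substr($1, 2); len = 0; next }
       { len += length($0) }
  END  { if (name != "") print name "\t" len }
' "$toy_fasta")
echo "$lengths"

rm -f "$toy_fasta"
```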
- Download the Protein Sequences. https://www.arabidopsis.org/download_files/Sequences/Araport11_blastsets/Araport11_pep_20220914.gz
wget https://www.arabidopsis.org/download_files/Sequences/Araport11_blastsets/Araport11_pep_20220914.gz
- Unzip/Decompress the file.
gunzip Araport11_pep_20220914.gz
- Count the number of protein sequences.
grep -c '^>' Araport11_pep_20220914
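When counting sequences, it matters whether every ">" character is counted or only the lines that start with ">". A header description that happens to contain ">" inflates the first kind of count. The sketch below shows the difference on a toy peptide file (invented records, hypothetical variable names).

```shell
# Toy peptide FASTA (invented); the second header contains an extra ">" in its description.
toy_pep=$(mktemp)
cat > "$toy_pep" <<'EOF'
>P1.1 | toy protein A | Chr1:1-9 FORWARD
MKV
>P2.1 | toy protein B, A->B transition | Chr1:20-40 REVERSE
MLLT
EOF

# Counts every ">" character in the file (tr strips padding some wc versions add).
n_chars=$(grep -o '>' "$toy_pep" | wc -l | tr -d ' ')
# Counts only header lines, i.e. lines beginning with ">".
n_headers=$(grep -c '^>' "$toy_pep")
echo "'>' characters: $n_chars; header lines: $n_headers"

rm -f "$toy_pep"
```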
Introduction to BLAST.
- Download the Zebrafish protein sequences.
curl -o zebrafish.1.protein.faa.gz -L https://osf.io/68mgf/download
gunzip zebrafish.1.protein.faa.gz
- Load the program "Blast" and make a database for it.
export SPACK_ROOT=/pickett_shared/spack
PATH=$PATH:$HOME/bin:$SPACK_ROOT/bin
. $SPACK_ROOT/share/spack/setup-env.sh
spack list blast
spack load blast-plus
makeblastdb -in zebrafish.1.protein.faa -dbtype prot
- Run blast to compare the "mgProteome.fasta" peptide sequences to the Zebrafish database.
blastp -query ../Commandline_Lab/Data/mgProteome.fasta -db zebrafish.1.protein.faa -out zebravsMG.txt
- Discuss the results.
Some query proteins had good BLAST hits to subject proteins (>40%), but most of these hits had relatively high E-values.
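One way to make this observation checkable is to filter hits on both percentage and E-value together. The sketch below assumes the ">40%" refers to percent identity, and that the search is rerun with blastp's tabular output (-outfmt 6), which the command above does not request. The toy rows and the thresholds (40 and 1e-5) are invented for illustration.

```shell
# Toy hits in BLAST tabular layout (-outfmt 6); all values are invented.
# Columns: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
toy_hits=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  q1 s1 85.0 200 30 0 1 200 1 200 1e-50 350 \
  q2 s2 45.0 30 16 1 5 34 10 39 2.5 22.3 \
  q3 s3 30.0 150 100 5 1 150 3 152 1e-08 60.1 > "$toy_hits"

# Keep queries whose hit passes both cutoffs:
# percent identity (column 3) above min_id AND E-value (column 11) below max_e.
strong=$(awk -F'\t' -v min_id=40 -v max_e=1e-5 \
  '($3 + 0) > (min_id + 0) && ($11 + 0) < (max_e + 0) { print $1 }' "$toy_hits")
echo "$strong"

rm -f "$toy_hits"
```

In the toy data, q2 passes the identity cutoff but fails on E-value, which is the pattern described above.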
- Run the same comparison using the program "Diamond".
spack load diamond
diamond makedb --in zebrafish.1.protein.faa -d diamonddb
diamond blastp -d diamonddb -q ../Commandline_Lab/Data/mgProteome.fasta -o matches.tsv
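To summarize a tabular hits file, a common first step is keeping the best-scoring hit per query. The sketch below assumes matches.tsv is in BLAST tabular layout with bitscore in column 12, which is what DIAMOND's documentation describes as its default output. It runs on invented toy rows rather than the real file.

```shell
# Toy rows in 12-column BLAST tabular layout (invented values), standing in for matches.tsv.
toy_matches=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  q1 sA 60.0 100 40 0 1 100 1 100 1e-30 120.5 \
  q1 sB 90.0 100 10 0 1 100 1 100 1e-60 210.0 \
  q2 sC 50.0 80 40 0 1 80 1 80 1e-10 75.2 > "$toy_matches"

# Sort by query (column 1), then by bitscore (column 12) descending,
# and keep only the first row seen for each query.
tab=$(printf '\t')
best=$(LC_ALL=C sort -t "$tab" -k1,1 -k12,12nr "$toy_matches" \
  | awk -F'\t' '!seen[$1]++ { print $1 "\t" $2 }')
echo "$best"

rm -f "$toy_matches"
```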