Genome Annotation - avince10/vincent_EPP531 GitHub Wiki
Input Data: Redbud Genome
ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/Syri/Redbud_Genome_Hap2.fasta .
- Building Database
Load the right Perl
spack load /ajwoixl
/pickett_shared/software/RepeatModeler-2.0.3/BuildDatabase -name Redbud -engine ncbi Redbud_Genome_Hap2.fasta
2. RepeatModeler
/pickett_shared/software/RepeatModeler-2.0.3/RepeatModeler -pa 3 -engine ncbi -database Redbud 2>&1 | tee 00_Redbud_repeatmodeler.log
3. Merge all the repeat libraries
cat /pickett_shared/software/RepeatMasker/Libraries/eudicotyledons-rm.fa /pickett_shared/software/RepeatMasker/Libraries/RMRB.fasta Path_to/Redbud-families.fa > Redbud_totalRepeatLib.fa
4. RepeatMasker
# Mask our genome
/pickett_shared/software/RepeatMasker/RepeatMasker \
-lib Redbud_totalRepeatLib.fa \
-e rmblast \
-pa 3 \
-nolow \
-xsmall \
-gff \
Redbud_Genome_Hap2.fasta \
&> Redbud_1.0.0_RMasker.out
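Because `-xsmall` soft-masks repeats as lowercase bases rather than replacing them with N, one quick sanity check is the fraction of lowercase bases in the masked FASTA. The following is a minimal sketch and not part of the original workflow: the function name `masked_fraction` is hypothetical, it only counts lowercase a, c, g, t and n, and the file name in the usage comment assumes RepeatMasker's default `<input>.masked` output naming.

```shell
# masked_fraction FILE
# Print the fraction of lowercase (soft-masked) bases in a FASTA file.
# Hypothetical helper for illustration; not shipped with RepeatMasker.
masked_fraction() {
  awk '
    !/^>/ {
      total  += length($0)            # all bases on this sequence line
      masked += gsub(/[acgtn]/, "")   # gsub returns the number of lowercase bases matched
    }
    END {
      if (total > 0) printf "%.4f\n", masked / total
      else           print "0.0000"
    }
  ' "$1"
}

# Usage (assumed output file name):
# masked_fraction Redbud_Genome_Hap2.fasta.masked
```

A value near zero would suggest the masking step did not run as expected or that the wrong file is being inspected.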
- Download RNAseq Data from NCBI
Make an SRR accession list as a .txt file, one accession per line
nano srr_accessions.txt
SRR957672
SRR1909126
SRR1909127
Save and exit: Ctrl + X, then Y, then Enter
Now let's download the data from NCBI
spack load sratoolkit
for i in $(cat srr_accessions.txt); do prefetch $i && fasterq-dump $i; done
Merge all the fastq files in their respective pairs and compress them.
cat SRR1909126.fastq SRR1909127.fastq SRR957672.fastq > redbudmerged.fastq
gzip redbudmerged.fastq
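After merging, the read count of the merged file should equal the sum of the read counts of the individual runs. A minimal sketch of that check is shown here; `count_fastq_reads` is a hypothetical helper, and it assumes standard four-line FASTQ records. If the runs are paired-end, the same check can be applied to each mate file separately.

```shell
# count_fastq_reads FILE
# Print the number of reads in a FASTQ file (plain text or .gz).
# Hypothetical helper; assumes exactly four lines per read.
count_fastq_reads() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;   # decompress to stdout, leave the file untouched
    *)    cat "$1" ;;
  esac | awk 'END { printf "%.0f\n", NR / 4 }'
}

# Usage:
# count_fastq_reads SRR1909126.fastq
# count_fastq_reads redbudmerged.fastq.gz
```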
- STAR index the masked genome
Copy/softlink your masked genome to the current directory
spack load star
STAR \
--runMode genomeGenerate \
--genomeDir Hap1 \
--genomeSAindexNbases 13 \
--genomeFastaFiles Redbud_Ragtag_Salsa_Hap2.masked.fasta \
--runThreadN 3
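The STAR manual recommends scaling `--genomeSAindexNbases` down for smaller genomes to min(14, log2(GenomeLength)/2 - 1). A minimal sketch of that calculation follows. The function names are hypothetical, and rounding to the nearest integer is an assumption chosen to agree with the manual's worked examples (about 9 for a 1 Mb genome and 7 for a 100 kb genome).

```shell
# genome_length FILE
# Print the total number of bases in a FASTA file (header lines excluded).
genome_length() {
  awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n }' "$1"
}

# sa_index_nbases LENGTH
# Print min(14, log2(LENGTH)/2 - 1), rounded to the nearest integer (assumption).
sa_index_nbases() {
  awk -v n="$1" 'BEGIN {
    v = int(log(n) / log(2) / 2 - 1 + 0.5)   # log2 via natural logs, then round
    if (v > 14) v = 14                       # STAR default is the upper bound
    print v
  }'
}

# Usage:
# sa_index_nbases "$(genome_length Redbud_Ragtag_Salsa_Hap2.masked.fasta)"
```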
- STAR Mapping RNAseq Data
STAR \
--genomeDir Hap1 \
--readFilesIn Redbud_rnaseq_1.fastq.gz Redbud_rnaseq_2.fastq.gz \
--readFilesCommand zcat \
--outFileNamePrefix Redbud_Hap1-rna_ \
--outSAMtype BAM SortedByCoordinate \
--outSAMstrandField intronMotif \
--limitBAMsortRAM 107374182400 \
--runThreadN 10 \
&> star_hap1.out
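STAR writes a summary file named `<prefix>Log.final.out` next to the BAM, which with the prefix above would be `Redbud_Hap1-rna_Log.final.out`. Checking the uniquely mapped percentage before moving on to BRAKER can catch a bad index or mismatched reads early. This is a minimal sketch; `uniquely_mapped_pct` is a hypothetical helper and it assumes the usual `label | value` layout of that summary file.

```shell
# uniquely_mapped_pct FILE
# Print the "Uniquely mapped reads %" value from a STAR Log.final.out file.
# Hypothetical helper; assumes lines of the form "label |<tab>value".
uniquely_mapped_pct() {
  awk -F'|' '/Uniquely mapped reads %/ {
    gsub(/[ \t]/, "", $2)   # strip padding around the value
    print $2
  }' "$1"
}

# Usage:
# uniquely_mapped_pct Redbud_Hap1-rna_Log.final.out
```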
- BRAKER
Input files for BRAKER3
cp /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/Syri/Redbud_Ragtag_Salsa_Hap2.masked.fasta .
cp /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/Syri/Redbud_Hap1-rna_Aligned.sortedByCoord.out.bam .
Download the orthoDB protein database for plants.
wget https://bioinf.uni-greifswald.de/bioinf/partitioned_odb11/Viridiplantae.fa.gz
# Decompress the database
gunzip -d Viridiplantae.fa.gz
Set the path for BRAKER and AUGUSTUS config files
export BRAKER_SIF=/sphinx_local/images/braker3_latest.sif
export AUGUSTUS_CONFIG_PATH=/home/avince10/miniconda3/envs/busco/config
echo $AUGUSTUS_CONFIG_PATH
Set the path for the AUGUSTUS config files inside the singularity interactive shell
singularity shell -B $PWD $BRAKER_SIF
export AUGUSTUS_CONFIG_PATH=/home/avince10/miniconda3/envs/busco/config
echo $AUGUSTUS_CONFIG_PATH
#Exit the interactive shell
Ctrl + D
Make a new directory
mkdir braker_hap1
Script for running BRAKER
singularity exec -B $PWD /sphinx_local/images/braker3_latest.sif braker.pl \
--genome=Redbud_Ragtag_Salsa_Hap2.masked.fasta \
--bam=Redbud_Hap1-rna_Aligned.sortedByCoord.out.bam \
--prot_seq=Viridiplantae.fa \
--workingdir=braker_hap1 \
--threads 5 \
--useexisting \
--gff3 \
--AUGUSTUS_CONFIG_PATH $AUGUSTUS_CONFIG_PATH \
--species=Ccanadensis
Check the stats on gff3 file
cat braker.gff3 | awk '{a[$3]++}END{for(k in a){print k,a[k]}}'
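The one-liner above also counts comment lines such as `##gff-version 3`, because it splits on any whitespace and does not skip lines starting with `#`. A slightly stricter variant is sketched here under the assumption that the file is standard tab-separated, nine-column GFF3; `gff3_feature_counts` is a hypothetical name.

```shell
# gff3_feature_counts FILE
# Print "feature_type<TAB>count" for each value of column 3, sorted by type.
# Hypothetical helper; skips comment lines and rows without nine columns.
gff3_feature_counts() {
  awk -F'\t' '
    !/^#/ && NF >= 9 { n[$3]++ }
    END { for (k in n) print k "\t" n[k] }
  ' "$1" | LC_ALL=C sort
}

# Usage:
# gff3_feature_counts braker.gff3
```

The `gene` and `mRNA` rows give the number of predicted genes and transcripts, which is a useful number to compare against the BUSCO result.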
Homework: Run BUSCO on the protein FASTA file.
- EnTAP
The Eukaryotic Non-Model Transcriptome Annotation Pipeline (EnTAP) is designed to improve the accuracy, speed, and flexibility of functional gene annotation for de novo assembled transcriptomes in non-model eukaryotes. See the EnTAP Documentation.
Rename the BRAKER protein file
mv braker.aa Ccanadensis_protein_hap1.fasta
Softlink the protein file to the EnTAP directory
ln -s path_to/Ccanadensis_protein_hap1.fasta .
ln -s /pickett_sphinx/projects/EPP531_AGA/avince10/braker/braker_hap1/braker_hap1/Ccanadensis_protein_hap1.fasta
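A symlink to a wrong path is created without any error, so it can help to confirm that the link resolves and to count the protein records before starting EnTAP. This is a minimal sketch; `count_fasta_records` is a hypothetical helper name.

```shell
# count_fasta_records FILE
# Print the number of FASTA records, or fail if FILE is missing or a broken symlink.
# Hypothetical helper for illustration.
count_fasta_records() {
  if [ ! -e "$1" ]; then          # -e follows symlinks, so a dangling link fails here
    echo "missing or broken link: $1" >&2
    return 1
  fi
  grep -c '^>' "$1"               # one header line per record
}

# Usage:
# count_fasta_records Ccanadensis_protein_hap1.fasta
```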
Load the required dependencies
spack load diamond
spack load diamond@2.0.4
spack load rsem
spack load interproscan
spack load transdecoder
Run EnTAP
/sphinx_local/software/EnTAP-1.0.0/bin/EnTAP \
--runP \
-i Ccanadensis_protein_hap1.fasta \
--ini /sphinx_local/software/EnTAP-1.0.0/entap_config_Oct2023.ini \
-d /sphinx_local/software/EnTAP-1.0.0/bin/uniprot_sprot.dmnd \
-t 5