Genome Assessment - heelsplitter/Grootmyers_EPP_531_Applied_Genome_Analytics GitHub Wiki
6. BUSCO Analysis
BUSCO Slides
... get code from Harleen ...
mkdir Busco
cd Busco
Symbolically link the fasta file to the current directory
... get code from Harleen ...
ln -s .
Check that BUSCO runs through Singularity
singularity exec -B $PWD /sphinx_local/images/ezlabgva-busco-v5.6.1_cv1.img busco --help
This worked.
Run BUSCO
... get code from Harleen ...
singularity exec -B $PWD /sphinx_local/images/ezlabgva-busco-v5.6.1_cv1.img busco -i Genome.fasta -m genome -l embryophyta -c 5 -o busco_results
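Once the run finishes, BUSCO writes a short summary text file that contains a one-line score string. The sketch below shows one way to pull the percentages out of that string for comparing runs, for example before and after organelle removal. It assumes the line uses the usual `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` layout; `parse_busco_summary` is a hypothetical helper, not part of BUSCO, and the example values are made up.

```python
import re

def parse_busco_summary(line):
    """Parse a one-line BUSCO score string into a dict of numbers.

    Hypothetical helper: assumes the C/S/D/F/M/n layout described above.
    Any extra fields after n: are ignored.
    """
    pattern = (
        r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
        r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
    )
    match = re.search(pattern, line)
    if match is None:
        raise ValueError(f"No BUSCO score string found in: {line!r}")
    # Percentages become floats; n (number of BUSCOs searched) becomes an int.
    scores = {key: float(value) for key, value in match.groupdict().items()}
    scores["n"] = int(scores["n"])
    return scores

# Made-up example values, not results from this assembly.
example = "C:97.5%[S:90.0%,D:7.5%],F:1.0%,M:1.5%,n:1614"
print(parse_busco_summary(example))
```

In practice the line would be read from the summary file inside the `busco_results` directory; the exact file name depends on the lineage dataset used.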
7. Remove mitochondrial and chloroplast genomes from the assembly
screen -S Chloro
cd /pickett_sphinx/projects/EPP531_AGA/dgrootmy
mkdir MT_CP_removal
cd MT_CP_removal
source ~/.bashrc
spack load minimap2
Map your assembly to the chloroplast genome
cp /pickett_sphinx/projects/EPP531_AGA/hkaur3/busco/Sassafras_V1.0_with_Hi-C_90x_1700_0.4_harleen.hic.hap1.p_ctg.fasta .
cp /pickett_sphinx/projects/EPP531_AGA/dgrootmy/Sassafras_albidum_chloroplast/Sassafras_albidum_chloroplast_complete_genome_GenBank_MW696799_1.fasta .
minimap2 -t 5 -x asm5 Sassafras_albidum_chloroplast_complete_genome_GenBank_MW696799_1.fasta Sassafras_V1.0_with_Hi-C_90x_1700_0.4_harleen.hic.hap1.p_ctg.fasta > Alignment.paf
Copy the following Python scripts to the current directory
cp /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/Busco/find_scaffolds_by_paf_coverage.py .
cp /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/Busco/remove_contigs_by_name.py .
Find the list of mapping scaffolds
conda create -n python python=3.9
conda activate python
conda install pandas
nano find_scaffolds_by_paf_coverage.py
python3 find_scaffolds_by_paf_coverage.py Alignment.paf > Alignment_list.txt
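The contents of `find_scaffolds_by_paf_coverage.py` are not shown on this page, so the following is only a guess at the kind of logic such a script might use. It is a minimal sketch that reads PAF lines, merges the aligned intervals on each query sequence (the assembly contigs, given the argument order in the minimap2 command above), and reports contigs whose covered fraction reaches a threshold. The function names and the 0.5 default threshold are assumptions, and it uses only the standard library rather than pandas.

```python
from collections import defaultdict

def covered_fraction(intervals, length):
    """Merge overlapping [start, end) intervals and return covered bases / length."""
    covered = 0
    current_start, current_end = None, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged interval (if any) and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / length

def contigs_by_paf_coverage(paf_lines, min_fraction=0.5):
    """Return sorted query names whose aligned fraction is >= min_fraction.

    PAF columns used: 1 = query name, 2 = query length,
    3 = query start (0-based), 4 = query end.
    """
    intervals = defaultdict(list)
    lengths = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        name = fields[0]
        lengths[name] = int(fields[1])
        intervals[name].append((int(fields[2]), int(fields[3])))
    return sorted(
        name for name in intervals
        if covered_fraction(intervals[name], lengths[name]) >= min_fraction
    )

# Made-up PAF records: ctg1 is mostly covered, ctg2 has only a short hit.
paf = [
    "ctg1\t1000\t0\t600\t+\tcp\t150000\t0\t600\t590\t600\t60",
    "ctg1\t1000\t500\t900\t+\tcp\t150000\t700\t1100\t390\t400\t60",
    "ctg2\t100000\t10\t210\t-\tcp\t150000\t5000\t5200\t190\t200\t60",
]
print(contigs_by_paf_coverage(paf))
```

Filtering on covered fraction instead of on any hit keeps long nuclear contigs that merely contain a short chloroplast-like insertion.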
Remove the mapping contigs
spack load py-biopython
spack load /2kcwn4f
python remove_contigs_by_name.py Alignment_list.txt Sassafras_V1.0_with_Hi-C_90x_1700_0.4_harleen.hic.hap1.p_ctg.fasta
mv Sassafras_V1.0_with_Hi-C_90x_1700_0.4_harleen.hic.hap1.p_ctg.fasta Sassafras_V1.0_with_Hi-C_90x_1700_0.4_harleen.hic.hap1.p_ctg-CP_filtered.fasta
chmod 777 Sassafras_V1.0_with_Hi-C_90x_1700_0.4_harleen.hic.hap1.p_ctg.filtered.fasta
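For readers without access to `remove_contigs_by_name.py`, the removal step can be pictured as a simple FASTA filter. This is a minimal sketch and not the course script: it assumes the list file holds one contig name per line and that a record's ID is the header text up to the first whitespace. `filter_fasta` is a hypothetical name, and the sketch uses plain Python rather than Biopython.

```python
def filter_fasta(fasta_lines, names_to_remove):
    """Yield FASTA lines, skipping every record whose ID is in names_to_remove."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            # Record ID = header text after '>' up to the first whitespace.
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            keep = record_id not in names_to_remove
        if keep:
            yield line

# Made-up records: ctg2 stands in for a contig listed in Alignment_list.txt.
fasta = [
    ">ctg1 len=8\n", "ACGTACGT\n",
    ">ctg2\n", "GGGG\n", "CCCC\n",
    ">ctg3\n", "TTTT\n",
]
for line in filter_fasta(fasta, {"ctg2"}):
    print(line, end="")
```

With real files, `fasta_lines` would be an open file handle for the assembly and `names_to_remove` a set built from the lines of the list file, with the kept lines written to a new FASTA file.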
8. Use the filtered FASTA file to remove mitochondrial sequences, then rerun BUSCO.
... This is group homework; Alina is doing this part. The input for this step is the output after removing the chloroplast data ...