Genome Assessment - heelsplitter/Grootmyers_EPP_531_Applied_Genome_Analytics GitHub Wiki

6. BUSCO Analysis

BUSCO Slides

... get code from Harleen ...

mkdir Busco
cd Busco

Symbolically link the FASTA file into the current directory

... get code from Harleen ...

ln -s .

Check that BUSCO runs from the Singularity image

singularity exec -B $PWD /sphinx_local/images/ezlabgva-busco-v5.6.1_cv1.img busco --help

This worked.

Run BUSCO

... get code from Harleen ...

singularity exec -B $PWD /sphinx_local/images/ezlabgva-busco-v5.6.1_cv1.img busco -i Genome.fasta -m genome -l embryophyta -c 5 -o busco_results
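Because BUSCO is rerun later on the filtered assembly, it can help to pull the scores out of each run in a comparable form. The sketch below is a minimal example, assuming the one-line score notation that BUSCO prints in its short summary (C, S, D, F, M percentages and n). The function name `parse_busco_summary` is hypothetical, and the numbers in the usage note are made up for illustration, not results from this assembly.

```python
import re

# One-line BUSCO score notation, e.g.
# C:90.0%[S:85.0%,D:5.0%],F:4.0%,M:6.0%,n:1614
# Assumption: the summary file contains a line in this layout.
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(text):
    """Return the BUSCO scores found in text as a dict, or None if absent."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    # C = complete, S = single-copy, D = duplicated, F = fragmented, M = missing
    scores = {key: float(match.group(key)) for key in "CSDFM"}
    scores["n"] = int(match.group("n"))  # total BUSCO groups searched
    return scores
```

For example, `parse_busco_summary("C:90.0%[S:85.0%,D:5.0%],F:4.0%,M:6.0%,n:1614")` gives a dictionary with `C` equal to 90.0 and `n` equal to 1614. Parsing the summary from the run before filtering and the run after filtering makes the two easy to compare side by side.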

7. Remove mitochondrial and chloroplast genomes from the assembly

screen -S Chloro
cd /pickett_sphinx/projects/EPP531_AGA/dgrootmy
mkdir MT_CP_removal
cd MT_CP_removal
source ~/.bashrc
spack load minimap2

Map your assembly to the chloroplast genome

cp /pickett_sphinx/projects/EPP531_AGA/hkaur3/busco/Sassafras_V1.0_with_Hi-C_90x_1700_0.4_harleen.hic.hap1.p_ctg.fasta .
cp /pickett_sphinx/projects/EPP531_AGA/dgrootmy/Sassafras_albidum_chloroplast/Sassafras_albidum_chloroplast_complete_genome_GenBank_MW696799_1.fasta .
minimap2 -t 5 -x asm5 Sassafras_albidum_chloroplast_complete_genome_GenBank_MW696799_1.fasta Sassafras_V1.0_with_Hi-C_90x_1700_0.4_harleen.hic.hap1.p_ctg.fasta > Alignment.paf

Copy the following Python scripts to the current directory

cp /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/Busco/find_scaffolds_by_paf_coverage.py .
cp /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/Busco/remove_contigs_by_name.py .

List the scaffolds that map to the chloroplast genome

conda create -n python python=3.9
conda activate python
conda install pandas
nano find_scaffolds_by_paf_coverage.py
python3 find_scaffolds_by_paf_coverage.py Alignment.paf > Alignment_list.txt
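The contents of find_scaffolds_by_paf_coverage.py are not shown here, so the following is only a sketch of one way such a script could work, not the course script itself. It assumes the standard PAF column order (query name, query length, query start, query end in the first four columns), merges overlapping alignments on each assembly contig, and reports contigs whose covered fraction reaches a threshold. The function names and the 0.5 default threshold are hypothetical.

```python
from collections import defaultdict


def covered_fraction(intervals, length):
    """Fraction of a sequence covered by the union of (start, end) intervals."""
    covered = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / length if length else 0.0


def contigs_by_paf_coverage(paf_lines, min_fraction=0.5):
    """Return sorted query names whose aligned fraction is >= min_fraction.

    min_fraction=0.5 is an illustrative default, not the course setting.
    """
    lengths = {}
    intervals = defaultdict(list)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:  # PAF has 12 mandatory columns
            continue
        name = fields[0]
        lengths[name] = int(fields[1])
        intervals[name].append((int(fields[2]), int(fields[3])))
    return sorted(
        name for name in intervals
        if covered_fraction(intervals[name], lengths[name]) >= min_fraction
    )
```

With the minimap2 command above, the assembly is the query, so the names returned are assembly contigs. A contig that is mostly covered by chloroplast alignments is likely organellar, while a long nuclear contig with a short chloroplast-like insertion stays below the threshold and is kept.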

Remove the contigs that map to the chloroplast genome

spack load py-biopython
spack load /2kcwn4f
python remove_contigs_by_name.py Alignment_list.txt Sassafras_V1.0_with_Hi-C_90x_1700_0.4_harleen.hic.hap1.p_ctg.fasta
mv Sassafras_V1.0_with_Hi-C_90x_1700_0.4_harleen.hic.hap1.p_ctg.fasta Sassafras_V1.0_with_Hi-C_90x_1700_0.4_harleen.hic.hap1.p_ctg-CP_filtered.fasta
chmod 777 Sassafras_V1.0_with_Hi-C_90x_1700_0.4_harleen.hic.hap1.p_ctg.filtered.fasta
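The removal step drops every FASTA record whose name appears in the list file. The sketch below shows that idea using only the Python standard library. It is not the contents of remove_contigs_by_name.py, which probably relies on Biopython given the spack load above. The function names and the explicit output path argument are hypothetical.

```python
def filter_fasta(fasta_lines, names_to_remove):
    """Yield FASTA lines, skipping records whose ID is in names_to_remove.

    The record ID is taken as the first whitespace-delimited token
    after '>' on the header line.
    """
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            record_id = header[0] if header else ""
            keep = record_id not in names_to_remove
        if keep:
            yield line


def remove_contigs(fasta_path, names_path, out_path):
    """Write a copy of fasta_path to out_path without the listed contigs."""
    with open(names_path) as handle:
        names = {line.strip() for line in handle if line.strip()}
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        out.writelines(filter_fasta(fasta, names))
```

Writing to a separate output path leaves the input assembly untouched, so the filtered and unfiltered files can both be kept for the BUSCO comparison.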

8. Now use the filtered FASTA file to remove mitochondrial sequences and rerun BUSCO.

... This is group homework; Alina is doing this. The input for this step is the output after removing chloroplast data ...