Module 2 Lab 3: BUSCO and Removing Organellar DNA - jacksonhturner/epp_531 GitHub Wiki
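BUSCO (Benchmarking Universal Single-Copy Orthologs) assesses assembly completeness by searching an assembly for genes expected to be present as single-copy orthologs in nearly all species of a lineage; here the land plant (embryophyta) dataset is used. We will run it on haplotype 1 of the previously assembled sassafras genome. First, link the assembly into the BUSCO working directory, then activate the BUSCO environment and run it in genome mode (-m genome) against the embryophyta lineage (-l) with 5 CPU threads (-c):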
ln -s /pickett_sphinx/projects/EPP531_AGA/turner/M2Lab2/analysis/Sassafras_V1.0_with_Hi-C_0.5.hic.hap1.p_ctg.fasta /pickett_sphinx/projects/EPP531_AGA/turner/M3Lab1/BUSCO
conda activate busco
cd /pickett_sphinx/projects/EPP531_AGA/turner/M3Lab1/BUSCO
busco -i Sassafras_V1.0_with_Hi-C_0.5.hic.hap1.p_ctg.fasta -m genome -l embryophyta -c 5 -o busco_results
Below is the output from BUSCO:
--------------------------------------------------
|Results from dataset embryophyta_odb10          |
--------------------------------------------------
|C:99.0%[S:5.1%,D:93.9%],F:0.4%,M:0.6%,n:1614    |
|1597   Complete BUSCOs (C)                      |
|82     Complete and single-copy BUSCOs (S)      |
|1515   Complete and duplicated BUSCOs (D)       |
|7      Fragmented BUSCOs (F)                    |
|10     Missing BUSCOs (M)                       |
|1614   Total BUSCO groups searched              |
--------------------------------------------------
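The first line of the summary expresses each category as a percentage of n, the number of BUSCO groups searched, with Complete = single-copy + duplicated. As a check on how those numbers relate, the hypothetical shell function busco_summary below (not part of BUSCO) rebuilds that line from the four counts. Computing 1597/1614 directly gives 98.9%, whereas the reported C of 99.0% equals the rounded S (5.1%) plus the rounded D (93.9%), so this sketch assumes C is formed that way.

```shell
# Hypothetical helper: rebuild BUSCO's one-line summary from category counts.
# Arguments: single-copy (S), duplicated (D), fragmented (F), missing (M).
busco_summary() {
    awk -v s="$1" -v d="$2" -v f="$3" -v m="$4" 'BEGIN {
        n = s + d + f + m                        # total BUSCO groups searched
        sp = sprintf("%.1f", 100 * s / n) + 0    # single-copy percent, rounded
        dp = sprintf("%.1f", 100 * d / n) + 0    # duplicated percent, rounded
        # Assumption: C percent is the sum of the rounded S and D percents.
        printf "C:%.1f%%[S:%.1f%%,D:%.1f%%],F:%.1f%%,M:%.1f%%,n:%d\n",
            sp + dp, sp, dp, 100 * f / n, 100 * m / n, n
    }'
}

busco_summary 82 1515 7 10    # counts from the summary above
```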
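Next, remove organellar DNA from the assembly: align the assembly to organelle reference genomes with minimap2 and discard the contigs that match. The commands below are run from a directory alongside BUSCO (they refer to ../BUSCO/). Start with the chloroplast, using the S. albidum plastome (S_albidum_plastome.fasta) as the reference: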
spack load minimap2
minimap2 -t 5 -x asm5 S_albidum_plastome.fasta ../BUSCO/Sassafras_V1.0_with_Hi-C_0.5.hic.hap1.p_ctg.fasta > Sassafras_cp_alignment.paf
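minimap2 writes PAF, a tab-separated format with one alignment per line. Because the plastome is given first (the target) and the assembly second (the query), columns 1 to 4 describe a sassafras contig: its name, its length, and the start and end of the aligned region on it. The sketch below shows one way to turn those columns into a per-contig coverage fraction. The function paf_coverage is hypothetical and is not find_scaffolds_by_paf_coverage.py; that script's actual criteria may differ.

```shell
# Hypothetical sketch: report contigs whose PAF alignments cover at least a
# given fraction of their length. Overlapping alignments on the same contig
# are merged first, so no base is counted twice.
# Usage: paf_coverage <alignments.paf> <min_fraction>
paf_coverage() {
    sort -k1,1 -k3,3n "$1" | awk -v min="$2" '
    function flush() {
        if (name == "") return
        covered += end - start                    # close the last interval
        frac = covered / len
        if (frac >= min) printf "%s\t%.3f\n", name, frac
    }
    {
        qs = $3 + 0; qe = $4 + 0                  # aligned interval on contig
        if ($1 != name) {                         # first line of a new contig
            flush()
            name = $1; len = $2 + 0; covered = 0; start = qs; end = qe
        } else if (qs > end) {                    # gap: close current interval
            covered += end - start; start = qs; end = qe
        } else if (qe > end) {                    # overlap: extend interval
            end = qe
        }
    }
    END { flush() }'
}
```

For example, paf_coverage Sassafras_cp_alignment.paf 0.5 would print each contig whose merged alignments span at least half its length, together with the covered fraction; the 0.5 threshold is an arbitrary illustration.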
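Use two custom Python scripts to remove the matching contigs: find_scaffolds_by_paf_coverage.py writes the names of contigs whose alignments in the PAF file pass its coverage criteria to a list, and remove_contigs_by_name.py removes the listed contigs from the assembly. Copy both scripts into the working directory and run them: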
cp /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/Busco/find_scaffolds_by_paf_coverage.py .
cp /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/Busco/remove_contigs_by_name.py .
python3 find_scaffolds_by_paf_coverage.py Sassafras_cp_alignment.paf > Alignment_list.txt
python3 remove_contigs_by_name.py Alignment_list.txt ../BUSCO/Sassafras_V1.0_with_Hi-C_0.5.hic.hap1.p_ctg.fasta
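remove_contigs_by_name.py leaves the filtered assembly at the input path, so rename it to record that the chloroplast contigs are gone. Because that path is the symbolic link created earlier, the change may also reach the original assembly in M2Lab2; working from a copy (cp -L) instead of a link avoids this.
mv ../BUSCO/Sassafras_V1.0_with_Hi-C_0.5.hic.hap1.p_ctg.fasta Sassafras_hap1_cp_removed.fasta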
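To make the name-based filtering step concrete, here is a hypothetical awk equivalent of what remove_contigs_by_name.py is used for. It is not that script, and it assumes the list holds one contig name per line in its first column. Unlike the workflow above, it writes the filtered assembly to standard output, which leaves the input file (here a symbolic link) untouched.

```shell
# Hypothetical sketch: copy a FASTA file to standard output, skipping every
# record whose name appears in the first column of a list file.
# A record name is the header text after ">" up to the first whitespace.
# Usage: remove_contigs <names.txt> <assembly.fasta> > <filtered.fasta>
remove_contigs() {
    awk 'FILENAME == ARGV[1] { drop[$1] = 1; next }   # read names to drop
         /^>/ { name = substr($1, 2)                   # new record header
                keep = !(name in drop) }
         keep                                          # print kept lines
        ' "$1" "$2"
}
```

For example, remove_contigs Alignment_list.txt input.fasta > filtered.fasta, where input.fasta and filtered.fasta are placeholder names.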
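Repeat the process for the mitochondrial genome, starting from the chloroplast-filtered assembly. The reference here is the mitochondrial genome of C. camphora (camphor tree), a relative of sassafras in the Lauraceae, so the more permissive asm20 preset is used in place of asm5 to tolerate the greater sequence divergence between species.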
minimap2 -t 5 -x asm20 C_camphora_mito.fasta Sassafras_hap1_cp_removed.fasta > Alignment2.paf
python3 find_scaffolds_by_paf_coverage.py Alignment2.paf > Alignment2.txt
python3 remove_contigs_by_name.py Alignment2.txt Sassafras_hap1_cp_removed.fasta
mv Sassafras_hap1_cp_removed.fasta Sassafras_hap1_no_organelles.fasta
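To confirm what was removed, it helps to compare the number of sequences and the total length of the assembly before and after filtering. The hypothetical fasta_stats function below does this with awk, treating every line that does not start with ">" as sequence.

```shell
# Hypothetical helper: count the sequences and total bases in a FASTA file.
# Header lines start with ">"; every other line is counted as sequence.
fasta_stats() {
    awk '/^>/ { n++; next }
         { bases += length($0) }
         END { printf "%d sequences, %.0f bp\n", n, bases }' "$1"
}
```

Running fasta_stats on the unfiltered hap1 assembly and on Sassafras_hap1_no_organelles.fasta shows how many contigs and bases the two filtering rounds dropped.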
Rerun BUSCO on the filtered assembly and compare its summary with the one above to check how removing the organellar contigs affected completeness.
conda activate busco
busco -i Sassafras_hap1_no_organelles.fasta -m genome -l embryophyta -c 5 -o busco_results_no_organelles
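Comparing the two runs comes down to comparing their one-line C/S/D/F/M summaries. The hypothetical busco_diff function below (not part of BUSCO) parses two such lines and prints each category in both runs plus the change; it assumes the summaries are passed exactly as BUSCO prints them, with or without the surrounding "|" box characters.

```shell
# Hypothetical helper: compare two BUSCO one-line summaries (before and after
# filtering) and print each category percentage in both runs and the change.
# Usage: busco_diff "<summary before>" "<summary after>"
busco_diff() {
    printf '%s\n%s\n' "$1" "$2" | awk '
    {
        gsub(/[%,|]/, " "); gsub(/\[/, " "); gsub(/]/, " ")  # strip punctuation
        for (i = 1; i <= NF; i++) {          # fields now look like "C:99.0"
            split($i, kv, ":")
            val[NR, kv[1]] = kv[2]
        }
    }
    END {
        split("C S D F M", keys, " ")
        for (k = 1; k <= 5; k++)
            printf "%s: %.1f -> %.1f (%+.1f)\n", keys[k], val[1, keys[k]],
                val[2, keys[k]], val[2, keys[k]] - val[1, keys[k]]
    }'
}
```

For example, busco_diff "C:99.0%[S:5.1%,D:93.9%],F:0.4%,M:0.6%,n:1614" followed by the quoted summary line from the new run prints one line per category.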