Module 2 Lab 3: BUSCO and Removing Organellar DNA - jacksonhturner/epp_531 GitHub Wiki

BUSCO (Benchmarking Universal Single-Copy Orthologs) assesses assembly completeness by checking whether genes expected to occur as single copies across a lineage are present in the assembly. We will run it on one of the previously assembled sassafras haplotypes.

ln -s /pickett_sphinx/projects/EPP531_AGA/turner/M2Lab2/analysis/Sassafras_V1.0_with_Hi-C_0.5.hic.hap1.p_ctg.fasta /pickett_sphinx/projects/EPP531_AGA/turner/M3Lab1/BUSCO

conda activate busco

cd /pickett_sphinx/projects/EPP531_AGA/turner/M3Lab1/BUSCO

busco -i Sassafras_V1.0_with_Hi-C_0.5.hic.hap1.p_ctg.fasta -m genome -l embryophyta -c 5 -o busco_results

Below is the output from BUSCO:

        --------------------------------------------------
        |Results from dataset embryophyta_odb10           |
        --------------------------------------------------
        |C:99.0%[S:5.1%,D:93.9%],F:0.4%,M:0.6%,n:1614     |
        |1597   Complete BUSCOs (C)                       |
        |82     Complete and single-copy BUSCOs (S)       |
        |1515   Complete and duplicated BUSCOs (D)        |
        |7      Fragmented BUSCOs (F)                     |
        |10     Missing BUSCOs (M)                        |
        |1614   Total BUSCO groups searched               |
        --------------------------------------------------
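The summary line can be cross-checked against the counts listed beneath it. Below is a minimal sketch in Python that uses only the numbers reported above; the helper name `pct` is illustrative. Directly rounding 1597/1614 gives 98.9%, while the reported 99.0% matches the sum of the rounded S and D values. This is an observation from the arithmetic, not a statement about how BUSCO computes the figure internally.

```python
# Counts copied from the BUSCO table above.
counts = {"S": 82, "D": 1515, "F": 7, "M": 10}
total = 1614  # total BUSCO groups searched (n)

# Complete BUSCOs are the single-copy plus the duplicated ones.
complete = counts["S"] + counts["D"]
assert complete == 1597
# Complete, fragmented, and missing account for every group searched.
assert complete + counts["F"] + counts["M"] == total


def pct(count, n=total):
    """Percentage of n, rounded to one decimal place."""
    return round(100 * count / n, 1)


percentages = {key: pct(value) for key, value in counts.items()}
direct_complete = pct(complete)  # rounding 1597/1614 directly
summed_complete = round(percentages["S"] + percentages["D"], 1)  # rounded S + rounded D

print(percentages)      # {'S': 5.1, 'D': 93.9, 'F': 0.4, 'M': 0.6}
print(direct_complete)  # 98.9
print(summed_complete)  # 99.0
```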

Remove organellar DNA from the assembly by aligning it against organelle genomes with minimap2. Start with the chloroplast:

spack load minimap2

minimap2 -t 5 -x asm5 S_albidum_plastome.fasta ../BUSCO/Sassafras_V1.0_with_Hi-C_0.5.hic.hap1.p_ctg.fasta > Sassafras_cp_alignment.paf

Use custom Python scripts to identify the contigs that match the plastome and remove them from the assembly.

cp /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/Busco/find_scaffolds_by_paf_coverage.py .
cp /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/Busco/remove_contigs_by_name.py .

python3 find_scaffolds_by_paf_coverage.py Sassafras_cp_alignment.paf > Alignment_list.txt
python3 remove_contigs_by_name.py Alignment_list.txt ../BUSCO/Sassafras_V1.0_with_Hi-C_0.5.hic.hap1.p_ctg.fasta
mv ../BUSCO/Sassafras_V1.0_with_Hi-C_0.5.hic.hap1.p_ctg.fasta Sassafras_hap1_cp_removed.fasta
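The contents of find_scaffolds_by_paf_coverage.py are not shown here, so the following is only a sketch of how a coverage-based selection from a PAF file could work, not the course script itself. It assumes the assembly contigs are the PAF query sequences (columns 1 to 4: name, length, start, end), as in the minimap2 command above. The 0.5 coverage threshold and both function names are hypothetical.

```python
from collections import defaultdict


def covered_fraction(intervals, length):
    """Fraction of a sequence covered by the union of (start, end) intervals."""
    covered, current_end = 0, 0
    for start, end in sorted(intervals):
        start = max(start, current_end)  # skip bases already counted
        if end > start:
            covered += end - start
            current_end = end
    return covered / length if length else 0.0


def contigs_by_paf_coverage(paf_lines, min_fraction=0.5):
    """Return sorted query names whose aligned fraction is >= min_fraction.

    min_fraction is an assumed threshold, not a value taken from the lab.
    """
    intervals = defaultdict(list)
    lengths = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:  # PAF records have at least 12 columns
            continue
        name, length = fields[0], int(fields[1])
        start, end = int(fields[2]), int(fields[3])
        lengths[name] = length
        intervals[name].append((start, end))
    return sorted(
        name
        for name, length in lengths.items()
        if covered_fraction(intervals[name], length) >= min_fraction
    )
```

Passing an open file handle for Sassafras_cp_alignment.paf as `paf_lines` would return the contig names to write to a list file, one per line. Merging overlapping alignments before summing keeps a contig with many redundant hits from being counted more than once.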

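Repeat the process to remove mitochondrial contigs, this time aligning against the C. camphora mitochondrial genome. The asm20 preset tolerates more sequence divergence than asm5, which suits a reference from a different species.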

minimap2 -t 5 -x asm20 C_camphora_mito.fasta Sassafras_hap1_cp_removed.fasta > Alignment2.paf
python3 find_scaffolds_by_paf_coverage.py Alignment2.paf > Alignment2.txt
python remove_contigs_by_name.py Alignment2.txt Sassafras_hap1_cp_removed.fasta
mv Sassafras_hap1_cp_removed.fasta Sassafras_hap1_no_organelles.fasta
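The removal step used in both rounds amounts to filtering FASTA records by name. The sketch below shows one way to do that; it is not the course's remove_contigs_by_name.py, whose behavior (for example, whether it edits the file in place) is not documented here. The function name `remove_contigs` is hypothetical.

```python
def remove_contigs(fasta_lines, names_to_remove):
    """Yield FASTA lines, skipping every record whose ID is in names_to_remove."""
    skipping = False
    for line in fasta_lines:
        if line.startswith(">"):
            # The record ID is the header text up to the first whitespace.
            header = line[1:].split()
            skipping = bool(header) and header[0] in names_to_remove
        if not skipping:
            yield line
```

In use, the names would be read from the list file into a set and the kept lines written to a new FASTA file, which leaves the input assembly untouched for comparison.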

Rerun BUSCO on the filtered assembly to assess the impact of removing organellar DNA.

conda activate busco
busco -i Sassafras_hap1_no_organelles.fasta -m genome -l embryophyta -c 5 -o busco_results
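To compare the two runs, the one-line summaries can be parsed and subtracted. The sketch below assumes the summary line has the same layout as the output shown earlier (C, S, D, F, M percentages followed by n); the function names are illustrative.

```python
import re

# Matches a summary line such as:
# C:99.0%[S:5.1%,D:93.9%],F:0.4%,M:0.6%,n:1614
SUMMARY_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_summary(text):
    """Extract the percentages and n from text containing a BUSCO summary line."""
    match = SUMMARY_PATTERN.search(text)
    if match is None:
        raise ValueError("no BUSCO summary line found")
    values = {key: float(match.group(key)) for key in "CSDFM"}
    values["n"] = int(match.group("n"))
    return values


def summary_change(before_text, after_text):
    """Percentage-point change in each category from the first run to the second."""
    before = parse_summary(before_text)
    after = parse_summary(after_text)
    return {key: round(after[key] - before[key], 1) for key in "CSDFM"}
```

BUSCO writes a short summary text file into each output directory. Reading the two files and passing their contents to `summary_change` gives the shift in each category; the exact summary file name depends on the BUSCO version, so check the output directory for it.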