6 Polishing - coopermkr/sdepressaAssembly GitHub Wiki

Now that we have our scaffolded, phased genome, we next want to polish out any residual errors left by the assembly process. The industry standard here is to run the assembly through racon with our least error-prone reads. Avoid using Hi-C reads for this step. Hi-C is a reduced-representation library, so those reads only correct errors near their restriction enzyme cut sites and leave the rest of the assembly uncorrected. We did not have whole-genome shotgun (WGS) reads from the individual used for the genome assembly, so we polished with our PacBio CCS reads instead.

First, we need to align our polishing reads (the PacBio CCS reads) against the scaffolded assembly:

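# Directory containing the PacBio CCS reads
path=~/data

# Align PacBio CCS reads against the n18 tetra scaffolds (-a: SAM output; -t 20: 20 threads)
# map-pb is minimap2's preset for PacBio CLR reads; recent minimap2 releases also offer map-hifi for HiFi/CCS reads
minimap2 -ax map-pb -t 20 4.scaffolding/04.build/tetra.scaff.18.fasta "$path"/pacm56.ccs.fastq > ts18.pac.sam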
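Because racon can only correct regions that the CCS reads actually cover, it can be worth checking how many reads aligned before moving on. Below is a minimal bash/awk sketch, assuming a plain-text SAM file such as ts18.pac.sam. The function sam_mapping_summary is a hypothetical helper (not part of minimap2 or racon) that tallies records by their SAM FLAG bits. If samtools is installed, samtools flagstat reports similar counts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of minimap2 or racon): summarize a SAM file
# before handing it to racon. Records are classified by FLAG bits from the
# SAM spec: 0x4 unmapped, 0x100 secondary, 0x800 supplementary.
# Each read has exactly one primary record (mapped or unmapped), so
# primary_mapped + unmapped equals the number of reads.
# Usage: sam_mapping_summary ts18.pac.sam
sam_mapping_summary() {
  awk -F'\t' '
    /^@/ { next }                          # skip header lines
    {
      flag = $2 + 0
      # POSIX awk has no bitwise operators, so test bits with integer division
      if (int(flag / 4) % 2 == 1)         { unmapped++ }
      else if (int(flag / 256) % 2 == 1)  { secondary++ }
      else if (int(flag / 2048) % 2 == 1) { supplementary++ }
      else                                { primary++ }
    }
    END {
      total = primary + unmapped
      printf "primary_mapped\t%d\n", primary
      printf "unmapped\t%d\n", unmapped
      printf "secondary\t%d\n", secondary
      printf "supplementary\t%d\n", supplementary
      if (total > 0) printf "pct_reads_mapped\t%.2f\n", 100 * primary / total
    }' "$1"
}
```

For example, running sam_mapping_summary ts18.pac.sam prints one tab-separated count per line. A low pct_reads_mapped value suggests that much of the assembly will go unpolished.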

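Then we can provide that .sam file to Racon, which can be installed from https://github.com/lbcb-sci/racon. Racon takes three positional inputs, in this order: the reads, their alignments to the assembly (SAM, PAF, or MHAP), and the target sequences to polish: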

# Polish the scaffolds: racon <reads> <alignments> <target assembly> (-t 20: 20 threads)
racon -t 20 "$path"/pacm56.ccs.fastq ts18.pac.sam 4.scaffolding/04.build/tetra.scaff.18.fasta > tetra.polished.fasta

Now tetra.polished.fasta contains our error-corrected scaffolds.
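As a quick sanity check on tetra.polished.fasta, you can compare it with the draft scaffolds. Racon's -u (--include-unpolished) option controls whether target sequences that could not be polished are written out. Without it they are omitted, so a drop in sequence count usually means some scaffolds had no read coverage. Below is a minimal bash/awk sketch. The functions fasta_stats and missing_ids are hypothetical helpers, and sequence IDs are compared on the first word of each header, on the assumption that racon may append extra fields (such as LN/RC/XC tags) to polished headers.

```shell
#!/usr/bash/env bash
# Hypothetical helpers for sanity-checking a polished assembly.

# Print "<number of sequences><TAB><total bases>" for a FASTA file.
fasta_stats() {
  awk '
    /^>/ { n++; next }
    { gsub(/[ \t\r]/, ""); bases += length($0) }
    END { printf "%d\t%d\n", n, bases }' "$1"
}

# Print IDs present in the draft FASTA ($1) but absent from the polished FASTA ($2).
# IDs are taken from the first whitespace-delimited word of each header line,
# so extra fields appended to polished headers are ignored.
missing_ids() {
  awk '
    /^>/ {
      id = substr($1, 2)
      if (FILENAME == ARGV[1]) draft[id] = 1
      else polished[id] = 1
    }
    END { for (id in draft) if (!(id in polished)) print id }' "$1" "$2" | sort
}
```

For example, run fasta_stats on both 4.scaffolding/04.build/tetra.scaff.18.fasta and tetra.polished.fasta, then run missing_ids with the draft file first and the polished file second. If scaffolds are missing, re-running racon with -u keeps them in the output in their unpolished form.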