AssemblyQualityAssessmentandProteinPrediction - BGIGPD/BestPractices4Pathogenomics GitHub Wiki
Assembly Quality Assessment and Protein Prediction
Why We Need Assembly Quality Assessment
- Assess the quality of the contigs produced by the MEGAHIT assembly.
- Do not overly trust your tools; fully understand your data.
How to Assess Our Contigs
BUSCO
- Install BUSCO using conda:
conda activate envname
conda install -c conda-forge -c bioconda busco=5.3.2
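- Run BUSCO, choosing one mode for -m (geno, prot, or tran) to match the input. With --offline, BUSCO does not download anything, so -l must point to a lineage dataset already on disk:
busco -i genome.fa -c 10 -o outputdir -m geno -l refdatabase_path --offline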
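Once BUSCO finishes, the completeness score can be pulled out of its short summary for a quick comparison between assemblies. The following is a minimal sketch and not part of BUSCO itself: `busco_complete_pct` is a hypothetical helper name, and it assumes the summary file contains a one-line score of the form `C:..%[S:..%,D:..%],F:..%,M:..%,n:..`, as BUSCO 5 writes to the `short_summary*.txt` file in the output directory.

```shell
# Hypothetical helper (not a BUSCO command): print the percentage of
# complete BUSCOs found in a BUSCO short summary file.
# Assumption: the file contains a score line such as
#   C:<pct>%[S:<pct>%,D:<pct>%],F:<pct>%,M:<pct>%,n:<count>
busco_complete_pct() {
  # Capture the number between "C:" and the following "%",
  # and report only the first matching line.
  sed -n 's/.*C:\([0-9.]*\)%.*/\1/p' "$1" | head -n 1
}
```

It could be called as `busco_complete_pct outputdir/short_summary*.txt`. The exact file name depends on the lineage and output name, so check what BUSCO actually wrote before relying on the pattern.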
QUAST
- Install QUAST using conda:
conda activate envname
conda install -c conda-forge -c bioconda quast
- Basic usage:
quast.py contigs.fas
- More advanced usage, comparing two assemblies against a reference genome with gene annotations and paired-end reads:
quast.py contigs_1.fa contigs_2.fa -r reference.fa -g genome.gff -1 reads1.fastq.gz -2 reads2.fastq.gz -o quast_out -t 12
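- QUAST can run without a reference genome, but it then reports only reference-free statistics (for example contig counts, total length, and N50). Alignment-based metrics such as genome fraction, misassemblies, and genes matched to the reference are unavailable.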
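To make the N50 value in a QUAST report less of a black box, it can be recomputed by hand. The following is a minimal sketch using awk and sort. `fasta_n50` is a hypothetical helper name, not a QUAST command. It counts every contig in the file, whereas QUAST by default ignores contigs shorter than 500 bp (`--min-contig`), so the two numbers can differ on fragmented assemblies.

```shell
# Hypothetical helper (not part of QUAST): print the N50 of a FASTA file,
# i.e. the length L such that contigs of length >= L together cover
# at least half of the total assembly length.
fasta_n50() {
  # Step 1: print one length per contig (sequences may span several lines).
  awk '/^>/ { if (seen) print len; len = 0; seen = 1; next }
       { gsub(/[ \t\r]/, ""); len += length($0) }
       END { if (seen) print len }' "$1" |
    # Step 2: sort lengths from longest to shortest.
    sort -rn |
    # Step 3: walk down the sorted list until half of the total is covered.
    awk '{ lens[NR] = $1; total += $1 }
         END {
           run = 0
           for (i = 1; i <= NR; i++) {
             run += lens[i]
             if (run * 2 >= total) { print lens[i]; exit }
           }
         }'
}
```

Running `fasta_n50 contigs.fas` prints a single number that can be compared with the N50 column of the QUAST report.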
getorf
- Install EMBOSS to use getorf:
conda install -c bioconda emboss
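- Run getorf, keeping only ORFs of at least 600 nucleotides (-minsize is given in nucleotides, not amino acids):
getorf -minsize 600 -sequence input.fna -outseq output.faa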
- getorf extracts open reading frames (ORFs) from a nucleotide sequence and translates them into protein sequences.
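In the spirit of not trusting tools blindly, it is worth checking the protein FASTA that getorf produced. The following is a minimal sketch: `faa_stats` is a hypothetical helper name, not part of EMBOSS. It prints the number of sequences and the shortest and longest sequence lengths, separated by tabs.

```shell
# Hypothetical helper (not part of EMBOSS): summarise a protein FASTA file.
# Output: <number of sequences> TAB <shortest length> TAB <longest length>
faa_stats() {
  # Step 1: print one length per sequence (sequences may span several lines).
  awk '/^>/ { if (seen) print len; len = 0; seen = 1; next }
       { gsub(/[ \t\r]/, ""); len += length($0) }
       END { if (seen) print len }' "$1" |
    # Step 2: track the minimum and maximum length and count the records.
    awk 'NR == 1 { min = $1; max = $1 }
         { if ($1 < min) min = $1; if ($1 > max) max = $1 }
         END { if (NR) printf "%d\t%d\t%d\n", NR, min, max }'
}
```

For example, `faa_stats output.faa` gives a quick view of how many ORFs passed the `-minsize` filter and how long the translated sequences are. An empty file produces no output.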