1_Assembly - bennestor/hakea_genome GitHub Wiki

Testing assembly programs

1. Tried several assembly programs

  • SMARTdenovo with Canu-corrected reads (k-mer sizes 16, 17, 20, 21)
  • Canu with Canu-corrected reads (took too long to finish)
  • Flye with raw Nanopore reads
  • miniasm with raw Nanopore reads
  • NECAT with raw Nanopore reads: the best assembly based on N50, BUSCO score, and assembly size relative to the expected genome size.
2. NECAT assembly with uncorrected reads (NECAT does its own read correction)

1. Install and test NECAT v0.0.1 update 20200803 https://github.com/xiaochuanle/NECAT

2. Assemble

   # Run NECAT assembly: 3_necat.sh on zeus longq (28,120G,96h), took ~1 day
   0_programs/NECAT/Linux-amd64/bin/necat.pl bridge hakea_config.txt

   # Print assembly statistics with seqkit
   seqkit stats -a polished_contigs.fasta

   # Run BUSCO: busco_necat_polished.conf on magnus (24,50G,12h), took 8.5h
   busco -l embryophyta_odb10 -c 24 -i polished_contigs.fasta -o busco_polished_contigs -m genome
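
The contents of hakea_config.txt are not shown on this page. As a rough sketch of the preparation that usually precedes `necat.pl bridge`, the helpers below check that `necat.pl` is executable and build the read list that NECAT's `ONT_READ_LIST` setting points to. The function names, the reads directory, and the file extensions are assumptions for illustration, not the settings used for this assembly.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: names and paths are placeholders, not taken from the project.

# Succeeds only if necat.pl is present and executable in the given bin directory.
have_necat() {
  [ -x "$1/necat.pl" ]
}

# Write one absolute path per read file into a list file.
# NECAT's config refers to such a file through ONT_READ_LIST.
make_read_list() {
  abs_dir=$(cd "$1" && pwd) || return 1
  find "$abs_dir" -type f \
    \( -name '*.fastq' -o -name '*.fastq.gz' -o -name '*.fq.gz' \) \
    | sort > "$2"
}

# Typical sequence afterwards (not run here):
#   necat.pl config hakea_config.txt   # writes a template config
#   # edit PROJECT, ONT_READ_LIST, GENOME_SIZE and THREADS in the template
#   necat.pl bridge hakea_config.txt   # runs correction, assembly and bridging
```

Possible usage, assuming the raw reads sit in a directory called `reads/`: `have_necat 0_programs/NECAT/Linux-amd64/bin && make_read_list reads read_list.txt`.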

polished_contigs.fasta:

num_seqs  sum_len      min_len  avg_len    max_len     N50
1,925     768,992,702  530      399,476.7  11,707,852  1,049,695
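
For readers unfamiliar with N50, the sketch below shows one way to recompute it from a FASTA file as a cross-check on the seqkit value. It uses the usual definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half the assembly size). The function names are hypothetical and this is not the code seqkit itself runs.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for cross-checking N50; not part of seqkit.

# Print the length of every sequence in a FASTA file, one per line.
fasta_lengths() {
  awk '/^>/ { if (n) print n; n = 0; next }
       { n += length($0) }
       END { if (n) print n }' "$1"
}

# Read lengths on stdin and print N50: sort longest first, then report the
# length at which the cumulative sum first reaches half of the total.
n50_from_lengths() {
  sort -rn | awk '{ len[NR] = $1; total += $1 }
    END {
      half = total / 2
      run = 0
      for (i = 1; i <= NR; i++) {
        run += len[i]
        if (run >= half) { print len[i]; exit }
      }
    }'
}
```

Running `fasta_lengths polished_contigs.fasta | n50_from_lengths` would be expected to agree with the N50 column above.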

BUSCO Embryophyta score: C:96.5%[S:89.6%,D:6.9%],F:2.2%,M:1.3%,n:1614
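
A BUSCO summary string can be sanity-checked arithmetically: single-copy plus duplicated should equal complete, and complete plus fragmented plus missing should come to about 100%. The sketch below does that check in shell. The function names and the 0.15 percentage-point rounding tolerance are assumptions for illustration, not part of BUSCO.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking a BUSCO one-line summary; not part of BUSCO.

# Extract the percentage for one field letter (C, S, D, F or M).
busco_field() {
  printf '%s\n' "$1" | sed -n "s/.*$2:\([0-9.]*\)%.*/\1/p"
}

# Exit 0 if S + D matches C and C + F + M is about 100, within rounding error.
busco_consistent() (
  c=$(busco_field "$1" C); s=$(busco_field "$1" S); d=$(busco_field "$1" D)
  f=$(busco_field "$1" F); m=$(busco_field "$1" M)
  awk -v c="$c" -v s="$s" -v d="$d" -v f="$f" -v m="$m" '
    function abs(x) { return x < 0 ? -x : x }
    BEGIN {
      tol = 0.15   # assumed tolerance for one-decimal rounding
      ok = (abs(s + d - c) <= tol) && (abs(c + f + m - 100) <= tol)
      exit (ok ? 0 : 1)
    }'
)
```

For example, `busco_consistent 'C:96.5%[S:89.6%,D:6.9%],F:2.2%,M:1.3%,n:1614' && echo consistent` passes both checks, since 89.6 + 6.9 = 96.5 and 96.5 + 2.2 + 1.3 = 100.0.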
