1_Assembly - bennestor/hakea_genome GitHub Wiki
1. Trying several assembly programs
- SMARTdenovo with Canu-corrected reads (k-mer sizes 16, 17, 20, 21)
- Canu with Canu-corrected reads (took too long to finish)
- Flye with raw nanopore reads
- miniasm with raw nanopore reads
- NECAT with raw nanopore reads: the best one based on N50, BUSCO score, and relative genome size.
1. Install and test NECAT v0.0.1 update 20200803 https://github.com/xiaochuanle/NECAT
2. Assemble
    # Run NECAT assembly: 3_necat.sh on zeus longq (28,120G,96h), took ~1 day
    0_programs/NECAT/Linux-amd64/bin/necat.pl bridge hakea_config.txt
    # Print out statistics using seqkit
    seqkit stats -a polished_contigs.fasta
    # Run BUSCO: busco_necat_polished.conf on magnus (24,50G,12h), took 8.5h
    busco -l embryophyta_odb10 -c 24 -i polished_contigs.fasta -o busco_polished_contigs -m genome
polished_contigs.fasta:
num_seqs | sum_len | min_len | avg_len | max_len | N50 |
---|---|---|---|---|---|
1,925 | 768,992,702 | 530 | 399,476.7 | 11,707,852 | 1,049,695 |
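
For reference, the N50 column is the contig length at which the contigs of that length or longer add up to at least half of the total assembly length. A minimal sketch of that calculation follows; the `n50` function name and the toy lengths are made up for illustration and are not part of the scripts used here.

```shell
# n50 (hypothetical helper): read one contig length per line on stdin
# and print the N50 of those lengths.
n50() {
  sort -nr | awk '
    { len[NR] = $1; total += $1 }          # store lengths (longest first) and sum them
    END {
      running = 0
      for (i = 1; i <= NR; i++) {
        running += len[i]
        # the first contig that takes the running sum to half the total is the N50
        if (running * 2 >= total) { print len[i]; exit }
      }
    }'
}

# Toy example with five made-up contig lengths (total 300, half is 150).
printf '%s\n' 100 80 60 40 20 | n50
```

On a real assembly the input would be the per-contig lengths, one per line, extracted with a tool such as seqkit.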
BUSCO Embryophyta score: C:96.5%[S:89.6%,D:6.9%],F:2.2%,M:1.3%,n:1614
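
A quick sanity check on a BUSCO summary string: complete (C) should equal single-copy (S) plus duplicated (D), and C + F + M should come to roughly 100%. The sketch below parses the one-line summary with awk. The `busco_check` name and the 0.15 tolerance are assumptions for illustration (BUSCO reports percentages rounded to one decimal place, so sums can be off by 0.1).

```shell
# busco_check (hypothetical helper): parse a BUSCO one-line summary and
# report S+D, C, and C+F+M. Exits non-zero if S+D and C disagree.
busco_check() {
  printf '%s\n' "$1" | awk '
    {
      # Pull each labelled percentage (e.g. "C:96.5%") out of the summary.
      n = split("C S D F M", keys, " ")
      for (i = 1; i <= n; i++) {
        if (match($0, keys[i] ":[0-9.]+%")) {
          # skip the "X:" prefix and drop the trailing "%"
          v[keys[i]] = substr($0, RSTART + 2, RLENGTH - 3)
        }
      }
      sd = v["S"] + v["D"]
      total = v["C"] + v["F"] + v["M"]
      printf "S+D=%.1f C=%.1f C+F+M=%.1f\n", sd, v["C"], total
      d = sd - v["C"]; if (d < 0) d = -d
      if (d > 0.15) exit 1                 # assumed tolerance for rounding
    }'
}

busco_check 'C:96.5%[S:89.6%,D:6.9%],F:2.2%,M:1.3%,n:1614'
```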