Genome Profiling - coopermkr/sdepressaAssembly GitHub Wiki

Genome profiling uses k-mer counting to estimate ploidy, genome size, heterozygosity, and the proportion of unique (non-repetitive) sequence, all of which help guide genome assembly.
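In idealized form, the relationships behind these estimates are short. This is a simplified sketch; GenomeScope2.0 fits a mixture model to the whole histogram rather than reading peaks off directly. Here the symbols are chosen for illustration: read length ℓ, k-mer length k, per-haplotype read depth c, ploidy p, n_i distinct k-mers observed i times, and an error cutoff i_min. Given these, the per-haplotype k-mer coverage λ, the expected peak for k-mers shared by j of the p haplotypes, and the haploid genome size G are approximately:

$$
\lambda = c\,\frac{\ell - k + 1}{\ell}, \qquad
\text{peak}_j \approx j\,\lambda \;\; (j = 1, \dots, p), \qquad
G \approx \frac{\sum_{i \ge i_{\min}} i\, n_i}{p\,\lambda}
$$

For a diploid (p = 2), heterozygous k-mers pile up near λ and homozygous k-mers near 2λ, so the denominator of G is the homozygous peak.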

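First, count the k-mers in the whole-genome shotgun (WGS) reads with your preferred k-mer counter. Here I'm using KMC3. The run counts 21-mers (-k21) from FASTQ input (-fq) with 10 threads (-t10) and up to 64 GB of RAM (-m64). It keeps every k-mer (-ci1) and caps counts at 10000 (-cs10000). The last argument, outdir/, is KMC's temporary working directory and must exist before KMC runs: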

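trim=FILENAME.fastq

mkdir -p outdir

kmc -k21 -t10 -m64 -ci1 -cs10000 -fq "data/$trim" trimmed.kmers outdir/

Next, export the k-mers seen between 10 and 2300 times as plain text. The lower cutoff drops most sequencing-error k-mers and the upper cutoff drops high-copy repeats. The kmc_tools command writes the same k-mers sorted (-s), and that file is the Smudgeplot input:

kmc_dump -ci10 -cx2300 trimmed.kmers trimmed.kmers.dump

kmc_tools transform trimmed.kmers -ci10 -cx2300 dump -s kmcdb_L10_U2300.dump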
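Count cutoffs like -ci10 and -cx2300 are usually read off a k-mer histogram. KMC can produce one with kmc_tools transform trimmed.kmers histogram trimmed.histo -cx10000, and Smudgeplot (v0.2.x) ships a smudgeplot.py cutoff subcommand that suggests cutoffs from such a histogram. The sketch below roughly illustrates where a lower cutoff comes from. It reports the first local minimum of the histogram, the "error trough" between the error k-mers and the first real peak. It assumes a two-column histogram sorted by multiplicity. The name error_trough is hypothetical, and this is not Smudgeplot's own method:

```shell
# Hypothetical helper: print the first local minimum ("error trough") of a
# k-mer histogram, a rough candidate for the lower count cutoff (-ci).
# Assumes a two-column histogram sorted by multiplicity:
#   <multiplicity> <number of distinct k-mers with that multiplicity>
# Returns non-zero if the counts never rise again (no trough found).
error_trough() {
  awk '
    # The first time the count goes up, the previous row was the trough.
    NR > 1 && $2 + 0 > prev { print prev_mult; found = 1; exit }
    { prev = $2 + 0; prev_mult = $1 }
    END { if (!found) exit 1 }
  ' "$1"
}
```

For example, error_trough trimmed.histo prints the multiplicity just before the counts start climbing toward the main peak. Noisy, low-coverage histograms can have spurious dips, so check the value against a plot of the histogram.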

Then feed the sorted dump into Smudgeplot, which finds heterozygous k-mer pairs (k-mers differing at a single base) and uses their coverages to estimate ploidy:

smudgeplot.py hetkmers -o kmer_pairs < kmcdb_L10_U2300.dump
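In the Smudgeplot 0.2.x workflow, hetkmers writes the coverages of each pair to kmer_pairs_coverages.tsv and the pair sequences to kmer_pairs_sequences.tsv. Running smudgeplot.py plot kmer_pairs_coverages.tsv then draws the smudgeplot. The plot places each pair at its minor-coverage fraction, min(covA, covB)/(covA + covB), against its total coverage, covA + covB. Smudges near 1/2, 1/3, and 1/4 therefore point to AB/AABB, AAB, and AAAB genotypes. The sketch below shows that transformation. It assumes the first two whitespace-separated columns of each line hold the pair's two coverages, so check this against your Smudgeplot version. The names smudge_coords and nearest_ratio_tally are hypothetical:

```shell
# Hypothetical helpers illustrating Smudgeplot's axes.
# Assumption: each input line starts with the two k-mer coverages of one
# heterozygous pair, in whitespace-separated columns 1 and 2.

# Print "<minor fraction>\t<total coverage>" for each pair, where
#   minor fraction = min(covA, covB) / (covA + covB)   (at most 0.5)
#   total coverage = covA + covB
smudge_coords() {
  awk '{
    a = $1 + 0; b = $2 + 0
    minor = (a < b) ? a : b
    total = a + b
    if (total > 0) printf "%.3f\t%d\n", minor / total, total
  }' "$1"
}

# Count pairs by the nearest simple ratio: 1/2 (AB or AABB; total coverage
# tells these apart), 1/3 (AAB) or 1/4 (AAAB).
nearest_ratio_tally() {
  smudge_coords "$1" | awk '
    BEGIN { n = split("2 3 4", dens, " ") }
    {
      best = ""; bestd = 1
      for (i = 1; i <= n; i++) {
        d = $1 - 1 / dens[i]
        if (d < 0) d = -d
        if (d < bestd) { bestd = d; best = "1/" dens[i] }
      }
      count[best]++
    }
    END { for (r in count) print r "\t" count[r] }
  ' | LC_ALL=C sort
}
```

For example, nearest_ratio_tally kmer_pairs_coverages.tsv gives a crude count per ratio. The real smudgeplot also uses total coverage to separate AB from AABB and to filter noise, so treat this only as a way to see what the axes mean.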

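Finally, estimate genome size, heterozygosity, and repeat content with GenomeScope2.0. GenomeScope2.0 takes a k-mer count histogram rather than the dump. One way to make the histogram from the KMC database is:

kmc_tools transform trimmed.kmers histogram trimmed.histo -cx10000

The GenomeScope2.0 analysis for this dataset is here: http://genomescope.org/genomescope2.0/analysis.php?code=dMvo6e2u2PqXQmo4me2B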
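GenomeScope2.0's haploid genome size can be roughly cross-checked against the classic peak-based estimate. This estimate divides all k-mer occurrences above the error cutoff by the multiplicity of the homozygous peak. For a diploid, that peak is about twice the per-haplotype kcov GenomeScope2.0 reports. The sketch below assumes a two-column k-mer histogram (<multiplicity> <distinct k-mers>), the format GenomeScope2.0 accepts, and a peak value you supply by hand. The name genome_size_estimate is hypothetical. It skips the model fitting GenomeScope2.0 does, so expect only ballpark agreement:

```shell
# Hypothetical helper: peak-based haploid genome size estimate,
#   G ~ sum_{i >= lower} (i * n_i) / peak
# from a two-column k-mer histogram (<multiplicity> <distinct k-mers>).
# lower: error cutoff; peak: multiplicity of the homozygous peak
# (k-mers shared by all haplotypes).
# Usage: genome_size_estimate HISTOGRAM LOWER PEAK
genome_size_estimate() {
  awk -v lower="$2" -v peak="$3" '
    # Sum k-mer occurrences (multiplicity x distinct k-mers) above the cutoff.
    $1 + 0 >= lower + 0 { total += $1 * $2 }
    END {
      if (peak + 0 <= 0) exit 1
      printf "%.0f\n", total / peak
    }
  ' "$1"
}
```

For example, genome_size_estimate trimmed.histo 10 40 would divide all k-mer occurrences at multiplicity 10 or above by a homozygous peak at 40. Both numbers are illustrative, not values from this dataset.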