Matching k‐mers to sequencing reads - KamilSJaron/k-mer-approaches-for-biodiversity-genomics GitHub Wiki

The other option for using chromosome-specific k-mers is to extract the reads that contain them. By now you probably know enough about k-mers to devise your own way of matching them to reads, but fortunately there is an existing tool for this: Cookiecutter (thanks to Rob Baird for the pointer).

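Cookiecutter is a command-line tool that subsets reads to those that do, or do not, contain specified k-mers. The k-mers are first converted into a library file with make_library (-l sets the k-mer length k), and the library is then used to extract matching read pairs: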

python2.7 cookiecutter make_library -i <Y-kmers.fasta> -o <Y-kmers.lib.txt> -l <k>
python2.7 cookiecutter extract -1 <reads_R1.fq> -2 <reads_R2.fq> -f <Y-kmers.lib.txt> -o <Y_reads>
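To make the matching logic concrete, here is a minimal Python 3 sketch of what an extraction step like this does. It is an illustration under stated assumptions, not Cookiecutter's implementation: the function names (load_kmers, parse_fastq, contains_kmer, extract_pairs) are hypothetical, the k-mer FASTA is assumed to hold one unwrapped k-mer per record, FASTQ records are assumed to be four lines each, a pair is kept if either mate matches, and k-mers are matched on both strands. Cookiecutter's own handling of pairs and strands may differ.

```python
"""Minimal sketch of k-mer-based read-pair extraction (illustrative, not Cookiecutter's code)."""
from typing import Iterable, Iterator, List, Set, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)

# Complement table; N maps to N so ambiguous bases survive reverse-complementing.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def load_kmers(fasta_lines: Iterable[str]) -> Set[str]:
    """Collect one k-mer per FASTA sequence line, storing both orientations (assumption)."""
    kmers: Set[str] = set()
    for line in fasta_lines:
        line = line.strip().upper()
        if line and not line.startswith(">"):
            kmers.add(line)
            kmers.add(revcomp(line))
    return kmers


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from unwrapped 4-line FASTQ records."""
    block: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line and not block:
            continue  # ignore blank lines between records
        block.append(line)
        if len(block) == 4:
            yield block[0], block[1], block[3]
            block = []


def contains_kmer(seq: str, kmers: Set[str], k: int) -> bool:
    """True if any length-k window of seq is present in the k-mer set."""
    seq = seq.upper()
    return any(seq[i:i + k] in kmers for i in range(len(seq) - k + 1))


def extract_pairs(pairs: Iterable[Tuple[Record, Record]],
                  kmers: Set[str], k: int) -> Iterator[Tuple[Record, Record]]:
    """Keep a read pair if either mate contains at least one library k-mer."""
    for r1, r2 in pairs:
        if contains_kmer(r1[1], kmers, k) or contains_kmer(r2[1], kmers, k):
            yield r1, r2


if __name__ == "__main__":
    k = 5
    y_kmers = load_kmers([">y1", "ACGTA"])  # toy library: ACGTA and TACGT
    mates1 = list(parse_fastq(["@p1/1", "TTACGTATT", "+", "IIIIIIIII",
                               "@p2/1", "GGGGGGGGG", "+", "IIIIIIIII"]))
    mates2 = list(parse_fastq(["@p1/2", "CCCCCCCCC", "+", "IIIIIIIII",
                               "@p2/2", "CCCCCCCCC", "+", "IIIIIIIII"]))
    kept = extract_pairs(zip(mates1, mates2), y_kmers, k)
    print([r1[0] for r1, _ in kept])  # ['@p1/1']
```

On real files the same functions can be fed open file handles, e.g. zip(parse_fastq(open(r1_path)), parse_fastq(open(r2_path))), where r1_path and r2_path are placeholders. A Python set gives fast average-case lookups per window, but storing millions of k-mers as Python strings is memory-hungry, so a dedicated tool remains the practical choice for full datasets.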

The extracted reads are those that contain at least one of the specified k-mers. The tool can also exclude reads that contain any of a specified set of k-mers, so the extracted reads can subsequently be filtered down to those that contain Y-linked k-mers only, i.e. none of the A- or X-specific k-mers:

python2.7 cookiecutter make_library -i <A-kmers.fasta> <X-kmers.fasta> -o <AX-kmers.lib.txt> -l <k>
python2.7 cookiecutter remove -1 <Y_reads_R1.fq> -2 <Y_reads_R2.fq> -f <AX-kmers.lib.txt> -o <Y_only_reads>
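Taken together, the extract and remove steps keep a read pair only if it contains at least one Y-specific k-mer and none of the A- or X-specific k-mers. The Python 3 sketch below expresses that combined condition in a single pass over in-memory read pairs. It is an illustration, not Cookiecutter's code: the function names (both_strands, pair_hits, y_only_pairs) are hypothetical, and matching k-mers on both strands and counting a pair as a hit when either mate matches are assumptions.

```python
"""Sketch of the extract-then-remove logic: keep read pairs that carry Y-linked k-mers only."""
from typing import Iterable, Iterator, Set, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)
Pair = Tuple[Record, Record]

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def both_strands(kmers: Iterable[str]) -> Set[str]:
    """Return every k-mer plus its reverse complement (strand handling is an assumption)."""
    out: Set[str] = set()
    for kmer in kmers:
        kmer = kmer.upper()
        out.add(kmer)
        out.add(kmer.translate(_COMPLEMENT)[::-1])
    return out


def pair_hits(pair: Pair, kmers: Set[str], k: int) -> bool:
    """True if either mate has a length-k window present in the k-mer set."""
    for _header, seq, _qual in pair:
        seq = seq.upper()
        if any(seq[i:i + k] in kmers for i in range(len(seq) - k + 1)):
            return True
    return False


def y_only_pairs(pairs: Iterable[Pair], y_kmers: Set[str],
                 ax_kmers: Set[str], k: int) -> Iterator[Pair]:
    """Keep pairs with a Y k-mer (extract), dropping any that also carry an A or X k-mer (remove)."""
    for pair in pairs:
        if pair_hits(pair, y_kmers, k) and not pair_hits(pair, ax_kmers, k):
            yield pair
```

Checking both conditions per pair avoids writing the intermediate <Y_reads> output, at the cost of holding both k-mer sets in memory at once; the two-command workflow above instead filters in two separate passes.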

The resulting reads can then be assembled or mapped back to an assembly.
