Separating chromosomes by comparison of sequencing libraries - KamilSJaron/k-mer-approaches-for-biodiversity-genomics GitHub Wiki
Many species show exciting karyotype variations - sex chromosomes differ between sexes, germ-line restricted chromosomes are present in germ-line cells but absent from somatic cells, and accessory (B) chromosomes differ between lineages or populations. We can exploit these differences to identify the chromosomes from sequencing reads.
- Lecture on YouTube: Separating chromosomes by comparison of sequencing libraries
In theory it is enough to look at the heterogametic sex to isolate the chromosomes - in species with XY sex determination, both X and Y in males have about 1/2 the coverage of autosomes. However, there are two downsides to using males only: first, we would not be able to reliably distinguish between X-linked and Y-linked sequences, and second, we might mistake sex chromosome duplications for signatures of autosomal linkage. These issues can be avoided by comparing libraries of both sexes instead, which is the most widely used approach in practice.
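The coverage expectations above can be sketched as a small classifier. This is a minimal illustration, not the pipeline used later in this tutorial: the function name `classify_kmer`, the tolerance `tol`, and the idea of passing the diploid (autosomal) coverage of each library as `male_2n` and `female_2n` are all assumptions made for the example. It assumes an XY system in which Y-linked k-mers are absent from females.

```python
def classify_kmer(male_count, female_count, male_2n, female_2n, tol=0.2):
    """Assign a k-mer to 'A', 'X', 'Y' or 'unassigned' (hypothetical helper).

    male_2n / female_2n are the diploid (autosomal) k-mer coverages of the
    two libraries, e.g. read off the main peak of each k-mer spectrum.
    """
    # Express both counts relative to the diploid coverage of their library.
    m = male_count / male_2n
    f = female_count / female_2n

    def near(value, target):
        return abs(value - target) <= tol

    if near(m, 1.0) and near(f, 1.0):
        return "A"  # two copies in both sexes
    if near(m, 0.5) and near(f, 1.0):
        return "X"  # one copy in males, two in females
    if near(m, 0.5) and near(f, 0.0):
        return "Y"  # one copy in males, absent in females
    return "unassigned"


# Example with a male library at 40x and a female library at 30x diploid coverage.
for counts in [(40, 30), (20, 30), (20, 0), (20, 15)]:
    print(counts, classify_kmer(*counts, male_2n=40, female_2n=30))
```

Note that the X-linked and Y-linked examples have the same male count (20); only the female library tells them apart, which is the first downside of using males alone.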
Here, we will show how sequencing libraries can be compared, how we can identify k-mers belonging to individual chromosomes or subgenomes, and finally how to find these k-mers in a genome assembly.
Comparing mapping-based and k-mer approaches
The other, by far more common, approach to identifying individual chromosomes is read mapping. The chromosomal assignment is usually based on the log-normalized male to female coverage ratio. In the already linked preprint about fungus gnats, panel B shows coverage ratios of head and testes libraries mapped to a genome assembly. With manually selected thresholds, the different peaks are associated with different chromosomes.
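As a rough sketch of the log-normalized coverage ratio, the snippet below computes a log2 ratio per scaffold from two dictionaries of mean mapped coverage. The function name `log2_coverage_ratios`, the input layout, and the `pseudocount` are illustrative assumptions, as is the normalization by the median scaffold coverage, which only approximates autosomal coverage when most scaffolds are autosomal.

```python
import math
from statistics import median


def log2_coverage_ratios(male_cov, female_cov, pseudocount=0.01):
    """Return {scaffold: log2 normalized male/female coverage ratio}.

    male_cov / female_cov map scaffold names to mean mapped coverage
    (hypothetical input format). Each library is normalized by its median
    scaffold coverage, assuming that most scaffolds are autosomal.
    """
    male_median = median(male_cov.values())
    female_median = median(female_cov.values())
    ratios = {}
    for scaffold, cov in male_cov.items():
        m = cov / male_median
        f = female_cov.get(scaffold, 0.0) / female_median
        # The pseudocount keeps the ratio finite when one library has no coverage.
        ratios[scaffold] = math.log2((m + pseudocount) / (f + pseudocount))
    return ratios


# Toy data: three autosomal scaffolds, one X-like (s4) and one Y-like (s5).
male = {"s1": 40, "s2": 40, "s3": 40, "s4": 20, "s5": 20}
female = {"s1": 30, "s2": 30, "s3": 30, "s4": 30, "s5": 0}
ratios = log2_coverage_ratios(male, female)
for scaffold in sorted(ratios):
    print(scaffold, round(ratios[scaffold], 2))
```

Under these assumptions autosomal scaffolds sit near 0, X-linked scaffolds near -1, and Y-linked scaffolds at large positive values; the thresholds separating the peaks still have to be chosen by looking at the actual distribution.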
In the manuscript, reliable GRC assignments were those on which the mapping and k-mer approaches agreed. However, one can imagine other ways to combine the approaches, for example by comparing mapped reads that were subset by k-mers, as explained in the last section.
These approaches are less useful for chromosomes that are largely similar to each other, because then there are nearly no single-copy chromosome-specific k-mers. That happens to be the case in songbird germ-line restricted chromosomes (see Kinsella and Ruiz-Ruano et al. 2019, or ask Paco :-)). See that paper also for how chromosomes were assigned in these cases of extremely high sequence similarity. Asalone et al. 2021 have also proposed using a modified RNA-seq pipeline to detect sequences that are shared between the studied chromosomes (GRCs in their case) and the core genome.