TSS classification - Integrative-Transcriptomics/tss-prediction-comparison GitHub Wiki

We want to find out whether the predicted TSSs have distinguishing characteristics, such as their distance to the next annotated gene or the expression level of that gene. To do so, we define three sets of TSSs, plus a catch-all group, based on their genomic position relative to the genes annotated in the given GFF file.

In the TSSpredator paper, the authors present the results of applying the program to multiple C. jejuni isolates. The distribution across the TSS classes was: pTSS: 29%, sTSS: 10%, iTSS: 36%, asTSS: 48%, orphan: 2%. The most common classes are therefore asTSS, iTSS and pTSS, so I suggest the following grouping:

  • primary/secondary TSS: lies at most 300 bp upstream of the start codon of an annotated gene. This distance is the TSSpredator default setting.
  • internal TSS: lies within an annotated gene.
  • antisense TSS: lies at most 100 bp upstream of the start or downstream of the end of a gene to which the TSS is in antisense orientation.
  • other: TSSs that do not match the criteria of any of the groups above.
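The grouping above could be sketched in Python roughly as follows. This is only a minimal illustration, not the project's implementation: the `Gene` record, the function name `classify_tss` and the constant names are hypothetical, and gene coordinates are assumed to be already parsed from the GFF file (1-based, inclusive). Two further assumptions are made where the criteria leave room for interpretation: an internal TSS is taken to be on the same strand as the gene, and an antisense TSS may lie anywhere from 100 bp before the gene to 100 bp after it on the opposite strand. The function returns a set because the percentages quoted above add up to more than 100%, which suggests that a TSS can fall into several classes at once.

```python
from dataclasses import dataclass

UPSTREAM_MAX = 300   # bp upstream of the start codon (TSSpredator default)
ANTISENSE_MAX = 100  # bp around a gene on the opposite strand


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record with GFF-style coordinates."""
    start: int   # 1-based, inclusive, always <= end
    end: int
    strand: str  # "+" or "-"


def classify_tss(pos, strand, genes):
    """Return the set of groups a TSS at `pos` on `strand` belongs to."""
    groups = set()
    for gene in genes:
        if gene.strand == strand:
            # Distance from the TSS to the start codon, positive if upstream.
            # On the minus strand the start codon sits at gene.end.
            dist = gene.start - pos if strand == "+" else pos - gene.end
            if 0 < dist <= UPSTREAM_MAX:
                groups.add("primary/secondary")
            elif gene.start <= pos <= gene.end:
                # Assumption: "within a gene" means same strand.
                groups.add("internal")
        else:
            # Assumption: the gene body itself also counts as antisense.
            if gene.start - ANTISENSE_MAX <= pos <= gene.end + ANTISENSE_MAX:
                groups.add("antisense")
    return groups or {"other"}


genes = [Gene(1000, 2000, "+"), Gene(5000, 6000, "-")]
print(classify_tss(900, "+", genes))   # upstream of the first gene
print(classify_tss(1500, "-", genes))  # opposite strand of the first gene
```

A linear scan over all genes is enough for a sketch; for whole genomes with many predicted TSSs, sorting the genes by position or using an interval index would avoid the quadratic cost.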

Further characterisation of the groups:

  • expression level of the corresponding gene: one option is to take the values from the wiggle files at the positions about 10 bp downstream of each candidate TSS, sum them up, and use these sums to determine the percentage with which the different groups appear.
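One way to read the expression-level idea is sketched below. It rests on several assumptions that are not stated on this page: the wiggle file is in `variableStep` format with a single chromosome, "about 10 bp downstream" is taken to mean the 10 positions directly downstream of the TSS, and absolute values are used in case the reverse-strand file stores negative values. All function names are hypothetical.

```python
from collections import defaultdict


def parse_variable_step(lines):
    """Read 'position value' pairs from variableStep wiggle lines."""
    coverage = {}
    for line in lines:
        line = line.strip()
        # Skip empty lines, comments and the track/declaration headers.
        if not line or line.startswith(("#", "track", "variableStep")):
            continue
        pos, value = line.split()
        coverage[int(pos)] = float(value)
    return coverage


def downstream_signal(coverage, tss_pos, strand, window=10):
    """Sum the absolute wiggle values in the window downstream of a TSS."""
    step = 1 if strand == "+" else -1  # downstream runs leftwards on "-"
    return sum(
        abs(coverage.get(tss_pos + step * i, 0.0))
        for i in range(1, window + 1)
    )


def signal_per_group(coverage, classified_tss, window=10):
    """Total downstream signal per group for (pos, strand, group) tuples."""
    totals = defaultdict(float)
    for pos, strand, group in classified_tss:
        totals[group] += downstream_signal(coverage, pos, strand, window)
    return dict(totals)


wig = [
    "track type=wiggle_0",
    "variableStep chrom=chr1 span=1",
    "95 -4.0",
    "101 2.0",
    "102 3.0",
    "110 5.0",
    "111 7.0",
]
coverage = parse_variable_step(wig)
print(signal_per_group(coverage, [(100, "+", "internal"),
                                  (100, "-", "antisense")]))
```

Dividing each group's total by the sum over all groups would give a percentage per group; in practice the forward and reverse wiggle files would be parsed separately and looked up according to the strand of the TSS.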