February 2022 - Bozhie/transcription-modeling GitHub Wiki

2.23.2022

Ensembl suggests using transcript quality tags to pick a representative transcript: http://mart.ensembl.org/info/genome/genebuild/transcript_quality_tags.html
- these tags are not downloadable using biomaRt
- I didn't notice any gene in the mouse gene set with the "canonical" tag, so we would have to rely on the other tags
From the page describing how the canonical transcript is assigned: "For everything, if required, the final disambiguation step is the lowest stable ID number (i.e. the oldest)."
- after spot-checking the tags for mouse genes with duplicated transcripts, picking the lowest stable ID seems okay, unless we want to 1. figure out the Ensembl Perl API (not sure it would work) or 2. consider
- for genes with many isoforms, the tags usually mark multiple (2-3) possible dominant transcripts, so they don't single out the "best" one. At this resolution, choosing any one of these seems okay.
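A minimal sketch of the lowest-stable-ID tie-break quoted above, assuming we already have, per gene, the short list of candidate transcript IDs left over after the tags (the dict layout and the example IDs are hypothetical, not pulled from biomaRt). It compares only the numeric part of each ID, so an optional version suffix like `.5` is ignored.

```python
import re
from typing import Dict, List

# Ensembl stable IDs look like ENSMUST00000000001, optionally with a ".N" version suffix.
_STABLE_ID = re.compile(r"^[A-Z]+(\d+)(?:\.\d+)?$")


def stable_id_number(stable_id: str) -> int:
    """Return the numeric part of an Ensembl stable ID, ignoring any version suffix."""
    match = _STABLE_ID.match(stable_id)
    if match is None:
        raise ValueError(f"not an Ensembl-style stable ID: {stable_id!r}")
    return int(match.group(1))


def pick_lowest_stable_id(transcript_ids: List[str]) -> str:
    """Final tie-break: keep the transcript with the lowest (i.e. oldest) stable ID number."""
    return min(transcript_ids, key=stable_id_number)


def pick_per_gene(gene_to_transcripts: Dict[str, List[str]]) -> Dict[str, str]:
    """Apply the tie-break to every gene's remaining candidate transcripts."""
    return {gene: pick_lowest_stable_id(txs) for gene, txs in gene_to_transcripts.items()}


if __name__ == "__main__":
    # Hypothetical candidates for one gene, e.g. the 2-3 transcripts the tags leave tied.
    candidates = {
        "ENSMUSG00000000001": [
            "ENSMUST00000105216",
            "ENSMUST00000000001.5",
            "ENSMUST00000113550",
        ]
    }
    print(pick_per_gene(candidates))
```

This only covers the last step; any tag-based filtering would have to happen before building `gene_to_transcripts`.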

=====

resources to add:

add notes from command-line scripts / using Snakemake
- how do I get the standard deviation of fragment length?
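One possible way to estimate this, assuming paired-end reads that have already been aligned to a SAM file (these notes don't say which data or aligner is used). The sketch treats |TLEN| of properly paired, primary, first-mate alignments as the fragment length and takes the sample mean and standard deviation. With spliced genome alignments TLEN includes intronic gaps, so this is only a rough estimate. For single-end data the fragment length isn't observable from the reads at all and has to come from the library prep.

```python
import statistics
from typing import Iterable, List, Tuple

# SAM FLAG bits (SAM format specification)
PROPER_PAIR = 0x2
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def fragment_lengths(sam_lines: Iterable[str]) -> List[int]:
    """Collect |TLEN| once per template from properly paired primary alignments."""
    lengths = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count primary alignments only
        if flag & PROPER_PAIR and flag & FIRST_IN_PAIR and tlen != 0:
            lengths.append(abs(tlen))  # first mate only, so each pair counts once
    return lengths


def fragment_length_stats(lengths: List[int]) -> Tuple[float, float]:
    """Sample mean and sample standard deviation of the fragment lengths."""
    return statistics.mean(lengths), statistics.stdev(lengths)


if __name__ == "__main__":
    # Hypothetical toy alignments: three pairs with TLEN 200, 300, 250.
    sam = [
        "@HD\tVN:1.6",
        "r1\t99\tchr1\t100\t60\t50M\t=\t250\t200\t*\t*",
        "r1\t147\tchr1\t250\t60\t50M\t=\t100\t-200\t*\t*",
        "r2\t99\tchr1\t500\t60\t50M\t=\t750\t300\t*\t*",
        "r3\t99\tchr1\t900\t60\t50M\t=\t1100\t250\t*\t*",
    ]
    print(fragment_length_stats(fragment_lengths(sam)))
```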