February 2022 - Bozhie/transcription-modeling GitHub Wiki
Running Questions for Elphege et al
- are these technical or biological replicates? --> biological (found in paper)
- assigning TSS to genes when using gene aggregation in DESeq2 to detect DEGs
2.23.2022
Finished summary of heatmaps:
summary of different workflows
2.23.2022
Finished
choosing "best" TSS for aggregated genes / DESeq2 TSS mapping
- Ensembl suggests using the transcript quality tags: http://mart.ensembl.org/info/genome/genebuild/transcript_quality_tags.html
- these tags are not downloadable using biomaRt
- actually, I didn't notice any gene in the mouse gene set with the "canonical" tag, so we would have to rely on the other tags
- from the page describing how the canonical transcript is assigned:
"For everything, if required, the final disambiguation step is the lowest stable ID number (i.e. the oldest)."
- spot-checking the tags for mouse genes with duplicated transcripts, this method seems okay, unless we want to 1. figure out the Perl API (which I'm not sure would work) or 2. consider
- for genes with a lot of isoforms, the tags usually flag multiple (2-3) possible dominant transcripts, so they don't single out the "best" one. At this resolution, choosing one of these 2-3 candidates seems okay.
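A minimal sketch of the "lowest stable ID number" tie-breaker quoted above, assuming the candidate transcripts per gene have already been narrowed down by the tags. The function names and the example IDs are hypothetical and only for illustration; this is not the Ensembl implementation.

```python
import re
from collections import defaultdict


def stable_id_number(transcript_id):
    """Return the numeric part of a stable ID, ignoring any ".version" suffix."""
    match = re.search(r"(\d+)(?:\.\d+)?$", transcript_id)
    if match is None:
        raise ValueError(f"no numeric part found in {transcript_id!r}")
    return int(match.group(1))


def pick_oldest_transcript(candidates):
    """Pick one transcript per gene: the one with the lowest stable ID number.

    candidates: iterable of (gene_id, transcript_id) pairs, where each gene
    may appear several times (one pair per candidate transcript).
    """
    by_gene = defaultdict(list)
    for gene_id, transcript_id in candidates:
        by_gene[gene_id].append(transcript_id)
    # Lowest number is treated as the oldest ID, per the quoted rule.
    return {
        gene_id: min(ids, key=stable_id_number)
        for gene_id, ids in by_gene.items()
    }


# Made-up IDs, for illustration only.
example = [
    ("geneA", "ENSMUST00000000020.3"),
    ("geneA", "ENSMUST00000000003.1"),
    ("geneB", "ENSMUST00000000007"),
]
chosen = pick_oldest_transcript(example)
```

The result maps each gene to a single transcript, which can then supply the TSS for that gene in the aggregated analysis.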
=====
Notes to fill in
- add link to google doc
- how do I run kallisto/sleuth
resources to add:
- resources for choosing ensembl type
- bioconductor/biomaRt walkthroughs -- details for choosing archived ensembl files
- cool resource with theory: "PH525x series - Biomedical Data Science" http://genomicsclass.github.io/book/ and http://rafalab.github.io/pages/harvardx.html
- DESeq2 vignette https://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions
- R for data science https://r4ds.had.co.nz/index.html
- snakemake file
- understanding wald test vs LRT: https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faqhow-are-the-likelihood-ratio-wald-and-lagrange-multiplier-score-tests-different-andor-similar/
looking forward
- add notes from command-line scripts / using snakemake
- how do I get the standard deviation of fragment length?
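One possible starting point for the fragment-length question, as a sketch only: it assumes a list of per-fragment lengths is already in hand (for example, template lengths taken from paired-end alignments; how to extract them is not covered here). The function name is hypothetical. The mean and standard deviation are the two values kallisto asks for in single-end mode (`-l` and `-s`).

```python
import statistics


def fragment_length_summary(lengths):
    """Return (mean, sample standard deviation) of fragment lengths.

    Assumption: values may be signed template lengths, where the sign
    only encodes mate orientation and 0 means "not available", so the
    absolute value is used and zeros are dropped.
    """
    cleaned = [abs(n) for n in lengths if n != 0]
    if len(cleaned) < 2:
        raise ValueError("need at least two non-zero fragment lengths")
    return statistics.mean(cleaned), statistics.stdev(cleaned)


# Toy values, for illustration only.
mean_len, sd_len = fragment_length_summary([180, -200, 220, 0])
```

`statistics.stdev` is the sample standard deviation (n - 1 denominator); `statistics.pstdev` would give the population version if that is preferred.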