January 2022 - Bozhie/transcription-modeling Wiki

January Wiki

12.14.22

Getting familiar

Todo:

---> plan of action for sleuth analysis:

  1. download ensembl to refseq mapping

saved in /scratch/pokorny/ensembl_queries_downloads/mm9_ensembl_refseq_id.csv --> maps Ensembl cDNA/transcript IDs to RefSeq transcript IDs

I see two options:

  1. simply add gene name to the transcriptional mapping from kallisto (the abundance file)

  2. redo kallisto alignments using the gene set (I don't think this makes sense though)

  3. generate sleuth models for comparison of time=0 to different time points for both WT and condition

  4. how to compare WT vs dCTCF?
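Option 1 amounts to a simple join between the mapping file and the kallisto abundance table. The following is a minimal Python sketch of that idea, not the pipeline actually used. It assumes the mapping CSV has columns named `refseq_id` and `gene_name` (hypothetical; the real header of mm9_ensembl_refseq_id.csv may differ), that `target_id` in the abundance file is a RefSeq ID, and that any version suffix such as `.2` can be dropped before the lookup. The abundance columns (`target_id`, `length`, `eff_length`, `est_counts`, `tpm`) are the standard kallisto ones. The data below is made up for illustration.

```python
import csv
import io

def load_mapping(handle, id_col="refseq_id", gene_col="gene_name"):
    """Return {transcript_id: gene_name}. Column names are assumptions."""
    return {row[id_col]: row[gene_col] for row in csv.DictReader(handle)}

def annotate_abundance(handle, mapping, missing="NA"):
    """Yield kallisto abundance rows with an added 'gene_name' field."""
    for row in csv.DictReader(handle, delimiter="\t"):
        # Assumption: mapping IDs carry no version suffix, so strip ".N".
        base_id = row["target_id"].split(".")[0]
        row["gene_name"] = mapping.get(base_id, missing)
        yield row

# Tiny in-memory stand-ins for the real files (placeholder values).
mapping_csv = io.StringIO(
    "refseq_id,gene_name\n"
    "NM_000001,GeneA\n"
    "NM_000002,GeneB\n"
)
abundance_tsv = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "NM_000001.2\t1500\t1320.5\t100\t12.5\n"
    "NM_000003\t900\t720.5\t10\t1.5\n"
)

mapping = load_mapping(mapping_csv)
annotated = list(annotate_abundance(abundance_tsv, mapping))
for row in annotated:
    print(row["target_id"], row["gene_name"], row["tpm"])
```

Transcripts with no entry in the mapping are kept and labelled `NA` rather than dropped, so it is easy to count how many IDs failed to map before going further.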

MySQL query to download the mapping from RefSeq transcripts (used for mapping the RNAseq data) to the Ensembl cDNA sequences

next steps:

12.18.2022

Mapping DE Analysis by genes

Goal: Perform some analysis of the RNAseq data that was mapped to transcripts by kallisto. Particularly:

  1. collect the transcripts into genes to get expression values by gene

  2. compare with results from Nora analysis (i.e. FPKM in table 11 vs TPM), are these both by genes?

  3. compare differential expression for WT vs dCTCF using sleuth
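For the first goal, a gene-level TPM can be obtained by summing the TPM of all transcripts assigned to that gene, since TPM is additive across transcripts within a sample. The sketch below illustrates this in Python under the assumption that each transcript has already been labelled with a gene. The function name, tuple layout, and values are hypothetical. Note that summed TPM and FPKM are on different scales, so a comparison with the FPKM table would be of rankings or trends rather than raw values.

```python
from collections import defaultdict

def gene_level_tpm(rows):
    """Sum transcript TPM per gene; rows are (transcript_id, gene, tpm)."""
    totals = defaultdict(float)
    for _transcript_id, gene, tpm in rows:
        totals[gene] += float(tpm)
    return dict(totals)

# Placeholder transcripts and genes, not real data.
rows = [
    ("tx1", "GeneA", 10.0),
    ("tx2", "GeneA", 2.5),
    ("tx3", "GeneB", 4.0),
]
gene_tpm = gene_level_tpm(rows)
print(gene_tpm)
```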

Process/Notes:

Wald test vs likelihood ratio test (LRT)

"Many packages, including limma, use Wald tests for two-sample comparisons. The LRT approach is a bit more elegant in that it is formally testing the relative goodness of fit of a more parameterized (i.e. including an effect of treatment) vs. a less parameterized model, instead of the null hypothesis of no difference in expression between conditions. Thus, in LRT mode, the results table output does not include a log fold change estimate. But, with a little bit of data manipulation, we can extract mean TPM values per treatment and add them to our sleuth results table so that, for transcripts with significant differential expression, we can assess the direction of the difference."
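The "little bit of data manipulation" mentioned in the quote can be illustrated as follows. This is a rough Python sketch rather than sleuth's own code: it assumes per-sample TPM values are available as (target_id, condition, tpm) records, averages them per condition, and reports a log2 ratio purely to indicate direction. The function names, the pseudocount of 0.5, and the values are all assumptions for illustration.

```python
import math
from collections import defaultdict
from statistics import mean

def mean_tpm_by_condition(records):
    """records: (target_id, condition, tpm) -> {target_id: {condition: mean}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for target_id, condition, tpm in records:
        grouped[target_id][condition].append(float(tpm))
    return {
        target: {cond: mean(vals) for cond, vals in conds.items()}
        for target, conds in grouped.items()
    }

def direction(means, treated, control, pseudocount=0.5):
    """log2(treated / control); the pseudocount avoids division by zero."""
    return math.log2((means[treated] + pseudocount) / (means[control] + pseudocount))

# Placeholder replicate values, not real measurements.
records = [
    ("tx1", "WT", 3.0), ("tx1", "WT", 4.0),
    ("tx1", "dCTCF", 7.0), ("tx1", "dCTCF", 8.0),
]
means = mean_tpm_by_condition(records)
lfc = direction(means["tx1"], treated="dCTCF", control="WT")
print(means["tx1"], lfc)
```

A positive value means higher mean TPM in dCTCF than in WT. It is only a descriptive summary to attach to significant LRT hits, not a model-based effect size.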