March 2022 - Bozhie/transcription-modeling GitHub Wiki

3.10.2022

Questions and explorations of RNAseq workflows

Snakemake

**remaining issue:**

options:

  • write our own "Frankenstein" snakemake workflow with just the essentials of the current workflow
  • try to download a local copy of each of the repos that the wrappers in the downloaded version pull from

First exploration of NipBl DEGs

Overview/goal: Inspecting the FASTQ data for EA18.1, untreated and 24h

Process:

  • Concatenated both lanes for forward fastq files and reverse fastq files prior to running through kallisto.
    • attempted to do so programmatically with bash script: /project/fudenber_735/collaborations/karissa_2022/RNAseq-mapped/kallistoquant-rnaseq-all.job
    • idea/note: whether or not snakemake is the right tool, this full step could probably be accomplished with a combination of python and bash scripting, using the sample.csv file format from the specs for the snakemake/config.yml
    • instead, did concatenation manually first then ran job: /project/fudenber_735/collaborations/karissa_2022/RNAseq-mapped/kallistoquant-all.job
  • In the meeting, Karissa talked about cleanup (read trimming) steps prior to her workflow with STAR. This is probably unnecessary with modern aligners for our purpose: https://dnatech.genomecenter.ucdavis.edu/faqs/when-should-i-trim-my-illumina-reads-and-how-should-i-do-it/
    • open question: does generating .bigwigs count as genome annotation? The page says: "However, if the data are used for variant analyses, genome annotation or genome or transcriptome assembly purposes, we recommend read trimming, including both, adapter and quality trimming."
  • Mapped to transcriptome with kallisto, manually/like last time
    • note: kallisto has options to generate .bams if we supply a .gtf file. I can try this (not sure whether this is sufficient for Karissa/Elphege, or whether they need full bigwigs)
  • DESeq2 for DEGs, using ~condition
  • Re-generated DEGs for dCTCF by using the transcripts mapped to mm10 instead, so that these two datasets are more directly comparable
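The lane concatenation and kallisto steps above could be scripted roughly as follows. This is a minimal sketch, not the contents of the .job files: the file naming pattern (`<sample>_L<lane>_R<1|2>.fastq.gz`), the function names, and the output layout are all assumptions made for illustration. The `--genomebam`/`--gtf` flags are only available in kallisto versions that support genome BAM output, so check `kallisto quant --help` for the installed version first.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# ASSUMPTION: hypothetical naming scheme <sample>_L<lane>_R<1|2>.fastq.gz;
# adjust the pattern to match the real FASTQ file names.
FASTQ_PATTERN = re.compile(
    r"^(?P<sample>.+)_L(?P<lane>\d+)_R(?P<read>[12])\.fastq\.gz$"
)


def group_lanes(fastq_dir):
    """Group FASTQ files by (sample, read direction), lanes in sorted order."""
    groups = defaultdict(list)
    for path in sorted(Path(fastq_dir).glob("*.fastq.gz")):
        match = FASTQ_PATTERN.match(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed naming
        groups[(match["sample"], match["read"])].append(path)
    return dict(groups)


def concatenate_lanes(fastq_dir, out_dir):
    """Write one merged file per (sample, read direction).

    Gzip files can be concatenated byte-for-byte; the result is a valid
    multi-member gzip file. Lanes are appended in the same sorted order for
    R1 and R2, so read pairs stay aligned between the two merged files.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = {}
    for (sample, read), lanes in group_lanes(fastq_dir).items():
        target = out_dir / f"{sample}_R{read}.fastq.gz"
        with open(target, "wb") as out:
            for lane in lanes:
                with open(lane, "rb") as src:
                    shutil.copyfileobj(src, out)
        merged[(sample, read)] = target
    return merged


def kallisto_quant_command(index, out_dir, r1, r2, gtf=None):
    """Build a paired-end `kallisto quant` argument list (does not run it)."""
    cmd = ["kallisto", "quant", "-i", str(index), "-o", str(out_dir)]
    if gtf is not None:
        # Optional genome-aligned BAM output; requires a GTF annotation.
        cmd += ["--genomebam", "--gtf", str(gtf)]
    cmd += [str(r1), str(r2)]
    return cmd
```

The returned argument list can be passed to `subprocess.run(cmd, check=True)` inside a job script, or printed and pasted into a .job file.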

3.22.22

Comparing Abundances of DE Genes in dNipBl

Overview/Goal: Digging a little into Karissa's RNAseq method, and comparing STAR results and abundances with kallisto. Also, doing some quick DE analysis for the 24-hour time point.

Choosing corresponding downloads for genome annotations

Purpose: Finding the older releases that are most accurate (typically, most up-to-date/newest release that contains the genome for the species of interest) is not always

The releases on this main page have content

Notes:

  • All archived files from ensembl databases available via FTP:
  • This page lists the Ensembl archives: https://m.ensembl.org/info/website/archives/index.html
  • Moving forward, using:
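    | genome | genome assembly | ensembl release | release date | GENCODE version |
    |--------|-----------------|-----------------|--------------|-----------------|
    | mm9    | NCBI37          | 54              | May 2009     | N/A             |
    | mm10   | GRCm38.p6       | 102             | Nov 2020     | M23             |
    | GRCm39 | GRCm39          | 105             | (Dec 2021)   | M27             |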
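To keep these choices in one place for scripts, the table could be mirrored in a small lookup. This is a sketch only: the dictionary name, the function name, and the URL pattern are assumptions. The pattern follows the layout commonly seen on the Ensembl FTP site for recent releases; older archived releases (such as 54) may use different paths or file names, and the assembly label in the file name (for example `GRCm38` without the patch suffix) has to be checked against the actual directory listing.

```python
# Values copied from the notes above; None marks "N/A".
ANNOTATION_CHOICES = {
    "mm9": {"assembly": "NCBI37", "ensembl_release": 54, "gencode": None},
    "mm10": {"assembly": "GRCm38.p6", "ensembl_release": 102, "gencode": "M23"},
    "GRCm39": {"assembly": "GRCm39", "ensembl_release": 105, "gencode": "M27"},
}


def ensembl_gtf_url(release, file_assembly, species="mus_musculus"):
    """Build a GTF download URL for one Ensembl release.

    ASSUMPTION: the path layout below is the one used by recent releases;
    verify it against the FTP listing before relying on it.
    """
    species_label = species.capitalize()  # "mus_musculus" -> "Mus_musculus"
    return (
        f"https://ftp.ensembl.org/pub/release-{release}/gtf/{species}/"
        f"{species_label}.{file_assembly}.{release}.gtf.gz"
    )


# Example: annotation URL for the mm10 row of the table.
mm10_url = ensembl_gtf_url(
    ANNOTATION_CHOICES["mm10"]["ensembl_release"], "GRCm38"
)
```

Keeping the release numbers in one dictionary means the kallisto index, the GTF, and any later liftover steps can all be pointed at the same release.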

3.29.22

Major updates:

(put summary of git pushes)

enhancer liftover