March 2022 - Bozhie/transcription-modeling GitHub Wiki

3.10.2022

Questions and explorations of RNAseq workflows

Snakemake

I was able to get a local version of https://github.com/snakemake-workflows/rna-seq-kallisto-sleuth/
I was also able to create all the conda envs locally for future use (see https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#using-and-combining-pre-exising-workflows). Conda deployment also works well for offline or air-gapped environments: running snakemake --use-conda --conda-create-envs-only installs only the required conda environments without running the full workflow, and subsequent runs with --use-conda use those local environments without requiring internet access.
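The two-phase setup (create envs while online, run later offline) can be scripted. This is a minimal sketch, assuming it is run from the workflow's root directory. The function name is hypothetical, and `--conda-prefix` (a real Snakemake option that the notes above do not mention) is added only so that both phases look for environments in the same fixed directory.

```python
import shlex


def snakemake_conda_commands(cores=4, conda_prefix=None):
    """Return (create_envs_cmd, run_cmd) as argument lists for subprocess.run."""
    base = ["snakemake", "--use-conda", "--cores", str(cores)]
    if conda_prefix:
        # Keep envs in one fixed directory so the later offline run reuses them.
        base += ["--conda-prefix", str(conda_prefix)]
    # Phase 1 (online): only build the conda environments, run no jobs.
    create_envs = base + ["--conda-create-envs-only"]
    # Phase 2 (offline OK): the normal run, reusing the envs built above.
    return create_envs, base


if __name__ == "__main__":
    create_envs, run = snakemake_conda_commands(cores=8, conda_prefix="envs")
    print(shlex.join(create_envs))
    print(shlex.join(run))
    # import subprocess; subprocess.run(create_envs, check=True)  # while online
```

Returning argument lists (rather than shell strings) keeps them safe to pass straight to `subprocess.run`.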

**Remaining issue:**

Many of the steps use Snakemake wrappers: https://snakemake-wrappers.readthedocs.io/en/stable/
These wrappers are fetched and deployed on the fly when those steps run, so those steps still need internet access.

options:

- write our own "Frankenstein" Snakefile with the essentials of the current workflow
- try to download a local copy of each repo that the downloaded workflow pulls in through its wrappers
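For the second option, the first step is knowing which wrappers the workflow actually uses. A minimal sketch, assuming wrapper directives follow the usual `wrapper: "<version>/<path>"` form (e.g. `"v1.3.2/bio/kallisto/quant"`, a made-up example); the function name is hypothetical. Snakemake's `--wrapper-prefix` option is documented as accepting a fork or local clone of the wrappers repo (its help text gives a `git+file://path/to/your/local/clone@` form), which may avoid fetching them at run time; that route is untested here.

```python
import re

# Matches e.g.   wrapper:\n        "v1.3.2/bio/kallisto/quant"
WRAPPER_RE = re.compile(r"""wrapper:\s*["']([^"']+)["']""")


def wrapper_specs(snakefile_text):
    """Return sorted (version, wrapper_path) pairs referenced in a Snakefile."""
    specs = set()
    for spec in WRAPPER_RE.findall(snakefile_text):
        # The first path component is the wrapper version/tag, the rest is the wrapper path.
        version, _, path = spec.partition("/")
        specs.add((version, path))
    return sorted(specs)


if __name__ == "__main__":
    from pathlib import Path

    for rule_file in Path("workflow").rglob("*.smk"):
        print(rule_file, wrapper_specs(rule_file.read_text()))
```

The `workflow/*.smk` layout in the usage block is an assumption about how the downloaded workflow is organised.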

First exploration of Nipbl DEGs

Overview/goal: Inspect the FASTQ data for EA18.1, untreated and 24h.

Process:

Concatenated both lanes of the forward FASTQ files, and separately of the reverse FASTQ files, before running kallisto.
- first attempted to do this programmatically with a bash script: /project/fudenber_735/collaborations/karissa_2022/RNAseq-mapped/kallistoquant-rnaseq-all.job
- idea/note: whether or not Snakemake is the right tool, this whole step could probably be done with a combination of Python and bash scripting, using the sample.csv file format from the specs in snakemake/config.yml.
- instead, did the concatenation manually first, then ran the job: /project/fudenber_735/collaborations/karissa_2022/RNAseq-mapped/kallistoquant-all.job
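Following the idea/note above, the lane concatenation could be done in Python instead of by hand. This is a minimal sketch, assuming Illumina-style file names such as `EA18_S1_L001_R1_001.fastq.gz`; that naming pattern and the function names are hypothetical, so the regex would need adjusting to the real files. It relies on the fact that concatenated gzip members form a valid gzip stream, so lane files can be appended byte-for-byte without decompressing.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Hypothetical naming pattern: <sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz
FASTQ_RE = re.compile(r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_(?P<read>R[12])_001\.fastq\.gz$")


def group_lanes(fastq_dir):
    """Map (sample, read) -> lane files sorted by lane number."""
    groups = defaultdict(list)
    for path in Path(fastq_dir).iterdir():
        m = FASTQ_RE.match(path.name)
        if m:
            groups[(m["sample"], m["read"])].append((int(m["lane"]), path))
    return {key: [p for _, p in sorted(files)] for key, files in groups.items()}


def concat_lanes(fastq_dir, out_dir):
    """Concatenate lanes per sample, keeping forward (R1) and reverse (R2) reads separate."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = {}
    for (sample, read), lane_files in group_lanes(fastq_dir).items():
        out_path = out_dir / f"{sample}_{read}.fastq.gz"
        with open(out_path, "wb") as out:
            for lane_file in lane_files:
                # Raw byte copy: each lane's gzip member is appended to the output stream.
                with open(lane_file, "rb") as src:
                    shutil.copyfileobj(src, out)
        outputs[(sample, read)] = out_path
    return outputs
```

Sorting by lane number keeps read order identical between R1 and R2, which paired-end quantification relies on.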
In the meeting, Karissa described read cleanup steps she runs before her STAR workflow. For our purposes, this is probably unnecessary with modern aligners: https://dnatech.genomecenter.ucdavis.edu/faqs/when-should-i-trim-my-illumina-reads-and-how-should-i-do-it/
- open question: does generating .bigwigs count as "genome annotation" here? From the linked FAQ: "However, if the data are used for variant analyses, genome annotation or genome or transcriptome assembly purposes, we recommend read trimming, including both, adapter and quality trimming."
Mapped to the transcriptome with kallisto, manually, as last time.
- note: kallisto has options to generate .bams if we supply a .gtf file. I can try this (not sure whether that is sufficient for Karissa/Elphege or whether they need full bigwigs).
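A paired-end `kallisto quant` call, with the optional BAM output mentioned in the note, might be assembled like this. This is a sketch, not the command in the .job files above: the file names in the test are made up, and `--genomebam`, `--gtf` and `--chromosomes` are the option names shown by kallisto 0.46.x; newer kallisto releases may no longer provide them, so check `kallisto quant --help` for the installed version.

```python
def kallisto_quant_cmd(index, out_dir, r1, r2, threads=4, gtf=None, chromosomes=None):
    """Build a paired-end `kallisto quant` argument list (sketch; hypothetical helper)."""
    cmd = ["kallisto", "quant", "-i", str(index), "-o", str(out_dir), "-t", str(threads)]
    if gtf is not None:
        # Project pseudoalignments onto the genome as a sorted BAM; requires a GTF.
        cmd += ["--genomebam", "--gtf", str(gtf)]
        if chromosomes is not None:
            # Tab-separated chromosome names and lengths (recommended with --genomebam).
            cmd += ["--chromosomes", str(chromosomes)]
    # Two FASTQ files (forward, reverse) -> paired-end mode.
    cmd += [str(r1), str(r2)]
    return cmd


if __name__ == "__main__":
    import shlex

    print(shlex.join(kallisto_quant_cmd("mm10.idx", "out/EA18", "EA18_R1.fastq.gz", "EA18_R2.fastq.gz")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once per concatenated sample.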
Ran DESeq2 for DEGs, using the design ~condition.
Regenerated the dCTCF DEGs using transcripts mapped to mm10 instead, so that the two datasets are more directly comparable.
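For reference, the design ~condition corresponds to the negative binomial GLM described for DESeq2, written out here for two levels. The level names `untreated` and `24h` are assumed from the sample description above. Because R orders factor levels alphabetically by default, `24h` would become the reference level unless `untreated` is set as the reference explicitly (e.g. with `relevel`), which flips the sign of the reported fold changes.

```math
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad \mu_{ij} = s_j \, q_{ij}, \qquad \log_2 q_{ij} = \beta_{i0} + \beta_{i1} \, x_j, \qquad
x_j = \begin{cases} 1 & \text{sample } j \text{ is 24h} \\ 0 & \text{sample } j \text{ is untreated} \end{cases}
```

Here $K_{ij}$ is the count for gene $i$ in sample $j$, $s_j$ the sample's size factor, $\alpha_i$ the gene's dispersion, and $\beta_{i1}$ the log2 fold change (24h vs untreated) that DESeq2 tests for each gene.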

3.22.22

Comparing Abundances of DE Genes in dNipBl

Overview/Goal: Dig a little into Karissa's RNAseq method and compare her STAR results and abundances with kallisto's. Also, do some quick DE analysis for the 24-hour time point.

Choosing corresponding downloads for genome annotations

Purpose: Finding the older releases that are most accurate (typically, most up-to-date/newest release that contains the genome for the species of interest) is not always

The releases on this main page have content

Notes:

All archived files from the Ensembl databases are available via FTP.
This page lists the Ensembl archives: https://m.ensembl.org/info/website/archives/index.html
Moving forward, using:

| genome | genome assembly | Ensembl release | release date | GENCODE version |
|---|---|---|---|---|
| mm9 | NCBI37 | 54 | May 2009 | N/A |
| mm10 | GRCm38.p6 | 102 | Nov 2020 | M23 |
| GRCm39 | GRCm39 | 105 | (Dec 2021) | M27 |

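To fetch the matching annotation for a row of the table, the GTF URL can be built from the release number and assembly. This is a minimal sketch, assuming the current Ensembl FTP layout (`pub/release-<N>/gtf/<species>/<Species>.<assembly>.<N>.gtf.gz`, with the patch suffix such as `.p6` dropped from the file name); older releases such as 54 may use different file names, so check the directory listing before relying on it. The function name is hypothetical.

```python
ENSEMBL_FTP = "https://ftp.ensembl.org/pub"


def ensembl_gtf_url(release, assembly, species="mus_musculus"):
    """Build the Ensembl GTF download URL for one release (assumed naming pattern)."""
    file_species = species.capitalize()      # "mus_musculus" -> "Mus_musculus"
    base_assembly = assembly.split(".p")[0]  # "GRCm38.p6" -> "GRCm38"
    return (
        f"{ENSEMBL_FTP}/release-{release}/gtf/{species}/"
        f"{file_species}.{base_assembly}.{release}.gtf.gz"
    )


if __name__ == "__main__":
    for release, assembly in [(102, "GRCm38.p6"), (105, "GRCm39")]:
        print(ensembl_gtf_url(release, assembly))
```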

3.29.22

Major updates:

(put summary of git pushes)

enhancer liftover

using: https://github.com/agshumate/Liftoff
First, converted Elphege's enhancer data from BED to GFF using the BED-to-GFF tool on https://usegalaxy.org/
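If the Galaxy step ever needs to be scripted, the BED-to-GFF3 conversion is small enough to do locally. This is a minimal sketch, not a reproduction of Galaxy's tool output: the `enhancer` feature type, the `bed2gff` source label and the `ID` scheme are hypothetical choices, and names are assumed not to contain `;` or `=`. The key detail is the coordinate shift: BED is 0-based half-open, GFF is 1-based inclusive, so the start gains 1 and the end is unchanged. Since Liftoff is geared toward gene annotations, the chosen feature type may need to be passed to it explicitly; check `liftoff -h` for the relevant option.

```python
def bed_to_gff3(bed_lines, source="bed2gff", feature_type="enhancer"):
    """Convert BED records to GFF3 lines (hypothetical helper; sketch)."""
    out = ["##gff-version 3"]
    for i, line in enumerate(bed_lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and BED header/track lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 and fields[3] else f"{feature_type}_{i}"
        strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "."
        # BED 0-based half-open -> GFF 1-based inclusive: start + 1, end unchanged.
        attrs = f"ID={feature_type}_{i};Name={name}"
        out.append("\t".join([chrom, source, feature_type, str(start + 1), str(end), ".", strand, ".", attrs]))
    return out


if __name__ == "__main__":
    import sys

    print("\n".join(bed_to_gff3(sys.stdin)))
```

Using the input line number in `ID` keeps IDs unique even if several enhancers share a name.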