March 2022 - Bozhie/transcription-modeling GitHub Wiki
3.10.2022
Questions and explorations of RNAseq workflows
Snakemake
- I was able to get a local version of https://github.com/snakemake-workflows/rna-seq-kallisto-sleuth/
- I was able to create all conda envs locally for future use: https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#using-and-combining-pre-exising-workflows
Conda deployment also works well for offline or air-gapped environments. Running `snakemake --use-conda --conda-create-envs-only` will only install the required conda environments without running the full workflow. Subsequent runs with `--use-conda` will make use of the local environments without requiring internet access.
**Remaining issue:**
- Many of the steps use other snakemake wrappers: https://snakemake-wrappers.readthedocs.io/en/stable/
- These wrappers are downloaded and deployed on the fly when the workflow runs, so those steps still require internet access
Options:
- write our own "Frankenstein" Snakemake workflow containing only the essentials of the current workflow
- try to download a local copy of each wrapper repo that the downloaded workflow pulls from
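For the second option, it helps to know exactly which wrappers the downloaded workflow refers to. The following is a minimal sketch, assuming the wrappers are declared with plain quoted strings after a `wrapper:` directive; the function names and the file-matching rule (`Snakefile` or `*.smk`) are illustrative choices, not part of Snakemake.

```python
import re
from pathlib import Path

# Matches a `wrapper:` directive followed by a quoted string, either on the
# same line or on the next indented line.
WRAPPER_RE = re.compile(r"""^\s*wrapper:\s*["']([^"']+)["']""", re.MULTILINE)


def find_wrappers(snakefile_text):
    """Return the sorted, de-duplicated wrapper identifiers found in one file."""
    return sorted(set(WRAPPER_RE.findall(snakefile_text)))


def wrappers_in_workflow(workflow_dir):
    """Collect wrapper identifiers from every Snakefile / *.smk under a directory."""
    found = set()
    for path in Path(workflow_dir).rglob("*"):
        if path.is_file() and (path.suffix == ".smk" or path.name == "Snakefile"):
            found.update(find_wrappers(path.read_text()))
    return sorted(found)
```

The resulting list shows which directories of a local clone of the wrappers repository are needed. Snakemake also has a `--wrapper-prefix` option that may be usable to point at such a clone; the exact form it expects should be checked against `snakemake --help` for the installed version.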
First exploration of Nipbl DEGs
Overview/goal: Inspecting the FASTQ data for EA18.1, untreated and 24h
Process:
- Concatenated both lanes for forward fastq files and reverse fastq files prior to running through kallisto.
- attempted to do so programmatically with bash script:
/project/fudenber_735/collaborations/karissa_2022/RNAseq-mapped/kallistoquant-rnaseq-all.job
- idea/note: whether or not snakemake is the right tool, this full step could probably be accomplished with a combination of Python and bash scripting, using the sample.csv file format from the specs for the snakemake/config.yml
- instead, did the concatenation manually first, then ran the job:
/project/fudenber_735/collaborations/karissa_2022/RNAseq-mapped/kallistoquant-all.job
- In the meeting, Karissa was talking about cleanup (read trimming) steps prior to her workflow with STAR. Trimming is probably unnecessary with modern aligners for our purpose: https://dnatech.genomecenter.ucdavis.edu/faqs/when-should-i-trim-my-illumina-reads-and-how-should-i-do-it/
- Open question: does generating .bigwigs count as genome annotation? The FAQ says: "However, if the data are used for variant analyses, genome annotation or genome or transcriptome assembly purposes, we recommend read trimming, including both, adapter and quality trimming."
- Mapped to transcriptome with kallisto, manually/like last time
- note: kallisto has options to generate .bams if we supply a .gtf file. I can try this (not sure if this is sufficient for Karissa/Elphege, or if they need full bigwigs)
- DESeq2 for DEGs, using ~condition
- Re-generated DEGs for dCTCF by using the transcripts mapped to mm10 instead, so that these two datasets are more directly comparable
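The lane concatenation described in the process notes above could be scripted roughly as follows. This is a minimal sketch, not the contents of either `.job` file: the sample sheet columns (`sample`, `lane`, `fq1`, `fq2`) and the merged file names are assumptions made for illustration. It relies on the fact that gzip files concatenated byte-for-byte form a valid multi-member gzip file, so no decompression is needed.

```python
import csv
import shutil
from collections import defaultdict
from pathlib import Path


def group_lanes(sheet_path):
    """Map each sample to its (lane, fq1, fq2) entries, sorted by lane name."""
    lanes = defaultdict(list)
    with open(sheet_path, newline="") as handle:
        for row in csv.DictReader(handle):
            lanes[row["sample"]].append(
                (row["lane"], Path(row["fq1"]), Path(row["fq2"]))
            )
    return {sample: sorted(entries) for sample, entries in lanes.items()}


def concatenate(parts, merged):
    """Append the input files byte-for-byte into one output file."""
    merged.parent.mkdir(parents=True, exist_ok=True)
    with open(merged, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


def merge_samples(sheet_path, out_dir):
    """Write one merged R1 and one merged R2 file per sample; return their paths."""
    out_dir = Path(out_dir)
    merged = {}
    for sample, entries in group_lanes(sheet_path).items():
        r1 = out_dir / f"{sample}_R1.fastq.gz"
        r2 = out_dir / f"{sample}_R2.fastq.gz"
        # Forward and reverse files are merged in the same lane order so
        # that read pairs stay aligned between the two outputs.
        concatenate([fq1 for _, fq1, _ in entries], r1)
        concatenate([fq2 for _, _, fq2 in entries], r2)
        merged[sample] = (r1, r2)
    return merged
```

The returned paths can then be passed to `kallisto quant` as the paired-end inputs for each sample.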
3.22.22
Comparing Abundances of DE Genes in dNipBl
Overview/Goal: Digging a little into Karissa's RNAseq method, and comparing STAR results and abundances with kallisto. Also, doing some quick DE analysis for the 24-hour time point.
Choosing corresponding downloads for genome annotations
Purpose: Finding the most accurate older release (typically the newest release that still contains the genome for the species of interest) is not always straightforward
Notes:
- All archived files from Ensembl databases are available via FTP
- This page lists the archives: https://m.ensembl.org/info/website/archives/index.html
- Moving forward, using:
| genome | genome assembly | ensembl release | release date | GENCODE version |
| --- | --- | --- | --- | --- |
| mm9 | NCBI37 | 54 | May 2009 | N/A |
| mm10 | GRCm38.p6 | 102 | Nov 2020 | M23 |
| GRCm39 | GRCm39 | 105 | Dec 2021 | M27 |
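To keep the kallisto and STAR runs on the same annotation, the choices listed above could be recorded once as a lookup that scripts read from. This is only a sketch: the class and function names are hypothetical, and the values are copied from the notes above without further checking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnnotationSet:
    genome: str
    assembly: str
    ensembl_release: int
    release_date: str
    gencode_version: Optional[str]  # None where the notes say N/A


# Values transcribed from the notes above.
ANNOTATIONS = {
    "mm9": AnnotationSet("mm9", "NCBI37", 54, "May 2009", None),
    "mm10": AnnotationSet("mm10", "GRCm38.p6", 102, "Nov 2020", "M23"),
    "GRCm39": AnnotationSet("GRCm39", "GRCm39", 105, "Dec 2021", "M27"),
}


def annotation_for(genome):
    """Return the recorded annotation choice, failing loudly on unknown genomes."""
    try:
        return ANNOTATIONS[genome]
    except KeyError:
        known = ", ".join(sorted(ANNOTATIONS))
        raise ValueError(f"no annotation recorded for {genome!r}; known: {known}")
```

Failing on an unknown genome name, rather than falling back to a default, avoids silently mixing releases between datasets that are meant to be compared.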
3.29.22
Major updates:
(put summary of git pushes)
enhancer liftover
- using: https://github.com/agshumate/Liftoff
- first, translated the enhancer data from Elphege using BED-to-GFF on https://usegalaxy.org/
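The BED-to-GFF step could also be done locally. The sketch below is not the Galaxy tool's implementation: the `source` label, the `enhancer` feature type, and the generated IDs are assumptions for illustration. The one conversion detail that matters is the coordinate system: BED starts are 0-based with an exclusive end, while GFF coordinates are 1-based and inclusive, so the start shifts by one and the end stays the same.

```python
def bed_to_gff3(bed_lines, source="bed2gff", feature_type="enhancer"):
    """Yield GFF3 lines for an iterable of tab-separated BED lines."""
    yield "##gff-version 3"
    for index, line in enumerate(bed_lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Optional BED columns: name, score, strand.
        name = fields[3] if len(fields) > 3 else f"{feature_type}_{index}"
        score = fields[4] if len(fields) > 4 else "."
        strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "."
        yield "\t".join([
            chrom,
            source,
            feature_type,
            str(start + 1),  # 0-based BED start -> 1-based GFF start
            str(end),        # exclusive BED end == inclusive GFF end
            score,
            strand,
            ".",             # phase is not applicable to non-CDS features
            f"ID={name};Name={name}",
        ])
```

For example, a BED interval `chr1 99 200` becomes a GFF feature spanning `100` to `200`.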