Tutorial - harrlol/HAMRLNC GitHub Wiki

Demo Data source

For this tutorial, we'll use the RNA-Seq data generated by Yu et al. 2021. In that work, the authors report that messenger RNA 5′ NAD+ capping is a dynamic regulatory epitranscriptome mark required for a proper response to abscisic acid in Arabidopsis. A graphical abstract is shown below:

Yu et al 2021 article graphic abstract

Pulling HAMRLNC Docker Image

To run HAMRLNC, you first need to pull the pipeline's Docker image to your computer. If you are not familiar with container technology and would like to learn the basics, please check out the CyVerse Container & Cloud Native Camp Documentation. It is open source and free. Dig in!

Pull the HAMRLNC Docker image. This can take a few minutes, depending on your internet speed.

docker pull chosenobih/hamrlnc:v0.01

After pulling the image, run the command below to confirm that it is now on your computer:

docker image ls

Your output should be similar to the image below:

Clone the HAMRLNC repository

git clone https://github.com/chosenobih/HAMRLINC.git
cd HAMRLINC

Download the genome file for Arabidopsis thaliana from Ensembl

wget https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-57/fasta/arabidopsis_thaliana/dna/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz
gunzip Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz

Download the annotation file for Arabidopsis thaliana from Ensembl

wget https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-57/gff3/arabidopsis_thaliana/Arabidopsis_thaliana.TAIR10.57.gff3.gz
gunzip Arabidopsis_thaliana.TAIR10.57.gff3.gz
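
Before moving on, it can help to confirm that both downloads decompressed cleanly. The sketch below is not part of HAMRLNC: `check_fasta` and `check_gff3` are hypothetical helper names, and they only test basic file structure (FASTA header lines start with `>`, and a GFF3 file opens with a `##gff-version 3` directive).

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of HAMRLNC) for sanity-checking the downloads.

# Print the number of sequences in a FASTA file; fail if the file is
# missing, empty, or contains no header lines.
check_fasta() {
  local fasta="$1" n
  [ -s "$fasta" ] || { echo "missing or empty: $fasta" >&2; return 1; }
  # grep -c exits non-zero when the count is 0, so ignore its status here.
  n=$(grep -c '^>' "$fasta") || true
  [ "$n" -gt 0 ] || { echo "no FASTA headers found in $fasta" >&2; return 1; }
  echo "$n"
}

# Succeed only if the first line of the file is a GFF version 3 directive.
check_gff3() {
  local gff="$1" first=""
  [ -s "$gff" ] || { echo "missing or empty: $gff" >&2; return 1; }
  IFS= read -r first < "$gff" || true
  if printf '%s\n' "$first" | grep -Eq '^##gff-version[[:space:]]+3'; then
    return 0
  fi
  echo "first line of $gff is not a GFF3 version directive" >&2
  return 1
}
```

With the functions loaded in your shell, `check_fasta Arabidopsis_thaliana.TAIR10.dna.toplevel.fa` prints the number of sequences it found, and `check_gff3 Arabidopsis_thaliana.TAIR10.57.gff3` exits with status 0 when the header looks right.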

Run HAMRLNC with SRA IDs and all three arms activated

docker run \
  --rm -v $(pwd):/working-dir \
  -w /working-dir chosenobih/hamrlnc:v0.01 \
  -o test_run \
  -c /demo/demo_filenames.csv \
  -g Arabidopsis_thaliana.TAIR10.dna.toplevel.fa \
  -i Arabidopsis_thaliana.TAIR10.57.gff3 \
  -l 50 -n 8 -k -p -u
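
In the command above, `--rm` removes the container when it exits, `-v $(pwd):/working-dir` mounts the current directory into the container, and `-w /working-dir` makes that mount the working directory. Relative paths such as the ones given to `-g` and `-i` should therefore point at files in the directory you launch from. The sketch below is a hypothetical pre-flight check (`preflight` is not part of HAMRLNC) that confirms those files exist before a long run is started.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of HAMRLNC): confirm that every
# input file named as an argument exists relative to the current directory,
# which the docker command mounts at /working-dir inside the container.
preflight() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "not found in $(pwd): $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call, commented out so that sourcing this snippet has no side effects:
# preflight Arabidopsis_thaliana.TAIR10.dna.toplevel.fa \
#           Arabidopsis_thaliana.TAIR10.57.gff3 && echo "inputs look ready"
```

The function reports every missing file rather than stopping at the first one, so a mistyped filename and a forgotten download show up in a single pass.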

Output Interpretation

All outputs of HAMRLNC are organized in corresponding subdirectories of the output directory. When run with all three core processing arms enabled, HAMRLNC produces ten subdirectories in the output directory. Three subdirectories contain key intermediates such as genome index files, trimmed fastq files, and bed files, which can be used in downstream processing of the user’s choice. Three other subdirectories contain the raw output for each of the three core functionalities, and a final subdirectory contains the visualizations and post-HAMR analysis results.
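
To see which subdirectories a run actually produced, you can list them directly. The sketch below assumes the output directory is the `test_run` value passed to `-o` and that it was created under the directory you launched from; `list_output_dirs` is a hypothetical helper name, not a HAMRLNC command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HAMRLNC): print the name of each immediate
# subdirectory of an output directory, one per line, ignoring plain files.
list_output_dirs() {
  local out="${1:-test_run}" d
  [ -d "$out" ] || { echo "no such directory: $out" >&2; return 1; }
  for d in "$out"/*/; do
    # If there are no subdirectories the glob stays literal, so test each match.
    if [ -d "$d" ]; then
      basename "$d"
    fi
  done
}
```

Calling `list_output_dirs test_run` after the run finishes gives a quick inventory to compare against the description above.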

Fig 1: (a-g) Bar plots of the total abundance of HAMR-predicted modifications by sample group in CDS, exon, 5' UTR, gene, ncRNA, primary mRNA, and 3' UTR regions. (h) Abundance of HAMR-predicted modifications in different RNA subtypes.

Fig 2: (a-g) Bar plots of the abundance of HAMR-predicted modification classes by sample group in CDS, exon, 5' UTR, gene, ncRNA, primary mRNA, and 3' UTR regions. (h) Number of HAMR-predicted modifications per gene region.

Fig 3: (a) Distribution of modification types in gene regions by sample groups. (b) Distribution of modification types in gene regions

Fig 4: GO term heatmap and predicted enrichment landscape