Lab: Counting Reads - statonlab/EPP_575_RNASeq_Workshop GitHub Wiki

Counting Reads

Today we'll use the program HTSeq to count the number of reads mapped to individual genes in our reference genome.

Helpful links:

HTSeq documentation

The general inputs for HTSeq are as follows:

--format= specify alignment file type (e.g. "bam")
--order= tell htseq the manner in which the alignment file has been sorted (e.g. "pos")
--stranded= count reads dependent on forward/reverse strand-specific orientation (defaut: "no")
--type= the gff3/gtf feature to count by (e.g. "gene")
--idattr= the gff3/gtf column feature used to group the desired type

Note:

HTSeq allows for highly customizable count approaches for many data types. Depending on the library, you may need to adjust these arguments. If in doubt, read the docs.

Steps:

1. Load HTSeq using spack

spack load [email protected]%[email protected]

Make sure htseq-count is functional

htseq-count --help

2. Prepare a counts directory

Change to your analysis directory and run the following:

# go to your personal analysis directory

mkdir 4_featureCount
cd 4_featureCount

3. Run HTseq

htseq-count \
	--format=bam \
	--order=pos \
	--stranded=no \
	--type=gene \
	--idattr=ID \
	../3_Read_mapping/SRR17062759.Aligned.sortedByCoord.out.bam \
	/pickett_shared/teaching/EPP575_Jan2022/reference_genome/Athaliana_447_Araport11.gene_exons.gff3 \
	> SRR17062759.counts.txt \
	2> SRR17062759.out