Lab: Counting Reads - statonlab/EPP_575_RNASeq_Workshop GitHub Wiki
Counting Reads
Today we'll use the program HTSeq to count the number of reads mapped to individual genes in our reference genome.
Helpful links:
The general inputs for HTSeq are as follows:
- --format= specify alignment file type (e.g. "bam")
- --order= tell htseq the manner in which the alignment file has been sorted (e.g. "pos")
- --stranded= count reads dependent on forward/reverse strand-specific orientation (defaut: "no")
- --type= the gff3/gtf feature to count by (e.g. "gene")
- --idattr= the gff3/gtf column feature used to group the desired type
Note:
HTSeq allows for highly customizable count approaches for many data types. Depending on the library, you may need to adjust these arguments. If in doubt, read the docs.
Steps:
1. Load HTSeq using spack
spack load [email protected]%[email protected]
Make sure htseq-count is functional
htseq-count --help
2. Prepare a counts directory
Change to your analysis directory and run the following:
# go to your personal analysis directory
mkdir 4_featureCount
cd 4_featureCount
3. Run HTseq
htseq-count \
--format=bam \
--order=pos \
--stranded=no \
--type=gene \
--idattr=ID \
../3_Read_mapping/SRR17062759.Aligned.sortedByCoord.out.bam \
/pickett_shared/teaching/EPP575_Jan2022/reference_genome/Athaliana_447_Araport11.gene_exons.gff3 \
> SRR17062759.counts.txt \
2> SRR17062759.out