# Metatranscriptome Direct Alignment
This page assumes single-end (vs. paired-end) read sequencing. If your reads are paired, do this read QC instead: https://github.com/TealFurnholm/Teals_Strain-Level_Metagenome_Pipeline/wiki/Assembly and then do this alignment: https://github.com/TealFurnholm/Teals_Strain-Level_Metagenome_Pipeline/wiki/Direct-Alignment-of-Reads
## FORMAT SAMPLE NAMES AND LIST
- Format your sequencing file names as [sample_name].fastq (or [sample_name].fastq.gz if compressed). Examples:
- Control_T0.fastq
- Toxin-Day5.fastq.gz
- LIST the [sample_name]s, one per line, in a file called "file_list.txt" (the alignment loop below reads it line by line). Example contents:
  - Control_T0
  - Control_T5
  - Toxin-10mM_T0
  - Toxin-10mM_T5
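The renaming and listing steps above can be scripted. A minimal sketch, assuming every *.fastq / *.fastq.gz file in the current directory is a sample (the helper itself is not part of the pipeline):

```shell
# Build file_list.txt from the fastq files in the current directory
# by stripping the .fastq / .fastq.gz extensions.
for f in *.fastq *.fastq.gz; do
  [ -e "$f" ] || continue              # skip unmatched glob patterns
  base=${f%.gz}                        # drop .gz if present
  base=${base%.fastq}                  # drop .fastq
  echo "$base"
done | sort -u > file_list.txt
```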
## GET THE HUMAN GENOME AND PERL SCRIPTS
Use the raw.githubusercontent.com URLs for the perl scripts — the github.com "blob" pages are HTML, not the scripts themselves:

```
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_36/GRCh38.p13.genome.fa.gz
wget https://raw.githubusercontent.com/TealFurnholm/Metatranscriptome/master/RemovePoly.pl
wget https://raw.githubusercontent.com/TealFurnholm/Metatranscriptome/master/Get_Info_Matrix.pl
```
## DO READ QC AND ALIGNMENT
```
while read i; do
  echo -n "doing $i ";

  # trim reads ** adjust the trim parameters as needed
  comics TrimmomaticSE -threads 20 -summary ${i}_trim.log \
    ${i}.fastq.gz ${i}_T.fastq.gz \
    ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 MINLEN:50;

  # remove human, low entropy/repeats, and duplicates
  comics bbmap minid=0.90 ref=GRCh38.p13.genome.fa.gz statsfile=${i}_mapNH.log \
    in=${i}_T.fastq.gz outu=${i}_TNH.fastq.gz;
  perl RemovePoly.pl ${i}_TNH.fastq.gz 100 ${i}_TNHNR.fastq.gz;
  comics dedupe in=${i}_TNHNR.fastq.gz out=${i}_TNHNRDD.fastq.gz;

  # align the cleaned reads against the URDB protein database
  comics diamond blastx \
    -d /geomicro/data2/tealfurn/MUSCATO/URDB_PROT_REF.dmnd \
    -q ${i}_TNHNRDD.fastq.gz \
    -o ${i}_TNHNRDD_vs_URDB.m8 \
    --top 0.5 --threads 20 --strand both \
    -f 6 qseqid qlen sseqid slen qstart qend sstart send evalue pident mismatch qcovhsp scovhsp;
  perl Get_Info_Matrix.pl ${i}_TNHNRDD_vs_URDB.m8 ${i};

  # clean up intermediate files
  rm ${i}_TNH.fastq*;
  rm ${i}_TNHNR.fastq*;
done < file_list.txt;
```
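To eyeball how well a sample aligned, the DIAMOND tabular output can be summarized with awk. A sketch assuming the `-f 6` field order above (evalue is column 9) and a hypothetical 1e-5 cutoff; the file name is just an example sample:

```shell
# Count distinct query reads with at least one hit at evalue <= 1e-5
# in the DIAMOND tabular output for one sample.
awk -F'\t' '$9 <= 1e-5 && !seen[$1]++ { n++ }
            END { print n+0, "query reads with a hit at evalue <= 1e-5" }' \
  Control_T0_TNHNRDD_vs_URDB.m8
```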
- NOTE: do not use these final deduplicated reads for read depth or expression quantification; use your trimmed or raw reads for that.
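For that depth/quantification step, per-sample read counts can be taken from the trimmed fastq files directly. A sketch assuming the `${i}_T.fastq.gz` files produced by the loop above (each fastq record is exactly 4 lines):

```shell
# Count reads per sample from the trimmed fastq files
# (4 lines per fastq record).
while read i; do
  n=$(gzip -cd "${i}_T.fastq.gz" | wc -l)
  echo "$i $((n / 4)) reads"
done < file_list.txt
```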