Contig Alignment - TealFurnholm/Strain-Level_Metatranscriptome_Analysis GitHub Wiki
Warning: I STRONGLY recommend you analyze your metatranscriptome separately from the metagenome, because the organisms and genes detected can be VERY different. You can of course align your metatranscriptome reads to the metagenome genes as part of the metagenome analysis, but make sure you also look separately at the metatranscriptome itself or you'll miss things! Why? Probably many factors (e.g. sample prep, sequencer), but also, as of 2021, no assembler correctly assembles reads: they cut contigs at branches (wrong) and remove bulges/bubbles (also wrong). I discuss this on my metagenome GitHub page.
Under construction as of 2-17-21: mostly functional, but the final Perl script needs updating.
DO METAGENOME READS QC AND ASSEMBLY
Run your metagenome data with my pipeline at least through "Contig Analysis".
https://github.com/TealFurnholm/Teals_Strain-Level_Metagenome_Pipeline/wiki
GET THE HUMAN GENOME AND PERL SCRIPTS
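#human genome (GENCODE release 36)
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_36/GRCh38.p13.genome.fa.gz
#perl scripts: use the raw URLs, the /blob/ URLs make wget save the GitHub HTML page instead of the script
wget https://raw.githubusercontent.com/TealFurnholm/Metatranscriptome/master/RemovePoly.pl
wget https://raw.githubusercontent.com/TealFurnholm/Metatranscriptome/master/Get_Info_Matrix.pl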
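A GitHub `/blob/` address returns the web page that displays a file, not the file itself, so whichever URL you used it is worth confirming that the downloaded `.pl` files really are scripts. The sketch below is one way to do that; `looks_like_html` is a hypothetical helper written for this example and is not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original pipeline):
# returns 0 if the file starts like an HTML page, 1 otherwise.
looks_like_html() {
    local file="$1"
    # Only the first 5 lines are inspected: a Perl script normally opens with
    # a shebang or "use ..." line, a saved web page opens with HTML markup.
    head -n 5 "$file" | grep -qiE '<!DOCTYPE html|<html'
}

# Warn about any downloaded script that is actually a web page.
for script in RemovePoly.pl Get_Info_Matrix.pl; do
    if [ -f "$script" ] && looks_like_html "$script"; then
        echo "WARNING: $script looks like an HTML page; re-download the raw file" >&2
    fi
done
```

If a warning is printed, delete the file and fetch it again from the raw address before running the QC loop.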
DO METATRANSCRIPTOME READS QC AND ALIGNMENT
while read i; do
  echo -n "doing $i ";

  #trim reads ** adjust the trim parameters as needed
  comics TrimmomaticSE -threads 20 -summary ${i}_trim.log \
    ${i}.fastq.gz ${i}_T.fastq.gz \
    ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 MINLEN:50;

  #remove human, low entropy/repeats, and duplicates
  comics bbmap minid=0.90 ref=GRCh38.p13.genome.fa.gz statsfile=${i}_mapNH.log in=${i}_T.fastq.gz outu=${i}_TNH.fastq.gz;
  perl RemovePoly.pl ${i}_TNH.fastq.gz 100 ${i}_TNHNR.fastq.gz;
  comics dedupe in=${i}_TNHNR.fastq.gz out=${i}_TNHNRDD.fastq.gz;

  #align to contig genes
  #out= must differ from in= or the deduplicated reads get overwritten;
  #the .sam file name used here is a placeholder, rename it as needed
  comics bbmap ref=${i}_MERGED_CONTIGS_COR_GENES.fna ambig=all cigar=f idfilter=0.99 minid=0.99 idtag=t printunmappedcount=t \
    in=${i}_TNHNRDD.fastq.gz out=${i}_TNHNRDD_vs_CONTIG_GENES.sam;

  #need to modify perl script to input contig gene info instead of URDB
  #perl MetaT_vs_MetaG_ALN.pl ${i}_TNHNRDD_vs_URDB.m8 ${i};

  #clean-up old files
  rm ${i}_TNH.fastq*;
  rm ${i}_TUNP.fastq*;
  rm ${i}_TNHNR.fastq*;
done < file_list.txt;
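The loop above reads one sample name per line from `file_list.txt` and expects, for each name, the raw reads (`${i}.fastq.gz`) and the gene file from the metagenome pipeline (`${i}_MERGED_CONTIGS_COR_GENES.fna`), plus the human genome and `RemovePoly.pl` in the working directory. A missing file only shows up partway through a long run, so a pre-flight check along these lines may save time. This is a minimal sketch; `check_inputs` is a hypothetical helper, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the original pipeline).
# Prints every missing or empty input file and returns 1 if any were found.
check_inputs() {
    local list="$1" missing=0 i f

    # Files shared by all samples.
    for f in GRCh38.p13.genome.fa.gz RemovePoly.pl; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done

    # Per-sample files; "|| [ -n "$i" ]" also handles a last line
    # that has no trailing newline.
    while IFS= read -r i || [ -n "$i" ]; do
        [ -z "$i" ] && continue   # skip blank lines
        for f in "${i}.fastq.gz" "${i}_MERGED_CONTIGS_COR_GENES.fna"; do
            if [ ! -s "$f" ]; then
                echo "missing or empty: $f" >&2
                missing=1
            fi
        done
    done < "$list"

    return "$missing"
}

# Example use, run from the working directory before starting the loop:
# check_inputs file_list.txt && echo "all inputs present"
```

The check only tests that each file exists and is non-empty; it does not validate the FASTQ or FASTA contents.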