Metatranscriptome Direct Alignment

This page assumes single-end (not paired-end) sequencing reads. If your reads are paired-end, first do the read QC described at https://github.com/TealFurnholm/Teals_Strain-Level_Metagenome_Pipeline/wiki/Assembly and then do the alignment described at https://github.com/TealFurnholm/Teals_Strain-Level_Metagenome_Pipeline/wiki/Direct-Alignment-of-Reads instead of following this page.

FORMAT SAMPLE NAMES AND LIST

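  • Name your sequencing files [sample_name].fastq (or [sample_name].fastq.gz if compressed). The commands below read ${i}.fastq.gz, so either gzip uncompressed files first or edit the input name in the trim step. Examples: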
    • Control_T0.fastq
    • Toxin-Day5.fastq.gz
  • List each [sample_name], one per line, in a file called "file_list.txt" (the loop below reads this file line by line)
    • Example:
      Control_T0
      Control_T5
      Toxin-10mM_T0
      Toxin-10mM_T5
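If the sequencing files are already named as described above, the list can be generated rather than typed by hand. The following is a minimal sketch, not part of the original pipeline; the function name `make_file_list` is hypothetical. It strips the `.fastq` or `.fastq.gz` extension from every matching file in a directory and prints one sample name per line.

```shell
# Hypothetical helper (not part of the original scripts):
# print the sample name for every .fastq or .fastq.gz file in a directory.
make_file_list() {
    local dir="${1:-.}"
    local f name
    for f in "$dir"/*.fastq "$dir"/*.fastq.gz; do
        [ -e "$f" ] || continue      # skip a glob pattern that matched nothing
        name="$(basename "$f")"
        name="${name%.gz}"           # drop a trailing .gz, if any
        name="${name%.fastq}"        # drop the .fastq extension
        printf '%s\n' "$name"
    done | sort -u                   # one line per sample, no duplicates
}

# Usage (run from the directory holding the reads):
#   make_file_list . > file_list.txt
```

Run this before the QC loop, because the intermediate files the loop writes (for example `${i}_T.fastq.gz`) also end in `.fastq.gz` and would otherwise be listed as samples.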

GET THE HUMAN GENOME AND PERL SCRIPTS

wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_36/GRCh38.p13.genome.fa.gz
# fetch the raw script files; the github.com/.../blob/... page URLs return HTML, not Perl
wget https://raw.githubusercontent.com/TealFurnholm/Metatranscriptome/master/RemovePoly.pl
wget https://raw.githubusercontent.com/TealFurnholm/Metatranscriptome/master/Get_Info_Matrix.pl
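A script fetched from a GitHub page URL instead of a raw file URL is saved as an HTML page, and `perl` then fails on it with confusing syntax errors. The sketch below is one way to catch that before starting the loop. The function name `looks_like_script` is hypothetical, and the check is only a heuristic: it looks for an HTML marker in the first five lines.

```shell
# Hypothetical check (heuristic only): return 0 if the file does NOT
# look like an HTML page, 1 if an HTML marker appears in its first 5 lines.
looks_like_script() {
    if head -n 5 "$1" | grep -qiE '<!DOCTYPE html|<html'; then
        return 1
    fi
    return 0
}

# Check the two downloaded Perl scripts, if they are present.
for s in RemovePoly.pl Get_Info_Matrix.pl; do
    [ -f "$s" ] || continue
    if looks_like_script "$s"; then
        echo "$s: ok"
    else
        echo "$s: looks like HTML, download the raw file instead"
    fi
done
```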

DO READ QC AND ALIGNMENT

while read i;
      do echo -n "doing $i ";
      
      #trim reads (adjust the trim parameters as needed)
      comics TrimmomaticSE -threads 20 -summary ${i}_trim.log \
      ${i}.fastq.gz ${i}_T.fastq.gz \
      ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 MINLEN:50;
       
      #remove human, low entropy/repeats, and duplicates
      comics bbmap minid=0.90 ref=GRCh38.p13.genome.fa.gz statsfile=${i}_mapNH.log in=${i}_T.fastq.gz outu=${i}_TNH.fastq.gz;
      perl RemovePoly.pl ${i}_TNH.fastq.gz 100 ${i}_TNHNR.fastq.gz;
      comics dedupe in=${i}_TNHNR.fastq.gz out=${i}_TNHNRDD.fastq.gz;
      
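      #translated alignment of the cleaned reads against the protein reference
      #(change the -d path to wherever your copy of the DIAMOND database lives)
      comics diamond blastx \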
      -d /geomicro/data2/tealfurn/MUSCATO/URDB_PROT_REF.dmnd \
      -q ${i}_TNHNRDD.fastq.gz \
      -o ${i}_TNHNRDD_vs_URDB.m8 \
      --top 0.5 --threads 20 --strand both \
      -f 6 qseqid qlen sseqid slen qstart qend sstart send evalue pident mismatch qcovhsp scovhsp;
      
      perl Get_Info_Matrix.pl ${i}_TNHNRDD_vs_URDB.m8 ${i};
      
      #clean-up old files (-f: no error if a file was never created, e.g. _TUNP)
      rm -f ${i}_TNH.fastq*;
      rm -f ${i}_TUNP.fastq*;
      rm -f ${i}_TNHNR.fastq*;
done < file_list.txt;
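  • NOTE1: Do not use the filtered, deduplicated reads (${i}_TNHNRDD.fastq.gz) for read depth or expression quantification, because removing duplicate reads distorts abundance. Use your trimmed or raw reads for that.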
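To see how many reads the QC steps removed, you can count the reads in the files that remain after the loop finishes. This is a minimal sketch, not part of the original pipeline; the function name `count_reads` is hypothetical, and it assumes standard four-line FASTQ records in gzip-compressed files.

```shell
# Hypothetical helper: count reads in a gzip-compressed FASTQ file,
# assuming each read occupies exactly 4 lines.
count_reads() {
    local lines
    lines="$(gzip -cd "$1" | wc -l)"
    echo $(( lines / 4 ))
}

# Report raw, trimmed, and fully filtered read counts for each sample.
# The _TNH and _TNHNR intermediates are deleted by the loop, so they are skipped.
if [ -f file_list.txt ]; then
    while read -r i; do
        for suffix in "" _T _TNHNRDD; do
            f="${i}${suffix}.fastq.gz"
            if [ -f "$f" ]; then
                printf '%s\t%s\n' "$f" "$(count_reads "$f")"
            fi
        done
    done < file_list.txt
fi
```

A large drop between the trimmed and the final file is expected when a sample contains many human or duplicate reads, which is also why the final file is unsuitable for quantification.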