03_ASSEMBLIES - eolesin/AMOR_Indiv_Assembly_Protocol GitHub Wiki
Each sample is assembled separately, a decision made after much deliberation. Not being able to coassemble large metagenome datasets makes some downstream sample comparisons harder, but hopefully this can be overcome at the various levels of information we are seeking.
We use MEGAHIT for the assembly. I process the samples one at a time in a loop. This could probably be optimized, for instance on SAGA, by splitting the work into several parallel jobs instead of the single large job I run here.
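The parallel split mentioned above could look roughly like the following SLURM job-array sketch, in which each array task assembles one sample. This is an illustration under assumptions, not the workflow actually run here: the helper names `sample_for_task` and `run_megahit` and the `DRY_RUN` switch are hypothetical, the script assumes submission with `sbatch --array=1-N` (N being the number of lines in the sample list), and the resource requests SAGA would need are not shown.

```shell
#!/usr/bin/env bash
# Hypothetical array-job sketch: one sample per SLURM array task.
# Assumed submission: sbatch --array=1-N this_script.sh

AMOR_2019_path='/export/dahlefs/work/Shotgun/Metagenomes_chimneys_2019/01_QC'

# Print the sample name found on line $2 of list file $1.
sample_for_task() {
    sed -n "${2}p" "$1"
}

# Run MEGAHIT for one sample, or only print the command when DRY_RUN is set.
# Args: read directory, sample name, R1 suffix, R2 suffix, thread count.
run_megahit() {
    local dir="${1%/}" sample="$2" r1="$3" r2="$4" threads="$5"
    # Build the command as positional parameters so paths stay intact.
    set -- megahit -1 "$dir/$sample$r1" -2 "$dir/$sample$r2" \
        --min-contig-len 1000 -m 0.85 \
        -o "03_INDIV_ASSEMBLY/$sample" -t "$threads"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Only act when running inside a SLURM array task.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    sample=$(sample_for_task AMOR_2019 "$SLURM_ARRAY_TASK_ID")
    run_megahit "$AMOR_2019_path" "$sample" \
        "-QUALITY_PASSED_R1.fastq" "-QUALITY_PASSED_R2.fastq" \
        "${SLURM_CPUS_PER_TASK:-1}"
fi
```

Setting `DRY_RUN=1` prints each command instead of running it, which is a cheap way to check the paths before submitting. Since `-m 0.85` is a fraction of the machine's total memory, it would likely need adjusting to whatever memory each job is actually allocated.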
# On kjempefuru:
AMOR_2019_path='/export/dahlefs/work/Shotgun/Metagenomes_chimneys_2019/01_QC/'
AMOR_2020_path='/export/dahlefs/work/Metagenomes_chimneys_2020_workfolder/02_HUMAN_Decontam/'
# 2019 data first
while IFS= read -r line
do
    mypath=$AMOR_2019_path
    Dataset=$line
    R1_suff="-QUALITY_PASSED_R1.fastq"
    R2_suff="-QUALITY_PASSED_R2.fastq"
    # Drop any trailing slash from the directory, then join with exactly one
    megahit -1 "${mypath%/}/$line$R1_suff" -2 "${mypath%/}/$line$R2_suff" \
        --min-contig-len 1000 -m 0.85 -o "03_INDIV_ASSEMBLY/$Dataset" -t 40
done < AMOR_2019
# Then all the 2020 samples we deemed "good"
while IFS= read -r line
do
    mypath=$AMOR_2020_path
    Dataset=$line
    R1_suff="-cleanR1.fq"
    R2_suff="-cleanR2.fq"
    # Drop any trailing slash from the directory, then join with exactly one
    megahit -1 "${mypath%/}/$line$R1_suff" -2 "${mypath%/}/$line$R2_suff" \
        --min-contig-len 1000 -m 0.85 -o "03_INDIV_ASSEMBLY/$Dataset" -t 40
done < AMOR_2020_Good
# Then all the iron mat samples
while IFS= read -r line
do
    mypath=$AMOR_2020_path
    Dataset=$line
    R1_suff="-cleanR1.fq"
    R2_suff="-cleanR2.fq"
    # Drop any trailing slash from the directory, then join with exactly one
    megahit -1 "${mypath%/}/$line$R1_suff" -2 "${mypath%/}/$line$R2_suff" \
        --min-contig-len 1000 -m 0.85 -o "03_INDIV_ASSEMBLY/$Dataset" -t 40
done < Iron_mats
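
Because all samples run in one long job, it can be useful to check afterwards which assemblies actually finished. The sketch below lists samples that have no non-empty `final.contigs.fa` (MEGAHIT's contig output file) in their output directory. The function name `missing_assemblies` is hypothetical and this check is not part of the original protocol. It only assumes the list files and the `03_INDIV_ASSEMBLY` layout used above.

```shell
#!/usr/bin/env bash
# Hypothetical post-run check: list samples whose assembly did not finish.

# Print every sample named in list file $1 that has no non-empty
# final.contigs.fa under output root $2.
missing_assemblies() {
    local list="$1" out_root="${2%/}"
    # The "|| [ -n ... ]" keeps a last line that lacks a trailing newline.
    while IFS= read -r sample || [ -n "$sample" ]; do
        [ -n "$sample" ] || continue    # skip blank lines
        [ -s "$out_root/$sample/final.contigs.fa" ] || echo "$sample"
    done < "$list"
}

# Check the three sample lists used above, skipping any that are absent.
for list in AMOR_2019 AMOR_2020_Good Iron_mats; do
    if [ -f "$list" ]; then
        missing_assemblies "$list" 03_INDIV_ASSEMBLY
    fi
done
```

Any sample it prints can be rerun. MEGAHIT normally refuses to write into an output directory that already exists, so a partial directory would need to be removed first, or the run resumed with `--continue`.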