03_ASSEMBLIES - eolesin/AMOR_Indiv_Assembly_Protocol GitHub Wiki

Each sample is assembled separately. This decision was made after much deliberation. Not being able to coassemble large metagenome datasets makes downstream comparison between samples harder in some cases, but hopefully this can be overcome at the various levels of information we are seeking.

We use MEGAHIT for the assembly. I have elected to process the samples one at a time in a loop, run as one large job. This could perhaps be optimized, for instance on SAGA, by splitting the work into several parallel jobs.

# On kjempefuru:
AMOR_2019_path='/export/dahlefs/work/Shotgun/Metagenomes_chimneys_2019/01_QC/'
AMOR_2020_path='/export/dahlefs/work/Metagenomes_chimneys_2020_workfolder/02_HUMAN_Decontam/'

# 2019 data first
# MEGAHIT options: keep contigs >= 1000 bp, use up to 85% of the machine's memory, 40 threads
while IFS= read -r line; do
    mypath=$AMOR_2019_path
    Dataset=$line
    R1_suff="-QUALITY_PASSED_R1.fastq"
    R2_suff="-QUALITY_PASSED_R2.fastq"
    megahit -1 "${mypath}${line}${R1_suff}" -2 "${mypath}${line}${R2_suff}" \
        --min-contig-len 1000 -m 0.85 -o "03_INDIV_ASSEMBLY/${Dataset}" -t 40
done < AMOR_2019

# Then all the 2020 samples we deemed "good"
while IFS= read -r line; do
    mypath=$AMOR_2020_path
    Dataset=$line
    R1_suff="-cleanR1.fq"
    R2_suff="-cleanR2.fq"
    megahit -1 "${mypath}${line}${R1_suff}" -2 "${mypath}${line}${R2_suff}" \
        --min-contig-len 1000 -m 0.85 -o "03_INDIV_ASSEMBLY/${Dataset}" -t 40
done < AMOR_2020_Good

# Then all the iron mat samples
while IFS= read -r line; do
    mypath=$AMOR_2020_path
    Dataset=$line
    R1_suff="-cleanR1.fq"
    R2_suff="-cleanR2.fq"
    megahit -1 "${mypath}${line}${R1_suff}" -2 "${mypath}${line}${R2_suff}" \
        --min-contig-len 1000 -m 0.85 -o "03_INDIV_ASSEMBLY/${Dataset}" -t 40
done < Iron_mats
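
The idea mentioned earlier, of cutting the work into several parallel jobs on SAGA instead of one large loop, could look roughly like the job-array sketch below. It is only a sketch and has not been run on SAGA. It assumes that SAGA schedules jobs with Slurm, that the array range is edited to match the number of lines in the sample list, and that `AMOR_2020_path` is set as in the section above. The function names `nth_sample` and `assemble_sample` and the `DRY_RUN` switch are made up for this illustration. Account, memory and wall-time directives are site-specific and left out.

```shell
#!/usr/bin/env bash
# Hypothetical job name.
#SBATCH --job-name=megahit_indiv
# Assumption: replace 10 with the number of lines in the sample list.
#SBATCH --array=1-10
# Matches the -t 40 used in the loops above.
#SBATCH --cpus-per-task=40

# Print the sample name found on line $2 (1-based) of the list file $1.
nth_sample() {
    sed -n "${2}p" "$1"
}

# Assemble one sample with the same MEGAHIT options as the loops above.
# Arguments: sample name, input directory (with trailing slash), R1 suffix, R2 suffix.
# If DRY_RUN is set to a non-empty value, the command is printed instead of executed.
assemble_sample() {
    local sample=$1 path=$2 r1_suff=$3 r2_suff=$4
    ${DRY_RUN:+echo} megahit -1 "${path}${sample}${r1_suff}" -2 "${path}${sample}${r2_suff}" \
        --min-contig-len 1000 -m 0.85 -o "03_INDIV_ASSEMBLY/${sample}" -t 40
}

# Inside a Slurm array task, assemble only the sample that matches this task's index.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    sample=$(nth_sample AMOR_2020_Good "$SLURM_ARRAY_TASK_ID")
    assemble_sample "$sample" "${AMOR_2020_path:?set AMOR_2020_path first}" "-cleanR1.fq" "-cleanR2.fq"
fi
```

Saved under a name such as `megahit_array.sh` (hypothetical), this would be submitted once with `sbatch megahit_array.sh`, and each array task would then assemble a single sample. The same pattern would apply to the `AMOR_2019` and `Iron_mats` lists by changing the list file, path and suffixes.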