Module 2: Lab 1 - Lavadav/EPP531_AGA GitHub Wiki

Symbolically link data to the current directory

HiFi Data

ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/PacBioHiFi/m84109_240206_204137_s2.hifi_reads.bc2017.bam .
ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/PacBioHiFi/m84109_240206_204137_s2.hifi_reads.bc2017.bam.pbi .
ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/PacBioHiFi/gbru.m84109_240206_204137_s2.hifi_reads.bc2017.bam.md5 .
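The .md5 file linked above can be used to confirm that the BAM file is intact. The following is a minimal sketch, assuming the checksum file stores the expected digest as the first field of its first line (the usual md5sum layout). The helper name check_md5 is hypothetical and not part of the course materials.

```shell
# Compare a file's MD5 digest with the digest stored in a checksum file.
# Assumption: the expected digest is the first field on the first line.
check_md5() {
    data_file="$1"
    md5_file="$2"
    expected=$(awk 'NR == 1 { print $1 }' "$md5_file")
    actual=$(md5sum "$data_file" | awk '{ print $1 }')
    if [ "$expected" = "$actual" ]; then
        echo "OK: $data_file"
    else
        echo "MISMATCH: $data_file" >&2
        return 1
    fi
}

# Example call (file names from the links above):
# check_md5 m84109_240206_204137_s2.hifi_reads.bc2017.bam \
#           gbru.m84109_240206_204137_s2.hifi_reads.bc2017.bam.md5
```

Comparing only the digest field avoids a failure when the file name recorded inside the checksum file differs from the local link name.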

Hi-C Data

ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/Hi-C/results/salbidum01_1334140/Hi-C/salbidum01_1334141_S3HiC_R1.fastq.gz .
ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/Hi-C/results/salbidum01_1334140/Hi-C/salbidum01_1334141_S3HiC_R2.fastq.gz .
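Note that ln -s succeeds even when the target path is mistyped, so it can help to confirm that every link resolves before starting a long job. A small sketch follows; find_broken_links is a hypothetical helper name.

```shell
# Print every symbolic link in a directory whose target does not exist.
find_broken_links() {
    dir="${1:-.}"
    for link in "$dir"/*; do
        # -L: the entry is a symlink; ! -e: its target cannot be resolved
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            echo "$link"
        fi
    done
}

# Example call for the current directory (no output means all links resolve):
# find_broken_links .
```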

Sassafras Genome Assembly Pipeline

Step 1: QC of HiFi Data

Add the following line to your ~/.bashrc file:

export PATH=$PATH:/pickett_shared/software/apptainer_unprivileged/bin/

Apply the change:

source ~/.bashrc
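To confirm that the PATH change took effect, you can check whether the shell can now find apptainer. A minimal sketch, with require_tool as a hypothetical helper name:

```shell
# Report whether a command can be found on the current PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1 (check your PATH)" >&2
        return 1
    fi
}

# Example call:
# require_tool apptainer
```

The same helper can be reused later for tools loaded with spack, such as fastqc or seqtk.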

Test LongQC

apptainer exec -B $PWD /sphinx_local/images/longqc_latest.sif* longQC.py --version

Run LongQC

apptainer exec -B $PWD -B /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/PacBioHiFi/ /sphinx_local/images/longqc_latest.sif* longQC.py sampleqc -x pb-hifi -p 4 -o longqc_out/ m84109_240206_204137_s2.hifi_reads.bc2017.bam

Step 2: Convert BAM to FASTQ

bedtools bamtofastq -i m84109_240206_204137_s2.hifi_reads.bc2017.bam -fq sassafras_bedtools_HiFI_reads.fq

or

samtools bam2fq m84109_240206_204137_s2.hifi_reads.bc2017.bam > sassafras_samtools_HiFI_reads.fq
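Either command should produce a FASTQ file with the same number of reads. A quick sanity check is to count the records in the output. The sketch below assumes standard four-line FASTQ records (no wrapped sequence lines); count_fastq_reads is a hypothetical helper name.

```shell
# Count reads in an uncompressed FASTQ file.
# Assumption: each record occupies exactly four lines.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Example calls (file names from the commands above):
# count_fastq_reads sassafras_bedtools_HiFI_reads.fq
# count_fastq_reads sassafras_samtools_HiFI_reads.fq
```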

HiFi FASTQ Data

ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/PacBioHiFi/sassafras_samtools_HiFI_reads.fq .

Step 3: QC of Hi-C Data

Load FastQC

spack load fastqc

Run FastQC on Hi-C Files

mkdir -p Hi-C_fastqc
fastqc *fastq.gz -o Hi-C_fastqc
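Besides per-file quality, paired Hi-C data should have the same number of reads in R1 and R2. The following sketch checks this for gzip-compressed FASTQ files, assuming four-line records. The function names are hypothetical.

```shell
# Count reads in a gzip-compressed FASTQ file (four lines per record assumed).
count_gz_reads() {
    gzip -dc "$1" | awk 'END { print int(NR / 4) }'
}

# Succeed only if the R1 and R2 files contain the same number of reads.
check_pair_counts() {
    n1=$(count_gz_reads "$1")
    n2=$(count_gz_reads "$2")
    if [ "$n1" -eq "$n2" ]; then
        echo "paired: $n1 read pairs"
    else
        echo "unpaired: R1=$n1 R2=$n2" >&2
        return 1
    fi
}

# Example call (file names from the Hi-C links above):
# check_pair_counts salbidum01_1334141_S3HiC_R1.fastq.gz \
#                   salbidum01_1334141_S3HiC_R2.fastq.gz
```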

Step 4: Downsample HiFi Reads

spack load seqtk
seqtk sample -s100 <input.fq> <no. of sequences> > <new_name.fq>
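One way to choose the number of sequences is to work back from a target coverage: reads = coverage x genome size / mean read length. The sketch below does that arithmetic. The 800 Mb genome size matches the --hg-size value used in the assembly steps of this lab, but the 10x coverage and 16,000 bp mean read length in the example are illustrative values only; take the real mean read length from your QC report. The helper name reads_for_coverage is hypothetical.

```shell
# Estimate how many reads are needed to reach a target coverage.
# Arguments: genome size (bp), target coverage (x), mean read length (bp).
reads_for_coverage() {
    awk -v g="$1" -v c="$2" -v l="$3" 'BEGIN { printf "%d\n", g * c / l }'
}

# Example with illustrative values (10x of an 800 Mb genome, 16 kb reads):
# n=$(reads_for_coverage 800000000 10 16000)
# seqtk sample -s100 sassafras_samtools_HiFI_reads.fq "$n" > subsample.fq
```

The -s option sets the random seed, so reusing the same seed and input reproduces the same subsample. seqtk sample also accepts a fraction (for example 0.1) in place of a read count.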

Step 5: Genome Assembly with HiFi Data (No Hi-C)

/sphinx_local/software/hifiasm/hifiasm \
-o Sassafras_V1.0_no_Hi-C \
-t 3 \
--hg-size 800m \
sassafras_samtools_HiFI_reads.fq
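hifiasm writes its assemblies in GFA format, while most downstream tools expect FASTA. The sketch below converts the segment (S) lines of a GFA file to FASTA with awk, following the approach shown in the hifiasm documentation. The output file name in the example is an assumption based on hifiasm's usual naming for HiFi-only runs; check the files actually produced in your directory. gfa_to_fasta is a hypothetical helper name.

```shell
# Convert the segment (S) lines of a GFA file to FASTA on standard output.
gfa_to_fasta() {
    awk '$1 == "S" { print ">" $2; print $3 }' "$1"
}

# Example call (GFA file name is an assumption, verify it with ls):
# gfa_to_fasta Sassafras_V1.0_no_Hi-C.bp.p_ctg.gfa > Sassafras_V1.0_no_Hi-C.p_ctg.fa
```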

Step 6: Genome Assembly with HiFi + Hi-C Data

/sphinx_local/software/hifiasm/hifiasm \
-o Sassafras_V1.0_with_Hi-C \
-t 3 \
--hg-size 800m \
--h1 salbidum01_1334141_S3HiC_R1.fastq.gz \
--h2 salbidum01_1334141_S3HiC_R2.fastq.gz \
sassafras_samtools_HiFI_reads.fq
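To compare the assemblies from Steps 5 and 6, a few summary numbers (contig count, total length, N50) are usually enough for a first look. The sketch below computes them directly from a GFA file, assuming each S line carries the contig sequence in its third field. gfa_n50 is a hypothetical helper name, and the GFA file names in the example are assumptions based on hifiasm's usual output naming.

```shell
# Print contig count, total length, and N50 for the S lines of a GFA file.
gfa_n50() {
    awk '$1 == "S" { print length($3) }' "$1" | sort -nr |
    awk '{ len[NR] = $1; total += $1 }
         END {
             half = total / 2
             # Walk contigs from longest to shortest until half the
             # assembly length is covered; that contig length is the N50.
             for (i = 1; i <= NR; i++) {
                 run += len[i]
                 if (run >= half) {
                     print "contigs=" NR, "total=" total, "N50=" len[i]
                     exit
                 }
             }
         }'
}

# Example calls (file names are assumptions, verify them with ls):
# gfa_n50 Sassafras_V1.0_no_Hi-C.bp.p_ctg.gfa
# gfa_n50 Sassafras_V1.0_with_Hi-C.hic.p_ctg.gfa
```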

Install Conda

First, go to the Miniconda download page and download the bash installer script into your home directory:

wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh

NOTE - Please double check that you have the latest link available.

Then run the script:

bash Miniconda3-py39_4.12.0-Linux-x86_64.sh

Conda will be installed. Log out and log back in so the change takes effect. Then, to use Bioconda correctly, run these commands (you only need to run them once):

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict

You are all set.
