Module_2_Lab_1 - heelsplitter/Grootmyers_EPP_531_Applied_Genome_Analytics GitHub Wiki
Symbolically link data to current directory
HiFi Data
cd /pickett_sphinx/projects/EPP531_AGA/dgrootmy
mkdir 01_data
cd 01_data
ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/PacBioHiFi/m84109_240206_204137_s2.hifi_reads.bc2017.bam .
ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/PacBioHiFi/m84109_240206_204137_s2.hifi_reads.bc2017.bam.pbi .
ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/PacBioHiFi/gbru.m84109_240206_204137_s2.hifi_reads.bc2017.bam.md5 .
Hi-C Data
ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/Hi-C/results/salbidum01_1334140/Hi-C/salbidum01_1334141_S3HiC_R1.fastq.gz .
ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/Hi-C/results/salbidum01_1334140/Hi-C/salbidum01_1334141_S3HiC_R2.fastq.gz .
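A mistyped source path still produces a symlink, just a dangling one, so it can help to confirm that every link resolves before going further. The following is a minimal sketch; `broken_links` is a hypothetical helper name, not part of the lab.

```shell
# List symlinks in a directory whose targets do not exist.
# Prints one broken link per line; prints nothing if every link resolves.
# (broken_links is a hypothetical helper name used for illustration.)
broken_links() {
    local dir="${1:-.}"
    local f
    for f in "$dir"/*; do
        # -L: the entry is a symlink; ! -e: its target is missing
        if [ -L "$f" ] && [ ! -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
# Usage, from inside 01_data:
# broken_links .
```

No output means all links in the directory point at existing files.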
Check the checksums. The hash printed by md5sum has to match the hash recorded in the .md5 file for the same file.
md5sum m84109_240206_204137_s2.hifi_reads.bc2017.bam.pbi
cat gbru.m84109_240206_204137_s2.hifi_reads.bc2017.bam.md5
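Comparing two 32-character hashes by eye is error-prone, so the comparison can be scripted. This sketch assumes the .md5 file holds the expected hash as the first field of its first line, which is the usual `md5sum` output layout but is not confirmed by the source. Judging by its name, the .md5 file may describe the .bam rather than the .bam.pbi, so pass whichever file it actually covers. `md5_matches` is a hypothetical helper name.

```shell
# Compare a file's MD5 hash with the first field of the first line of a .md5 file.
# Assumption: the .md5 file uses the usual "<hash>  <filename>" layout.
# (md5_matches is a hypothetical helper name used for illustration.)
md5_matches() {
    local file="$1" md5file="$2"
    local actual expected
    actual=$(md5sum "$file" | awk '{print $1}')
    expected=$(awk 'NR == 1 {print $1}' "$md5file")
    if [ "$actual" = "$expected" ]; then
        echo "OK"
    else
        echo "MISMATCH"
        return 1
    fi
}
# Usage:
# md5_matches m84109_240206_204137_s2.hifi_reads.bc2017.bam \
#     gbru.m84109_240206_204137_s2.hifi_reads.bc2017.bam.md5
```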
Sassafras Genome Assembly Pipeline
Step 1: QC of HiFi Data
Open your ~/.bashrc in an editor and add the export line below to it:
nano ~/.bashrc
export PATH=$PATH:/pickett_shared/software/apptainer_unprivileged/bin/
Apply the change:
source ~/.bashrc
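Because ~/.bashrc is sourced again in several later steps, a plain `export PATH=$PATH:...` line appends the same directory each time. The sketch below shows one way to guard against that. `path_append` is a hypothetical helper, not something the lab defines.

```shell
# Append a directory to PATH only if it is not already present.
# (path_append is a hypothetical helper name used for illustration.)
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;            # already on PATH, nothing to do
        *) PATH="$PATH:$1" ;;   # not present yet, append it
    esac
}
# In ~/.bashrc, in place of the plain export line:
# path_append /pickett_shared/software/apptainer_unprivileged/bin
# export PATH
```

The match is on the exact string, so a directory written once with and once without a trailing slash would still be added twice.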
Test LongQC
apptainer exec -B $PWD /sphinx_local/images/longqc_latest.sif* longQC.py --version
Run LongQC
screen -S LongQc
screen -r LongQc
cd /pickett_sphinx/projects/EPP531_AGA/dgrootmy/01_data
source ~/.bashrc
spack load apptainer
spack load squashfuse
apptainer exec -B "$PWD,/pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/PacBioHiFi" /sphinx_local/images/longqc_latest.sif* longQC.py sampleqc -x pb-hifi -p 4 -o longqc_out/ m84109_240206_204137_s2.hifi_reads.bc2017.bam
Step 2: Convert BAM to FASTQ
export SPACK_ROOT=/pickett_shared/spack
PATH=$PATH:$HOME/bin:$SPACK_ROOT/bin
. $SPACK_ROOT/share/spack/setup-env.sh
spack list bedtools
spack load bedtools2
bedtools bamtofastq -i m84109_240206_204137_s2.hifi_reads.bc2017.bam -fq sassafras_bedtools_HiFI_reads.fq
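HiFi FASTQ Data (link the existing samtools-converted file, which the later steps use instead of the bedtools output)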
ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/PacBioHiFi/sassafras_samtools_HiFI_reads.fq .
Step 3: QC of Hi-C Data
Load FastQC
spack load fastqc
Run FastQC on Hi-C Files
mkdir Hi-C_fastqc
fastqc *fastq.gz -o Hi-C_fastqc
FastQC R1 Result
FastQC R2 Result
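FastQC reports on each file separately. A quick complementary check for paired Hi-C data is that R1 and R2 contain the same number of reads. This sketch assumes standard four-line FASTQ records; `count_fastq_reads` is a hypothetical helper name.

```shell
# Count reads in a gzipped FASTQ file, assuming 4 lines per record.
# (count_fastq_reads is a hypothetical helper name used for illustration.)
count_fastq_reads() {
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}
# Usage:
# r1=$(count_fastq_reads salbidum01_1334141_S3HiC_R1.fastq.gz)
# r2=$(count_fastq_reads salbidum01_1334141_S3HiC_R2.fastq.gz)
# if [ "$r1" -eq "$r2" ]; then echo "paired: $r1 reads"; else echo "R1/R2 differ: $r1 vs $r2"; fi
```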
Step 4: Downsample HiFi Reads
cd /pickett_sphinx/projects/EPP531_AGA/dgrootmy/01_data
source ~/.bashrc
spack load seqtk
spack load /yfui77z
seqtk sample -s100 sassafras_samtools_HiFI_reads.fq 1058313 > sassafras_samtools_HiFI_reads_20x.fq
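The source does not say how the read count 1058313 was chosen. A common way to pick a count for a target coverage is reads = coverage × genome size ÷ mean read length, and the sketch below follows that formula. The helper names are hypothetical, and the 800 Mb genome size is taken from the `--hg-size 800m` value used in the assembly steps, so it is an estimate.

```shell
# Mean read length of an uncompressed FASTQ with 4 lines per record (truncated to an integer).
# (Both helper names are hypothetical, used for illustration.)
mean_read_length() {
    awk 'NR % 4 == 2 { n++; total += length($0) }
         END { if (n > 0) printf "%d\n", total / n }' "$1"
}

# Number of reads needed for a target coverage:
#   reads = coverage * genome_size / mean_read_length   (integer division)
reads_for_coverage() {
    local coverage="$1" genome_size="$2" mean_len="$3"
    echo $(( coverage * genome_size / mean_len ))
}
# Usage (800000000 is the assumed genome size in bases):
# len=$(mean_read_length sassafras_samtools_HiFI_reads.fq)
# reads_for_coverage 20 800000000 "$len"
```

The result can be passed to `seqtk sample` as the number of reads to keep. Keeping the same `-s` seed makes the subsample reproducible.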
Step 5: Genome Assembly with HiFi Data (No Hi-C)
This run was not downsampled and was killed before completion.
screen -S NoHiC
screen -r NoHiC
cd /pickett_sphinx/projects/EPP531_AGA/dgrootmy/01_data
source ~/.bashrc
/sphinx_local/software/hifiasm/hifiasm \
-o Sassafras_V1.0_no_Hi-C \
-t 4 \
--hg-size 800m \
sassafras_samtools_HiFI_reads.fq
Downsampled to 20x:
screen -S NoHiC_down
/sphinx_local/software/hifiasm/hifiasm \
-o Sassafras_V1.0_no_Hi-C \
-t 4 \
--hg-size 800m \
sassafras_samtools_HiFI_reads_20x.fq
Step 6: Genome Assembly with HiFi + Hi-C Data
screen -S HiC_down
/sphinx_local/software/hifiasm/hifiasm \
-o Sassafras_V1.0_with_Hi-C \
-t 4 \
--hg-size 800m \
--h1 salbidum01_1334141_S3HiC_R1.fastq.gz \
--h2 salbidum01_1334141_S3HiC_R2.fastq.gz \
sassafras_samtools_HiFI_reads_20x.fq
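hifiasm writes its contigs in GFA format, while most downstream tools expect FASTA. The sketch below wraps the usual awk one-liner for that conversion in a function. `gfa_to_fasta` is a hypothetical helper name, and the file name in the usage line is a placeholder, because the exact output names depend on the prefix and run mode.

```shell
# Convert GFA segment (S) lines to FASTA: field 2 is the contig name, field 3 the sequence.
# (gfa_to_fasta is a hypothetical helper name used for illustration.)
gfa_to_fasta() {
    awk '/^S/ { print ">" $2; print $3 }' "$1"
}
# Usage (placeholder file name; check which .gfa files hifiasm actually wrote):
# gfa_to_fasta assembly.p_ctg.gfa > assembly.p_ctg.fa
```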
Install Conda
First, go to the Miniconda download page and download the installer bash script into your working directory:
cd /pickett_sphinx/projects/EPP531_AGA/dgrootmy
wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh
NOTE: Check that this is the latest installer link before downloading.
Then run the script:
bash Miniconda3-py39_4.12.0-Linux-x86_64.sh
Conda will be installed. Log out and log back in so that the change takes effect. Then, to use Bioconda correctly, run these commands (you only need to run them once):
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
You are all set.