Genome Assembly - Lavadav/EPP531_AGA GitHub Wiki
Data For Class
Cherokee Rose Subset Dataset
Step 1: Make the folders for data analysis
mkdir Raw_Data Analysis Results
Step 2: Copy the subset dataset into your current directory.
cp /work/pbgg8900/instructor_data/Genome_Assembly_Data/Pacbio_Data/subset_SRR29286022.fastq .
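After copying, it can help to confirm that the file arrived intact by counting its reads. The sketch below assumes a standard four-line-per-record FASTQ file and runs on a made-up toy file; the file name toy.fastq and its contents are hypothetical, not part of the class data.

```shell
# Create a tiny, made-up FASTQ file with two reads (hypothetical data).
tmpdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCCA\n+\nIIIII\n' > "$tmpdir/toy.fastq"

# Each FASTQ record is assumed to span exactly four lines,
# so the read count is the line count divided by four.
n_lines=$(wc -l < "$tmpdir/toy.fastq")
n_reads=$((n_lines / 4))
echo "reads: $n_reads"
```

To apply the same check to the class data, point the wc command at subset_SRR29286022.fastq instead of the toy file.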
Step 3: Soft link the dataset into your working directory.
ln -s path_to_raw_Data/ .
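As an illustration of what the soft link does, here is a minimal sketch that links a single file from one folder into another. All paths and file names in it (the temporary directory, toy.fastq) are hypothetical stand-ins; substitute your own raw data location.

```shell
# Set up a throwaway layout that mimics Raw_Data and Analysis (hypothetical paths).
work=$(mktemp -d)
mkdir -p "$work/Raw_Data" "$work/Analysis"
printf '@read1\nACGT\n+\nIIII\n' > "$work/Raw_Data/toy.fastq"

# Create a symbolic link inside Analysis that points at the file in Raw_Data.
# The data is not duplicated; only a pointer is created.
ln -s "$work/Raw_Data/toy.fastq" "$work/Analysis/"

# Show where the link points, then read the data through the link.
link_target=$(readlink "$work/Analysis/toy.fastq")
echo "link target: $link_target"
first_seq=$(sed -n 2p "$work/Analysis/toy.fastq")
echo "first sequence: $first_seq"
```

Using an absolute path as the link target, as above, keeps the link valid even if the Analysis folder is later moved.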
Step 4: Run hifiasm with and without Hi-C data.
Without Hi-C Data
ml hifiasm/0.25.0
hifiasm -o Hifiasm_output --hg-size 50m subset_SRR29286022.fastq
With Hi-C Data
ml hifiasm/0.25.0
hifiasm -o Hifiasm_output_Hi-C --hg-size 50m --h1 subset_HiC_R1.fastq.gz --h2 subset_HiC_R2.fastq.gz subset_SRR29286022.fastq
Step 5: Convert the .gfa file to a FASTA file.
awk '/^S/{print ">"$2;print $3}' Hifiasm_output.bp.p_ctg.gfa > Hifiasm_output.bp.p_ctg.fasta
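To see what the awk command above does, it can be run on a tiny GFA file. In GFA, lines beginning with S are segment records whose second field is the sequence name and whose third field is the sequence; other record types are skipped by the /^S/ pattern. The toy file and contig names below are made up for illustration.

```shell
# Build a tiny, made-up GFA file (tab-separated) with two segments and one non-segment line.
tmpdir=$(mktemp -d)
printf 'S\tctg1\tACGTACGT\tLN:i:8\n'    >  "$tmpdir/toy.gfa"
printf 'A\tctg1\t0\t+\tread1\t0\t8\n'   >> "$tmpdir/toy.gfa"
printf 'S\tctg2\tGGCC\tLN:i:4\n'        >> "$tmpdir/toy.gfa"

# Same awk program as in Step 5: for each S line, print the name as a
# FASTA header and the sequence on the following line.
awk '/^S/{print ">"$2;print $3}' "$tmpdir/toy.gfa" > "$tmpdir/toy.fasta"
cat "$tmpdir/toy.fasta"

# Count the FASTA records that were written.
n_seqs=$(grep -c '^>' "$tmpdir/toy.fasta")
echo "sequences: $n_seqs"
```

Each sequence is written on a single line, so contigs in the resulting FASTA file are not wrapped at a fixed width.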
Step 6: Assess the assembly statistics.
ml BBMap/39.19-GCC-13.3.0
stats.sh Hifiasm_output.bp.p_ctg.fasta > Hifiasm_output.bp.p_ctg.stats.txt
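As a rough cross-check of summary values such as total length and N50, these can also be computed by hand with awk. This is only a sketch on a made-up toy FASTA file, not a replacement for stats.sh; N50 is taken here as the length of the contig at which the cumulative length, summed from the longest contig down, first reaches half of the total.

```shell
# Toy FASTA with contig lengths 8, 6, 4 and 2 (made-up data).
tmpdir=$(mktemp -d)
printf '>a\nAAAAAAAA\n>b\nCCCCCC\n>c\nGGGG\n>d\nTT\n' > "$tmpdir/toy.fasta"

# One length per contig, sorted from longest to shortest.
lengths=$(awk '/^>/{if(len)print len; len=0; next}
               {len+=length($0)}
               END{if(len)print len}' "$tmpdir/toy.fasta" | sort -rn)

# Total assembly length.
total=$(echo "$lengths" | awk '{s+=$1} END{print s}')

# N50: walk down the sorted lengths until half of the total is covered.
n50=$(echo "$lengths" | awk -v total="$total" '{c+=$1; if (c*2 >= total) {print $1; exit}}')

echo "total length: $total"
echo "N50: $n50"
```

For the toy file the total is 20, and the cumulative sum reaches half of it (10) at the second contig, so the N50 is 6.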
Homework: Repeat the above steps for the hifiasm assembly generated with the Hi-C files.