Module 2: Lab 2 - Lavadav/EPP531_AGA GitHub Wiki
Hifiasm Assembly
Sassafras Chromosome Number: 2n=48
1. Convert Hifiasm output to FASTA file
# Convert GFA to FASTA: haplotype 1 contigs (hap1)
awk '/^S/{print ">"$2;print $3}' \
Sassafras_V1.0_with_Hi-C_30X.hic.hap1.p_ctg.gfa \
> Sassafras_V1.0_with_Hi-C_30X.hic.hap1.p_ctg.fasta
# Convert GFA to FASTA: haplotype 2 contigs (hap2)
awk '/^S/{print ">"$2;print $3}' \
Sassafras_V1.0_with_Hi-C_30X.hic.hap2.p_ctg.gfa \
> Sassafras_V1.0_with_Hi-C_30X.hic.hap2.p_ctg.fasta
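A quick sanity check after the conversion is to confirm that every segment (S) line in the GFA produced exactly one FASTA record. The sketch below runs the same awk command on a toy GFA; the file names and sequences are made up for illustration and are not part of the Sassafras data.

```shell
# Toy GFA with two segment (S) lines and one link (L) line; contents are made up.
tmpdir=$(mktemp -d)
printf 'S\tctg1\tACGT\nS\tctg2\tGGCC\nL\tctg1\t+\tctg2\t-\t0M\n' > "$tmpdir/toy.gfa"

# Same awk conversion as above, applied to the toy file.
awk '/^S/{print ">"$2;print $3}' "$tmpdir/toy.gfa" > "$tmpdir/toy.fasta"

# Count segment lines in the GFA and header lines in the FASTA; they should match.
n_segments=$(grep -c '^S' "$tmpdir/toy.gfa")
n_headers=$(grep -c '^>' "$tmpdir/toy.fasta")
echo "segments=$n_segments headers=$n_headers"
```

The same two grep -c commands can be pointed at the real .gfa and .fasta files to compare contig counts.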
2. Generate stats for the Hifiasm assembly
/sphinx_local/software/bbmap/stats.sh -Xmx10g \
in=Sassafras_V1.0_with_Hi-C_30X.hic.hap1.p_ctg.fasta \
> Sassafras_V1.0_with_Hi-C_30X.hic.hap1.p_ctg.stats.txt
/sphinx_local/software/bbmap/stats.sh -Xmx10g \
in=Sassafras_V1.0_with_Hi-C_30X.hic.hap2.p_ctg.fasta \
> Sassafras_V1.0_with_Hi-C_30X.hic.hap2.p_ctg.stats.txt
* Notes on BBMap stats
- N50/L50: N50 is the length of the shortest contig such that contigs of that length or longer together cover at least 50% of the assembly. L50 is the number of those contigs.
- N90/L90: N90 is the length of the shortest contig such that contigs of that length or longer together cover at least 90% of the assembly. L90 is the number of those contigs.
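The definitions above can be worked through by hand on a small example. The sketch below is not how BBMap computes its report; it is a minimal sort and awk illustration, and the function name nx_stats and the contig lengths are hypothetical.

```shell
# nx_stats PERCENT: read one contig length per line on stdin and
# print "N<PERCENT>=<length> L<PERCENT>=<count>".
nx_stats() {
  sort -rn | awk -v pct="$1" '
    { len[NR] = $1; total += $1 }
    END {
      target = total * pct / 100
      # Walk from the longest contig down until the running sum reaches the target.
      for (i = 1; i <= NR; i++) {
        cum += len[i]
        if (cum >= target) {
          printf "N%d=%d L%d=%d\n", pct, len[i], pct, i
          exit
        }
      }
    }'
}

# Hypothetical contig lengths (total = 30).
lengths='2 10 6 8 4'
n50=$(printf '%s\n' $lengths | nx_stats 50)
n90=$(printf '%s\n' $lengths | nx_stats 90)
echo "$n50"   # N50=8 L50=2
echo "$n90"   # N90=4 L90=4
```

For the 50% case the target is 15 bases: the two longest contigs (10 + 8 = 18) reach it, so the N50 is 8 and the L50 is 2. When reading the stats.sh report, check which value on the N/L50 line is the length and which is the count, since tools differ in how they label the two.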
* Hifiasm Initial Stats
3. HiFi and Hi-C Data
Hi-C Data
ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/Hi-C/results/salbidum01_1334140/Hi-C/salbidum01_1334141_S3HiC_R1.fastq.gz .
ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/Hi-C/results/salbidum01_1334140/Hi-C/salbidum01_1334141_S3HiC_R2.fastq.gz .
103x HiFi Data
ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/PacBioHiFi/sassafras_samtools_HiFI_reads.fq .
90x HiFi Data
ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/PacBioHiFi/Sassafras_DS_90x.fastq .
80x HiFi Data
ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/PacBioHiFi/Sassafras_DS_80x.fastq .
60x HiFi Data
ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/PacBioHiFi/Sassafras_DS_60x.fastq .
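Coverage labels such as 103x or 60x are usually read as total read bases divided by genome size. The page does not say how its labels were computed, so the sketch below only illustrates that arithmetic on a toy FASTQ. The reads and the genome size are made up, and the awk command assumes unwrapped four-line FASTQ records.

```shell
# Toy FASTQ with two reads of 8 and 4 bases; contents are made up.
tmpdir=$(mktemp -d)
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n' > "$tmpdir/toy.fastq"

# Hypothetical genome size in bases, chosen only to make the arithmetic easy.
genome_size=4

# Sum the sequence line (line 2 of every 4-line record), then divide by genome size.
coverage=$(awk -v g="$genome_size" 'NR % 4 == 2 { bases += length($0) }
  END { printf "%.1f", bases / g }' "$tmpdir/toy.fastq")
echo "coverage=${coverage}x"   # coverage=3.0x
```

To apply this to a real read set, replace the toy file with the linked FASTQ and set genome_size to the estimated haploid genome size in bases.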
4. Troubleshooting with (-s) and (--hg-size) parameters
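# Try -s values from 0.40 to 0.55 and --hg-size values from 1600m to 1700m.
# hifiasm takes a single value for each option per run; the values below are one example from those ranges.
/sphinx_local/software/hifiasm/hifiasm \
-o Sassafras_V1.0_with_Hi-C \
-t 2 \
-s 0.55 \
--hg-size 1700m \
--h1 salbidum01_1334141_S3HiC_R1.fastq.gz \
--h2 salbidum01_1334141_S3HiC_R2.fastq.gz \
sassafras_samtools_HiFI_reads.fq \
> hifiasm_output.txt 2>&1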
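Trying several parameter combinations by hand is error-prone, so it can help to generate the commands first and review them before running anything. The sketch below is a dry run that only prints one hifiasm command per combination. The grid steps (0.05 for -s, 50m for --hg-size) and the output prefix naming are assumptions, not values from the lab.

```shell
# Hypothetical grids inside the ranges above; the step sizes are assumptions.
s_values='0.40 0.45 0.50 0.55'
hg_values='1600m 1650m 1700m'

# Build one hifiasm command per combination without running anything.
commands=$(
  for s in $s_values; do
    for hg in $hg_values; do
      # Hypothetical naming scheme so that runs do not overwrite each other.
      prefix="Sassafras_s${s}_hg${hg}"
      echo "/sphinx_local/software/hifiasm/hifiasm -o $prefix -t 2 -s $s --hg-size $hg --h1 salbidum01_1334141_S3HiC_R1.fastq.gz --h2 salbidum01_1334141_S3HiC_R2.fastq.gz sassafras_samtools_HiFI_reads.fq > $prefix.log 2>&1"
    done
  done
)
echo "$commands"
```

Giving each run its own -o prefix and log file keeps the outputs of different parameter settings separate, which makes the resulting assembly stats easier to compare.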
5. Assignment for next week
The next step is to remove chloroplast and mitochondrial sequences from the assembly. A chloroplast genome is available for Sassafras, but no mitochondrial genome is.