Module 2: Lab 2 - Lavadav/EPP531_AGA GitHub Wiki

Hifiasm Assembly

Sassafras Chromosome Number: 2n=48

1. Convert Hifiasm output to FASTA file

# convert gfa to fasta - haplotype 1 (hap1)
awk '/^S/{print ">"$2;print $3}' \
Sassafras_V1.0_with_Hi-C_30X.hic.hap1.p_ctg.gfa \
> Sassafras_V1.0_with_Hi-C_30X.hic.hap1.p_ctg.fasta

# convert gfa to fasta - haplotype 2 (hap2)
awk '/^S/{print ">"$2;print $3}' \
Sassafras_V1.0_with_Hi-C_30X.hic.hap2.p_ctg.gfa \
> Sassafras_V1.0_with_Hi-C_30X.hic.hap2.p_ctg.fasta
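
A quick way to check a conversion like this is to confirm that every segment (S) line in the GFA became exactly one FASTA record. The sketch below wraps the same awk one-liner in a helper and runs it on a toy GFA; the function name `gfa_to_fasta` and the toy file are illustrative assumptions, not part of the lab files.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the awk one-liner used in step 1.
gfa_to_fasta() {
    # $1 = input GFA; FASTA is written to stdout
    awk '/^S/{print ">"$2; print $3}' "$1"
}

# Toy GFA: two segment (S) lines and one non-segment (A) line. Not real data.
tmpdir=$(mktemp -d)
printf 'S\tctg1\tACGTACGT\nA\tctg1\t0\t+\tread1\t0\t8\nS\tctg2\tGGCC\n' > "$tmpdir/toy.gfa"

gfa_to_fasta "$tmpdir/toy.gfa" > "$tmpdir/toy.fasta"

# Sanity check: the number of S-lines should equal the number of FASTA headers.
n_segments=$(grep -c '^S' "$tmpdir/toy.gfa")
n_records=$(grep -c '^>' "$tmpdir/toy.fasta")
echo "segments=$n_segments records=$n_records"
```

The same two `grep -c` counts can be run on the hap1 and hap2 files; a mismatch would suggest a truncated or malformed GFA.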

2. Generate stats for the Hifiasm assembly

/sphinx_local/software/bbmap/stats.sh -Xmx10g \
in=Sassafras_V1.0_with_Hi-C_30X.hic.hap1.p_ctg.fasta \
> Sassafras_V1.0_with_Hi-C_30X.hic.hap1.p_ctg.stats.txt

/sphinx_local/software/bbmap/stats.sh -Xmx10g \
in=Sassafras_V1.0_with_Hi-C_30X.hic.hap2.p_ctg.fasta \
> Sassafras_V1.0_with_Hi-C_30X.hic.hap2.p_ctg.stats.txt

* Notes on BBMap stats

  1. N50/L50: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least 50% of the assembly; L50 is the number of contigs in that set.
  2. N90/L90: the same pair of values for the set of longest contigs that covers at least 90% of the assembly.
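
The definitions above can be sketched directly in shell, which helps when cross-checking the numbers a stats tool reports. This is a minimal illustration on a toy FASTA, not a replacement for stats.sh; the helper name `nx_lx` and the toy contigs are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "Nx Lx" (length in bp, contig count) for a FASTA file.
nx_lx() {
    # $1 = FASTA file, $2 = percentage x (default 50)
    # 1) sum the sequence length of each record (handles wrapped sequences),
    # 2) sort lengths from longest to shortest,
    # 3) walk down the list until the running total reaches x% of the assembly.
    awk '/^>/{if (len) print len; len=0; next} {len += length($0)} END{if (len) print len}' "$1" \
    | sort -rn \
    | awk -v pct="${2:-50}" '
        {lens[NR]=$1; total+=$1}
        END{
            for (i=1; i<=NR; i++) {
                run += lens[i]
                if (run*100 >= total*pct) { print lens[i], i; exit }
            }
        }'
}

# Toy assembly with contigs of 8, 6, 4 and 2 bp (20 bp in total). Not real data.
tmpdir=$(mktemp -d)
printf '>a\nACGTACGT\n>b\nACGTAC\n>c\nACGT\n>d\nAC\n' > "$tmpdir/toy.fasta"

nx_lx "$tmpdir/toy.fasta" 50   # prints "6 2": contigs of 8 + 6 bp reach 50% of 20 bp
nx_lx "$tmpdir/toy.fasta" 90   # prints "4 3": contigs of 8 + 6 + 4 bp reach 90% of 20 bp
```

When comparing against a tool's report, check which of the two values it labels N and which it labels L, since conventions differ between tools.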

* Hifiasm Initial Stats

3. HiFi and Hi-C Data

Hi-C Data

ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/Hi-C/results/salbidum01_1334140/Hi-C/salbidum01_1334141_S3HiC_R1.fastq.gz .
ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/Hi-C/results/salbidum01_1334140/Hi-C/salbidum01_1334141_S3HiC_R2.fastq.gz .

103x HiFi Data

ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/PacBioHiFi/sassafras_samtools_HiFI_reads.fq .

90x HiFi Data

ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/PacBioHiFi/Sassafras_DS_90x.fastq .

80x HiFi Data

ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/PacBioHiFi/Sassafras_DS_80x.fastq .

60x HiFi Data

ln -s /pickett_sphinx/projects/EPP531_AGA/lyadav_EPPAGA/sassafras/raw_data/PacBioHiFi/Sassafras_DS_60x.fastq .
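
The coverage labels on these read sets (103x, 90x, 80x, 60x) correspond to total HiFi bases divided by the haploid genome size. A rough way to check a label is sketched below on a toy FASTQ; the helper name `fastq_coverage` and the toy numbers are assumptions, and the sketch expects an uncompressed FASTQ with four lines per read.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate coverage as total read bases / haploid genome size.
fastq_coverage() {
    # $1 = uncompressed FASTQ (4 lines per read), $2 = haploid genome size in bp
    # Sequence lines are every 4th line starting at line 2.
    awk -v g="$2" 'NR % 4 == 2 {bases += length($0)} END{printf "%.1f\n", bases / g}' "$1"
}

# Toy FASTQ: three reads of 10 bp each (30 bp in total). Not real data.
tmpdir=$(mktemp -d)
for i in 1 2 3; do
    printf '@read%s\nACGTACGTAC\n+\nIIIIIIIIII\n' "$i"
done > "$tmpdir/toy.fq"

# With a made-up genome size of 10 bp, 30 bp of reads give 3.0x coverage.
fastq_coverage "$tmpdir/toy.fq" 10
```

For the real data the call would look like `fastq_coverage Sassafras_DS_90x.fastq 1600000000`, where the 1600 Mb genome size is only an assumption taken from the lower end of the --hg-size range tried in step 4.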

4. Troubleshooting with (-s) and (--hg-size) parameters

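# -s: similarity threshold for duplicate haplotigs; try values from 0.40 to 0.55
# --hg-size: estimated haploid genome size; try values from 1600m to 1700m
# The values below are one point in those ranges; change them for each run.
/sphinx_local/software/hifiasm/hifiasm \
-o Sassafras_V1.0_with_Hi-C \
-t 2 \
-s 0.55 \
--hg-size 1600m \
--h1 salbidum01_1334141_S3HiC_R1.fastq.gz \
--h2 salbidum01_1334141_S3HiC_R2.fastq.gz \
sassafras_samtools_HiFI_reads.fq \
> hifiasm_output.txt 2>&1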
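
Because troubleshooting means trying several -s and --hg-size values, it can help to generate the commands from a small grid rather than editing one command by hand. The sketch below is a dry run that only prints the commands; the grid values, the helper names `build_cmd` and `print_grid`, and the per-run output prefix and log naming are illustrative assumptions, while the hifiasm options themselves are the ones used in this step.

```shell
#!/usr/bin/env bash
# Dry run: print one hifiasm command per (-s, --hg-size) combination.
# Nothing is executed; review the printed commands before running any of them.
HIFIASM=/sphinx_local/software/hifiasm/hifiasm

build_cmd() {
    # $1 = -s value, $2 = --hg-size value
    # Hypothetical naming scheme: a separate prefix per run so outputs do not overwrite each other.
    prefix="Sassafras_V1.0_with_Hi-C_s${1}_hg${2}"
    echo "$HIFIASM -o $prefix -t 2 -s $1 --hg-size $2" \
         "--h1 salbidum01_1334141_S3HiC_R1.fastq.gz" \
         "--h2 salbidum01_1334141_S3HiC_R2.fastq.gz" \
         "sassafras_samtools_HiFI_reads.fq > ${prefix}.log 2>&1"
}

print_grid() {
    # Example grid only: four -s values and two genome sizes from the ranges in step 4.
    for s in 0.40 0.45 0.50 0.55; do
        for hg in 1600m 1700m; do
            build_cmd "$s" "$hg"
        done
    done
}

print_grid
```

Writing each run to its own prefix and log file makes it easier to compare the resulting assemblies afterwards with the stats from step 2.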

5. Assignment for next week

The next step is to remove chloroplast and mitochondrial sequences from the assembly. A chloroplast genome is available for Sassafras, but no mitochondrial genome is.

Find the mitochondrial genome of the closest relative of Sassafras albidum.