April 7th

DNA preprocessing:

FastQC on the Illumina reads. There are two Illumina DNA files in /proj/uppmax2025-3-3/Genome_Analysis/1_Zhang_2017/genomics_data/Illumina:

E745-1.L500_SZAXPI015146-56_1_clean.fq.gz and

E745-1.L500_SZAXPI015146-56_2_clean.fq.gz

I created soft links to them in my folder /home/sebase/Genome-Analysis/data/raw_data/genomic/illumina with:

[sebase@rackham3 illumina]$ ln -s /proj/uppmax2025-3-3/Genome_Analysis/1_Zhang_2017/genomics_data/Illumina/E745-1.L500_SZAXPI015146-56_1_clean.fq.gz /home/sebase/Genome-Analysis/data/raw_data/genomic/illumina/E745-1_1.fq.gz

[sebase@rackham3 illumina]$ ln -s /proj/uppmax2025-3-3/Genome_Analysis/1_Zhang_2017/genomics_data/Illumina/E745-1.L500_SZAXPI015146-56_2_clean.fq.gz /home/sebase/Genome-Analysis/data/raw_data/genomic/illumina/E745-1_2.fq.gz

This shortens their names to E745-1_1.fq.gz and E745-1_2.fq.gz.
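The two ln -s calls above can also be written as a loop over the read number. This is a minimal sketch; the function name link_illumina_reads is hypothetical, and it assumes the source files follow the E745-1.L500_SZAXPI015146-56_N_clean.fq.gz naming shown above.

```bash
#!/bin/bash
# Hypothetical helper: link both Illumina read files into dest_dir
# under the shortened names E745-1_1.fq.gz and E745-1_2.fq.gz.
link_illumina_reads() {
  local src_dir=$1 dest_dir=$2 n
  mkdir -p "$dest_dir"
  for n in 1 2; do
    # -s: symbolic link, -f: replace an existing link with the same name
    ln -sf "$src_dir/E745-1.L500_SZAXPI015146-56_${n}_clean.fq.gz" \
           "$dest_dir/E745-1_${n}.fq.gz"
  done
}

# Usage:
# link_illumina_reads /proj/uppmax2025-3-3/Genome_Analysis/1_Zhang_2017/genomics_data/Illumina \
#                     /home/sebase/Genome-Analysis/data/raw_data/genomic/illumina
```

Afterwards, ls -l in the illumina folder should show each short name pointing to its original file.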

I created the sbatch script "run_fastqc_extra.sh" in /home/sebase/Genome-Analysis/code/01_preprocessing and ran it. It runs:

fastqc /home/sebase/Genome-Analysis/data/raw_data/genomic/illumina/E745-1_{1,2}.fq.gz \
    -o /home/sebase/Genome-Analysis/analyses/01_preprocessing/

The output should then end up in my analyses folder.
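For reference, a minimal sketch of what a script like run_fastqc_extra.sh could contain; it is not the actual script. The #SBATCH values (account, partition, cores, time limit) are assumptions, and the job name and %j log pattern are guessed from the fastqc_sebase.<jobid>.out log names. The module names assume the UPPMAX setup (bioinfo-tools, then FastQC). As far as I know FastQC does not create a missing output directory, so the sketch creates it first. The job body only runs when SLURM_JOB_ID is set, so the run_fastqc function (a hypothetical name) can also be sourced and tried on its own.

```bash
#!/bin/bash -l
#SBATCH -A uppmax2025-3-3        # assumption: project ID taken from the /proj path
#SBATCH -p core
#SBATCH -n 2
#SBATCH -t 00:30:00
#SBATCH -J fastqc_sebase
#SBATCH -o fastqc_sebase.%j.out  # %j expands to the job ID

# Run FastQC on both shortened Illumina read files.
run_fastqc() {
  local in_dir=$1 out_dir=$2
  mkdir -p "$out_dir"            # make sure the output directory exists
  fastqc -t 2 -o "$out_dir" "$in_dir/E745-1_1.fq.gz" "$in_dir/E745-1_2.fq.gz"
}

# Only do the real work when running as a Slurm job.
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
  module load bioinfo-tools FastQC   # note the capitalization: FastQC
  run_fastqc /home/sebase/Genome-Analysis/data/raw_data/genomic/illumina \
             /home/sebase/Genome-Analysis/analyses/01_preprocessing
fi
```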

Errors:

The two log files in code, fastqc_sebase.9891627.out and fastqc_sebase.9891651.out, come from runs that failed. Note that FastQC is capitalized exactly like this.
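To see which runs failed without opening every log, the Slurm .out files can be searched for error messages. A minimal sketch; the function name find_failed_logs and the searched words are assumptions, and it only lists which logs mention an error, not the cause.

```bash
#!/bin/bash
# Hypothetical helper: print the names of Slurm log files (*.out) in a
# directory that contain an error-like message (case-insensitive).
find_failed_logs() {
  local log_dir=$1
  # -l: list matching file names, -i: ignore case, -E: extended regex
  grep -liE 'error|not found|unknown' "$log_dir"/*.out 2>/dev/null
}

# Usage: find_failed_logs /home/sebase/Genome-Analysis/code/01_preprocessing
```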

The last run succeeded and produced the files E745-1_1_fastqc.html, E745-1_1_fastqc.zip, E745-1_2_fastqc.html and E745-1_2_fastqc.zip.
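FastQC names its output after the input file (E745-1_1.fq.gz gives E745-1_1_fastqc.html and E745-1_1_fastqc.zip), so a quick check that every report was written could look like this. A minimal sketch; check_fastqc_outputs is a hypothetical name, and "present" here just means the file exists and is not empty.

```bash
#!/bin/bash
# Hypothetical helper: check that FastQC produced a non-empty .html report
# and .zip archive for every given sample name; print any missing file.
check_fastqc_outputs() {
  local out_dir=$1 sample ext missing=0
  shift
  for sample in "$@"; do
    for ext in html zip; do
      if [[ ! -s "$out_dir/${sample}_fastqc.$ext" ]]; then
        echo "missing: $out_dir/${sample}_fastqc.$ext"
        missing=1
      fi
    done
  done
  return "$missing"   # 0 if everything is there, 1 otherwise
}

# Usage:
# check_fastqc_outputs /home/sebase/Genome-Analysis/analyses/01_preprocessing E745-1_1 E745-1_2
```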

I downloaded the .html files to my own computer with:

PS C:\Users\Sebas> scp [email protected]:/home/sebase/Genome-Analysis/analyses/01_preprocessing/*.html C:\Users\Sebas\Downloads

Genome Assembly

I created soft links to the PacBio reads for the standard assembly.

Six files are located in /proj/../Pacbio:

m131023_233432_42174_c100519312550000001823081209281335_s1_X0.1.subreads.fastq.gz

m131023_233432_42174_c100519312550000001823081209281335_s1_X0.2.subreads.fastq.gz

m131023_233432_42174_c100519312550000001823081209281335_s1_X0.3.subreads.fastq.gz

m131024_200535_42174_c100563672550000001823084212221342_s1_p0.1.subreads.fastq.gz

m131024_200535_42174_c100563672550000001823084212221342_s1_p0.2.subreads.fastq.gz

m131024_200535_42174_c100563672550000001823084212221342_s1_p0.3.subreads.fastq.gz

I linked them into my pacbio folder from /proj/uppmax2025-3-3/Genome_Analysis/1_Zhang_2017/genomics_data/PacBio with:

[sebase@rackham3 pacbio]$ ln -s /proj/uppmax2025-3-3/Genome_Analysis/1_Zhang_2017/genomics_data/PacBio/m131023_233432_*.subreads.fastq.gz .

[sebase@rackham3 pacbio]$ ln -s /proj/uppmax2025-3-3/Genome_Analysis/1_Zhang_2017/genomics_data/PacBio/m131024_200535_*.subreads.fastq.gz .
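Because the two ln -s commands use globs, it is worth checking that all the links were made and that none of them point to a missing file. A minimal sketch; check_links is a hypothetical name. Run in the pacbio folder, it should report six links and no broken ones if both globs matched all their files.

```bash
#!/bin/bash
# Hypothetical helper: count the PacBio subread links in a directory and
# report any link whose target no longer exists (a broken link).
check_links() {
  local dir=$1 f total=0 broken=0
  for f in "$dir"/*.subreads.fastq.gz; do
    [[ -L "$f" ]] || continue        # skip anything that is not a symlink
    total=$((total + 1))
    if [[ ! -e "$f" ]]; then         # -e follows the link; false if the target is gone
      echo "broken: $f"
      broken=$((broken + 1))
    fi
  done
  echo "links: $total, broken: $broken"
  [[ $broken -eq 0 ]]                # exit status 0 only if nothing is broken
}

# Usage (inside the pacbio folder): check_links .
```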

I created the sbatch script and submitted the job:

From /home/sebase/Genome-Analysis/code/02_genome_assembly/pacbio:

[sebase@rackham3 pacbio]$ sbatch sbatch_pacbio_canu.sh

It worked on the third try, I think. I will let it run, check the result and then push the file.
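For reference, a minimal sketch of what a Canu job script like sbatch_pacbio_canu.sh might contain; it is not the actual script. Assumptions: the #SBATCH values, the module name canu, the output prefix E745, a genome size of roughly 3 Mb, and both paths (guessed from the Illumina folder layout). useGrid=false keeps Canu from submitting its own Slurm jobs, so every step runs inside this one allocation. -pacbio-raw is the Canu 1.x option for raw PacBio reads; newer Canu versions document it as -pacbio. The run_canu function name is hypothetical, and the body only runs when SLURM_JOB_ID is set.

```bash
#!/bin/bash -l
#SBATCH -A uppmax2025-3-3        # assumption: project ID taken from the /proj path
#SBATCH -p core
#SBATCH -n 4
#SBATCH -t 12:00:00
#SBATCH -J canu_pacbio
#SBATCH -o canu_pacbio.%j.out

# Assemble all PacBio subreads in read_dir with Canu into out_dir.
run_canu() {
  local read_dir=$1 out_dir=$2 genome_size=$3
  canu -p E745 -d "$out_dir" \
       genomeSize="$genome_size" \
       useGrid=false maxThreads=4 \
       -pacbio-raw "$read_dir"/*.subreads.fastq.gz
}

if [[ -n "${SLURM_JOB_ID:-}" ]]; then
  module load bioinfo-tools canu
  run_canu /home/sebase/Genome-Analysis/data/raw_data/genomic/pacbio \
           /home/sebase/Genome-Analysis/analyses/02_genome_assembly/pacbio \
           3m                    # assumption: approximate genome size
fi
```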