April 7th - sellwe/Genome-Analysis GitHub Wiki
DNA preprocessing:
FastQC for the Illumina reads. The two Illumina DNA files are in /proj/uppmax2025-3-3/Genome_Analysis/1_Zhang_2017/genomics_data/Illumina:
E745-1.L500_SZAXPI015146-56_1_clean.fq.gz and
E745-1.L500_SZAXPI015146-56_2_clean.fq.gz
Created soft links in my folder /home/sebase/Genome-Analysis/data/raw_data/genomic/illumina with:
[sebase@rackham3 illumina]$ ln -s /proj/uppmax2025-3-3/Genome_Analysis/1_Zhang_2017/genomics_data/Illumina/E745-1.L500_SZAXPI015146-56_1_clean.fq.gz /home/sebase/Genome-Analysis/data/raw_data/genomic/illumina/E745-1_1.fq.gz
[sebase@rackham3 illumina]$ ln -s /proj/uppmax2025-3-3/Genome_Analysis/1_Zhang_2017/genomics_data/Illumina/E745-1.L500_SZAXPI015146-56_2_clean.fq.gz /home/sebase/Genome-Analysis/data/raw_data/genomic/illumina/E745-1_2.fq.gz
This shortens their names to E745-1_x.fq.gz.
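The two ln -s commands above could also be written as a small loop. This is only a sketch: the function name link_illumina and the throwaway demo directories are hypothetical stand-ins, and ln -sf (overwrite an existing link) is my choice rather than what was actually run.

```shell
#!/usr/bin/env bash
# link_illumina SRC_DIR DEST_DIR  (hypothetical helper, not part of the repo)
# Links E745-1.L500_SZAXPI015146-56_{1,2}_clean.fq.gz into DEST_DIR
# under the shorter names E745-1_{1,2}.fq.gz.
link_illumina() {
    local src_dir=$1 dest_dir=$2 mate
    mkdir -p "$dest_dir"
    for mate in 1 2; do
        # -s: symbolic link, -f: replace a link that already exists
        ln -sf "$src_dir/E745-1.L500_SZAXPI015146-56_${mate}_clean.fq.gz" \
               "$dest_dir/E745-1_${mate}.fq.gz"
    done
}

# Demo on temporary directories with empty placeholder files
# (stand-ins for the real /proj and /home paths).
demo_src=$(mktemp -d)
demo_dest=$(mktemp -d)
touch "$demo_src"/E745-1.L500_SZAXPI015146-56_{1,2}_clean.fq.gz
link_illumina "$demo_src" "$demo_dest"
ls -l "$demo_dest"
```

On Rackham the call would presumably be link_illumina with the /proj Illumina folder as the first argument and my illumina folder as the second.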
I created the sbatch script "run_fastqc_extra.sh" in /home/sebase/Genome-Analysis/code/01_preprocessing and ran it. It contains:
fastqc /home/sebase/Genome-Analysis/data/raw_data/genomic/illumina/E745-1_{1,2}.fq.gz \
    -o /home/sebase/Genome-Analysis/analyses/01_preprocessing/
The reports will hopefully end up in my analyses folder.
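For reference, a minimal sketch of what an sbatch script around this FastQC call might look like. It is not the actual content of run_fastqc_extra.sh: every #SBATCH value, the READS_DIR and OUT_DIR override variables, and the dry-run fallback are assumptions. The account name is only guessed from the /proj path and the log name pattern from the .out files in the code folder.

```shell
#!/bin/bash -l
#SBATCH -A uppmax2025-3-3          # assumption: project account, guessed from the /proj path
#SBATCH -p core                    # assumption: partition
#SBATCH -n 2                       # assumption: two cores, one per read file
#SBATCH -t 00:30:00                # assumption: time limit
#SBATCH -J fastqc_sebase           # assumption: job name
#SBATCH -o fastqc_sebase.%j.out    # %j is replaced by the job ID

# Load FastQC where a module system exists (note the capitalization).
if command -v module >/dev/null 2>&1; then
    module load bioinfo-tools FastQC
fi

# Hypothetical override variables; the defaults are the paths from this page.
reads_dir=${READS_DIR:-/home/sebase/Genome-Analysis/data/raw_data/genomic/illumina}
out_dir=${OUT_DIR:-/home/sebase/Genome-Analysis/analyses/01_preprocessing}

# Build the command as an array so the paths stay intact as single arguments.
cmd=(fastqc -t 2 -o "$out_dir" "$reads_dir"/E745-1_{1,2}.fq.gz)

if command -v fastqc >/dev/null 2>&1 && [ -e "$reads_dir/E745-1_1.fq.gz" ]; then
    mkdir -p "$out_dir"            # FastQC does not create the output directory itself
    "${cmd[@]}"
else
    # Dry run: FastQC or the reads are missing, so only print the command.
    printf '%q ' "${cmd[@]}"
    echo
fi
```

The dry-run branch is there so the script can be checked on a machine without FastQC before it is submitted.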
Errors:
These two output files in code are from failed runs: fastqc_sebase.9891627.out and fastqc_sebase.9891651.out. Note that FastQC is capitalized like this.
The last run was a success and gave me the files: E745-1_1_fastqc.html, E745-1_1_fastqc.zip, E745-1_2_fastqc.html and E745-1_2_fastqc.zip.
Downloaded the .html files with:
PS C:\Users\Sebas> scp [email protected]:/home/sebase/Genome-Analysis/analyses/01_preprocessing/*.html C:\Users\Sebas\Downloads
Genome Assembly
Created soft links to the PacBio reads for the standard assembly.
6 files located in /proj/../PacBio:
m131023_233432_42174_c100519312550000001823081209281335_s1_X0.1.subreads.fastq.gz
m131023_233432_42174_c100519312550000001823081209281335_s1_X0.2.subreads.fastq.gz
m131023_233432_42174_c100519312550000001823081209281335_s1_X0.3.subreads.fastq.gz
m131024_200535_42174_c100563672550000001823084212221342_s1_p0.1.subreads.fastq.gz
m131024_200535_42174_c100563672550000001823084212221342_s1_p0.2.subreads.fastq.gz
m131024_200535_42174_c100563672550000001823084212221342_s1_p0.3.subreads.fastq.gz
Full path: /proj/uppmax2025-3-3/Genome_Analysis/1_Zhang_2017/genomics_data/PacBio
[sebase@rackham3 pacbio]$ ln -s /proj/uppmax2025-3-3/Genome_Analysis/1_Zhang_2017/genomics_data/PacBio/m131023_233432_*.subreads.fastq.gz .
[sebase@rackham3 pacbio]$ ln -s /proj/uppmax2025-3-3/Genome_Analysis/1_Zhang_2017/genomics_data/PacBio/m131024_200535_*.subreads.fastq.gz .
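Since the links are created with wildcards, it can be worth checking that all of them exist and none is dangling. A minimal sketch, assuming a hypothetical helper called check_links and temporary demo directories with empty placeholder files named like the real reads:

```shell
#!/usr/bin/env bash
# check_links DIR PATTERN EXPECTED  (hypothetical helper)
# Counts the symbolic links in DIR matching PATTERN, fails on a broken
# link, and fails if the count differs from EXPECTED.
check_links() {
    local dir=$1 pattern=$2 expected=$3 count=0 f
    for f in "$dir"/$pattern; do       # $pattern is unquoted on purpose so it globs
        [ -L "$f" ] || continue        # skip anything that is not a symlink
        if [ ! -e "$f" ]; then         # -e follows the link, so this detects dangling links
            echo "broken link: $f" >&2
            return 1
        fi
        count=$((count + 1))
    done
    echo "$count links OK"
    [ "$count" -eq "$expected" ]
}

# Demo: empty placeholder files, linked the same way as on this page.
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src"/m131023_233432_42174_c100519312550000001823081209281335_s1_X0.{1,2,3}.subreads.fastq.gz
touch "$src"/m131024_200535_42174_c100563672550000001823084212221342_s1_p0.{1,2,3}.subreads.fastq.gz
ln -s "$src"/m131023_233432_*.subreads.fastq.gz "$dest"/
ln -s "$src"/m131024_200535_*.subreads.fastq.gz "$dest"/
check_links "$dest" '*.subreads.fastq.gz' 6
```

In the real pacbio folder the call would be check_links . '*.subreads.fastq.gz' 6.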
I created the sbatch file in /home/sebase/Genome-Analysis/code/02_genome_assembly/pacbio and started the job:
[sebase@rackham3 pacbio]$ sbatch sbatch_pacbio_canu.sh
It worked on the third try, I think. I will let it run and see, and then push the file.
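A rough sketch of what a Canu sbatch script for these reads might look like. This is not the content of sbatch_pacbio_canu.sh: the #SBATCH values, the module name, the prefix E745, the pacbio reads folder, the output folder and genomeSize=3m (a rough guess of about 3 Mbp) are all assumptions to adjust. Newer Canu releases take the reads with -pacbio, while older 1.x releases use -pacbio-raw.

```shell
#!/bin/bash -l
#SBATCH -A uppmax2025-3-3          # assumption: project account, guessed from the /proj path
#SBATCH -p core                    # assumption: partition
#SBATCH -n 4                       # assumption: number of cores
#SBATCH -t 06:00:00                # assumption: time limit
#SBATCH -J pacbio_canu             # assumption: job name

# Load Canu where a module system exists (module name is an assumption).
if command -v module >/dev/null 2>&1; then
    module load bioinfo-tools canu
fi

# Hypothetical override variables; both default paths are assumptions
# modelled on the folder layout used elsewhere on this page.
reads_dir=${READS_DIR:-/home/sebase/Genome-Analysis/data/raw_data/genomic/pacbio}
out_dir=${OUT_DIR:-/home/sebase/Genome-Analysis/analyses/02_genome_assembly/pacbio_canu}

cmd=(canu -p E745 -d "$out_dir"
     genomeSize=3m                 # assumption: rough genome size estimate
     useGrid=false                 # run inside this job instead of submitting new ones
     maxThreads=4                  # keep in line with the -n value above
     -pacbio "$reads_dir"/*.subreads.fastq.gz)

if command -v canu >/dev/null 2>&1 && [ -d "$reads_dir" ]; then
    "${cmd[@]}"
else
    # Dry run: Canu or the reads are missing, so only print the command.
    printf '%q ' "${cmd[@]}"
    echo
fi
```

Setting useGrid=false keeps the whole assembly inside the one allocated job, which seems the simpler choice for a small genome.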