Circos Plot - ShadeNiece/DisculaDestructiva_GenomeAssembly-Annotation GitHub Wiki

Background

Circos plot generated for T2T genome assembly of Discula destructiva isolate AS111. Plot used in PLOS Pathogens submission 2025.
This is my documentation for how I created my Circos plot because Circos is quite extensive.

Pipeline

01. Set up Circos files and directories

Working Directory: /lustre/isaac/proj/UTK0032/sniece/DisculaDestructiva_GenomeAssembly/analysis/circos_plot

Conda install Circos.

conda create -n circos -c bioconda circos

Make a circos directory then make the subdirectories.

mkdir circos_plot
cd circos_plot
mkdir data etc images

Determine what you want to plot.

I want to plot my chromosomes, gc content, gene density, and transposable element density, so I will need to make these files: karyotype.txt, gc_content.txt, gene_density.txt, te_density.txt, respectively. It appears that you can pretty much add whatever you want, so feel free to modify if you need more or less.

touch karyotype.txt gc_content.txt gene_density.txt te_density.txt

02. Make karyotype file

Working Directory: /lustre/isaac/proj/UTK0032/sniece/DisculaDestructiva_GenomeAssembly/analysis/circos_plot/data
To plot your chromosomes, you will need to have the chromosome coordinates (start and end bp) in a tab-separated format within the karyotype.txt file.

nano karyotype.txt

chr - scaffold_1 scaffold_1 0 4387558 Chr1 grey
chr - scaffold_2 scaffold_2 4456448 9758616 Chr2 grey
chr - scaffold_3 scaffold_3 9961472 18850817 Chr3 grey
chr - scaffold_4 scaffold_4 18874368 26432625 Chr4 grey
chr - scaffold_5 scaffold_5 26476544 32417525 Chr5 grey
chr - scaffold_6 scaffold_6 32505856 38103130 Chr6 grey
chr - scaffold_7 scaffold_7 38273024 42691509 Chr7 grey
chr - scaffold_8 scaffold_8 42729472 47290015 Chr8 grey

If you have an indexed genome, you should be able to get the coordinates that way. I got my chromosome coordinates from my indexed genome that was used during STAR mapping for genome annotation.
Format: chr - <id> <label> <start> <end> <display_label> <color>

03. Make gc content file

Working Directory: /lustre/isaac/proj/UTK0032/sniece/DisculaDestructiva_GenomeAssembly/analysis/circos_plot/data
To plot gc content, you will need to have the gc content ratios in a tab-separated format within the gc_content.txt file. I'll be using bedtools.

Install bedtools.

conda create -n bedtools -c bioconda bedtools

Copy the genome fasta to current directory.

cp ../../15_final_genome_files/dd_as111_100x_assembly.hic.p_ctg_mt-filtered.post-proc.FINAL.tgsgapcloser.scaff_seqs.fasta

Make genome.txt file to use as input for bedtools windows.

nano genome.txt

Chr1    4387559
Chr2    5302169
Chr3    8889346
Chr4    7558258
Chr5    5940982
Chr6    5597275
Chr7    4418486
Chr8    4560544

This is just the length of each scaffold/chromosome. If you have an indexed genome, you should be able to get the length that way. I got my chromosome lengths from my indexed genome that was used during STAR mapping for genome annotation: /lustre/isaac/proj/UTK0032/sniece/DisculaDestructiva_Annotation/03_STAR/dd_genome_index/chrLength.txt.

Run bedtools (v2.31.1).

nano 01_run_bedtools_gc_content.qsh

#!/bin/bash
#SBATCH --job-name=bedtools
#SBATCH --nodes=1
#SBATCH --ntasks=30
#SBATCH --mem=100G
#SBATCH -A ACF-UTK0032
#SBATCH --partition=short
#SBATCH --qos=short
#SBATCH --time=03:00:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH [email protected]

eval "$(conda shell.bash hook)"
conda activate bedtools

# Step 1: Create genome windows (10 kb) based on genome.txt
echo "Generating genome windows..."
bedtools makewindows -g genome.txt -w 10000 > genome_windows.bed

# Step 2: Calculate GC content for each window
echo "Calculating GC content..."
bedtools nuc -fi dd_as111_100x_assembly.hic.p_ctg_mt-filtered.post-proc.FINAL.tgsgapcloser.scaff_seqs.fasta -bed genome_windows.bed > gc_content_windows.txt

# Step 3: Extract only chromosome, start, end, and GC content (column 10)
echo "Extracting relevant GC content..."
awk '{if (NR>1) print $1, $2, $3, $5}' gc_content_windows.txt > gc_content.txt

echo "GC content calculation complete."

sbatch 01_run_bedtools_gc_content.qsh

This should produce a gc_content.txt file that has the following format <scaffold_name> <start_window_position> <end_window_position> <gc_content>:

scaffold_1 0 10000 0.349500
scaffold_1 10000 20000 0.408400
scaffold_1 20000 30000 0.353400
scaffold_1 30000 40000 0.383700
etc............................

04. Make gene density file

Working Directory: /lustre/isaac/proj/UTK0032/sniece/DisculaDestructiva_GenomeAssembly/analysis/circos_plot/data
To plot gene density, you will need to have the gff3 file from genome annotation to create a tab-separated gene_density.txt file. I'll be using bedtools.

Copy the gff3 (from BRAKER) to current directory.

cp ../../../../DisculaDestructiva_Annotation/04_BRAKER/braker_outputs/dd.gff3 .

Run bedtools.

nano 02_run_bedtools_gene_density.qsh

#!/bin/bash
#SBATCH --job-name=bedtools
#SBATCH --nodes=1
#SBATCH --ntasks=30
#SBATCH --mem=100G
#SBATCH -A ACF-UTK0032
#SBATCH --partition=short
#SBATCH --qos=short
#SBATCH --time=03:00:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH [email protected]

eval "$(conda shell.bash hook)"
conda activate bedtools

# Step 1: Create genome windows (10 kb) based on genome.txt
echo "Generating genome windows..."
bedtools makewindows -g genome.txt -w 10000 > genome_windows.bed

# Step 2: Create BED file for gene coordinates
echo "Creating BED file for gene coordinates..."
awk '$3 == "gene" {print $1"\t"$4"\t"$5}' dd.gff3 > genes_coordinates.bed

# Step 3: Calculate number of genes in each window
echo "Counting genes in each window..."
bedtools intersect -a genome_windows.bed -b genes_coordinates.bed -c > gene_density.bed

# Step 4: Get final gene density values (count/window size of 10kb)
echo "Getting final gene density values..."
awk '{print $1, $2, $3, $4, $4/10000}' gene_density.bed > gene_density.txt

sbatch 02_run_bedtools_gene_density.qsh

This should produce a gene_density.txt file that has the following format <scaffold_name> <start_window_position> <end_window_position> <gene_count> <gene_density>:

scaffold_1 0 10000 1 0.0001
scaffold_1 10000 20000 0 0
scaffold_1 20000 30000 0 0
scaffold_1 30000 40000 0 0
scaffold_1 40000 50000 0 0
scaffold_1 50000 60000 1 0.0001
scaffold_1 60000 70000 0 0
scaffold_1 70000 80000 1 0.0001
scaffold_1 80000 90000 1 0.0001
scaffold_1 90000 100000 2 0.0002
etc............................

05. Make transposable element density file

Working Directory: /lustre/isaac/proj/UTK0032/sniece/DisculaDestructiva_GenomeAssembly/analysis/circos_plot/data
To plot TE density, you will need to have the output gff file from RepeatMasker and feed that into bedtools to create a tab-separated te_density.txt file. I'll be using bedtools.

Copy the gff file (from RepeatMasker) to current directory.

cp ../../../../DisculaDestructiva_Annotation/02_RepeatMasker-in_sphinx/dd_as111_100x_final_nu_final_mt_combined.fasta.out.gff .

Run bedtools.

nano 03_run_bedtools_te_density.qsh

#!/bin/bash
#SBATCH --job-name=bedtools
#SBATCH --nodes=1
#SBATCH --ntasks=30
#SBATCH --mem=100G
#SBATCH -A ACF-UTK0032
#SBATCH --partition=short
#SBATCH --qos=short
#SBATCH --time=03:00:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH [email protected]

eval "$(conda shell.bash hook)"
conda activate bedtools

# Step 1: Create genome windows (10 kb) based on genome.txt
echo "Generating genome windows..."
bedtools makewindows -g genome.txt -w 10000 > genome_windows.bed

# Step 2: Create BED file for transposable elements (TEs)
echo "Creating BED file for transposable elements..."
awk '$3 == "dispersed_repeat" {print $1"\t"$4"\t"$5}' dd_as111_100x_final_nu_final_mt_combined.fasta.out.gff > te_coordinates.bed

# Step 3: Calculate TE density in each window
echo "Counting TEs in each window..."
bedtools intersect -a genome_windows.bed -b te_coordinates.bed -c > te_density.bed

# Step 4: Get final TE density values (count/window size of 10kb)
echo "Getting final TE density values..."
awk '{print $1, $2, $3, $4, $4/10000}' te_density.bed > te_density.txt

echo "TE calculations complete!"

sbatch 03_run_bedtools_te_density.qsh

This should produce a te_density.txt file that has the following format <scaffold_name> <start_window_position> <end_window_position> <te_count> <te_density>:

scaffold_1 0 10000 12 0.0012
scaffold_1 10000 20000 9 0.0009
scaffold_1 20000 30000 6 0.0006
scaffold_1 30000 40000 3 0.0003
scaffold_1 40000 50000 10 0.001
scaffold_1 50000 60000 9 0.0009
scaffold_1 60000 70000 15 0.0015
etc............................

Circos Plot - ShadeNiece/DisculaDestructiva_GenomeAssembly-Annotation GitHub Wiki

Background

Pipeline

⚠️ **GitHub.com Fallback** ⚠️

⚠️ GitHub.com Fallback ⚠️