Circos Plot - ShadeNiece/DisculaDestructiva_GenomeAssembly-Annotation GitHub Wiki

Background

  • Circos plot generated for T2T genome assembly of Discula destructiva isolate AS111. Plot used in PLOS Pathogens submission 2025.
  • This is my documentation for how I created my Circos plot because Circos is quite extensive.

Pipeline

01. Set up Circos files and directories

Working Directory: /lustre/isaac/proj/UTK0032/sniece/DisculaDestructiva_GenomeAssembly/analysis/circos_plot

  1. Conda install Circos.
conda create -n circos -c bioconda circos
  1. Make a circos directory then make the subdirectories.
mkdir circos_plot
cd circos_plot
mkdir data etc images
  1. Determine what you want to plot.
  • I want to plot my chromosomes, gc content, gene density, and transposable element density, so I will need to make these files: karyotype.txt, gc_content.txt, gene_density.txt, te_density.txt, respectively. It appears that you can pretty much add whatever you want, so feel free to modify if you need more or less.
touch karyotype.txt gc_content.txt gene_density.txt te_density.txt
02. Make karyotype file

  • Working Directory: /lustre/isaac/proj/UTK0032/sniece/DisculaDestructiva_GenomeAssembly/analysis/circos_plot/data
  • To plot your chromosomes, you will need to have the chromosome coordinates (start and end bp) in a tab-separated format within the karyotype.txt file.
nano karyotype.txt
chr - scaffold_1 scaffold_1 0 4387558 Chr1 grey
chr - scaffold_2 scaffold_2 4456448 9758616 Chr2 grey
chr - scaffold_3 scaffold_3 9961472 18850817 Chr3 grey
chr - scaffold_4 scaffold_4 18874368 26432625 Chr4 grey
chr - scaffold_5 scaffold_5 26476544 32417525 Chr5 grey
chr - scaffold_6 scaffold_6 32505856 38103130 Chr6 grey
chr - scaffold_7 scaffold_7 38273024 42691509 Chr7 grey
chr - scaffold_8 scaffold_8 42729472 47290015 Chr8 grey
  • If you have an indexed genome, you should be able to get the coordinates that way. I got my chromosome coordinates from my indexed genome that was used during STAR mapping for genome annotation.
  • Format: chr - <id> <label> <start> <end> <display_label> <color>
03. Make gc content file

  • Working Directory: /lustre/isaac/proj/UTK0032/sniece/DisculaDestructiva_GenomeAssembly/analysis/circos_plot/data
  • To plot gc content, you will need to have the gc content ratios in a tab-separated format within the gc_content.txt file. I'll be using bedtools.
  1. Install bedtools.
conda create -n bedtools -c bioconda bedtools
  1. Copy the genome fasta to current directory.
cp ../../15_final_genome_files/dd_as111_100x_assembly.hic.p_ctg_mt-filtered.post-proc.FINAL.tgsgapcloser.scaff_seqs.fasta 
  1. Make genome.txt file to use as input for bedtools windows.
nano genome.txt
Chr1    4387559
Chr2    5302169
Chr3    8889346
Chr4    7558258
Chr5    5940982
Chr6    5597275
Chr7    4418486
Chr8    4560544
  • This is just the length of each scaffold/chromosome. If you have an indexed genome, you should be able to get the length that way. I got my chromosome lengths from my indexed genome that was used during STAR mapping for genome annotation: /lustre/isaac/proj/UTK0032/sniece/DisculaDestructiva_Annotation/03_STAR/dd_genome_index/chrLength.txt.
  1. Run bedtools (v2.31.1).
nano 01_run_bedtools_gc_content.qsh
#!/bin/bash
#SBATCH --job-name=bedtools
#SBATCH --nodes=1
#SBATCH --ntasks=30
#SBATCH --mem=100G
#SBATCH -A ACF-UTK0032
#SBATCH --partition=short
#SBATCH --qos=short
#SBATCH --time=03:00:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH [email protected]

eval "$(conda shell.bash hook)"
conda activate bedtools

# Step 1: Create genome windows (10 kb) based on genome.txt
echo "Generating genome windows..."
bedtools makewindows -g genome.txt -w 10000 > genome_windows.bed

# Step 2: Calculate GC content for each window
echo "Calculating GC content..."
bedtools nuc -fi dd_as111_100x_assembly.hic.p_ctg_mt-filtered.post-proc.FINAL.tgsgapcloser.scaff_seqs.fasta -bed genome_windows.bed > gc_content_windows.txt

# Step 3: Extract only chromosome, start, end, and GC content (column 10)
echo "Extracting relevant GC content..."
awk '{if (NR>1) print $1, $2, $3, $5}' gc_content_windows.txt > gc_content.txt

echo "GC content calculation complete."
sbatch 01_run_bedtools_gc_content.qsh
  • This should produce a gc_content.txt file that has the following format <scaffold_name> <start_window_position> <end_window_position> <gc_content>:
scaffold_1 0 10000 0.349500
scaffold_1 10000 20000 0.408400
scaffold_1 20000 30000 0.353400
scaffold_1 30000 40000 0.383700
etc............................
04. Make gene density file

  • Working Directory: /lustre/isaac/proj/UTK0032/sniece/DisculaDestructiva_GenomeAssembly/analysis/circos_plot/data
  • To plot gene density, you will need to have the gff3 file from genome annotation to create a tab-separated gene_density.txt file. I'll be using bedtools.
  1. Copy the gff3 (from BRAKER) to current directory.
cp ../../../../DisculaDestructiva_Annotation/04_BRAKER/braker_outputs/dd.gff3 .
  1. Run bedtools.
nano 02_run_bedtools_gene_density.qsh
#!/bin/bash
#SBATCH --job-name=bedtools
#SBATCH --nodes=1
#SBATCH --ntasks=30
#SBATCH --mem=100G
#SBATCH -A ACF-UTK0032
#SBATCH --partition=short
#SBATCH --qos=short
#SBATCH --time=03:00:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH [email protected]

eval "$(conda shell.bash hook)"
conda activate bedtools

# Step 1: Create genome windows (10 kb) based on genome.txt
echo "Generating genome windows..."
bedtools makewindows -g genome.txt -w 10000 > genome_windows.bed

# Step 2: Create BED file for gene coordinates
echo "Creating BED file for gene coordinates..."
awk '$3 == "gene" {print $1"\t"$4"\t"$5}' dd.gff3 > genes_coordinates.bed

# Step 3: Calculate number of genes in each window
echo "Counting genes in each window..."
bedtools intersect -a genome_windows.bed -b genes_coordinates.bed -c > gene_density.bed

# Step 4: Get final gene density values (count/window size of 10kb)
echo "Getting final gene density values..."
awk '{print $1, $2, $3, $4, $4/10000}' gene_density.bed > gene_density.txt
sbatch 02_run_bedtools_gene_density.qsh
  • This should produce a gene_density.txt file that has the following format <scaffold_name> <start_window_position> <end_window_position> <gene_count> <gene_density>:
scaffold_1 0 10000 1 0.0001
scaffold_1 10000 20000 0 0
scaffold_1 20000 30000 0 0
scaffold_1 30000 40000 0 0
scaffold_1 40000 50000 0 0
scaffold_1 50000 60000 1 0.0001
scaffold_1 60000 70000 0 0
scaffold_1 70000 80000 1 0.0001
scaffold_1 80000 90000 1 0.0001
scaffold_1 90000 100000 2 0.0002
etc............................
05. Make transposable element density file

  • Working Directory: /lustre/isaac/proj/UTK0032/sniece/DisculaDestructiva_GenomeAssembly/analysis/circos_plot/data
  • To plot TE density, you will need to have the output gff file from RepeatMasker and feed that into bedtools to create a tab-separated te_density.txt file. I'll be using bedtools.
  1. Copy the gff file (from RepeatMasker) to current directory.
cp ../../../../DisculaDestructiva_Annotation/02_RepeatMasker-in_sphinx/dd_as111_100x_final_nu_final_mt_combined.fasta.out.gff .
  1. Run bedtools.
nano 03_run_bedtools_te_density.qsh
#!/bin/bash
#SBATCH --job-name=bedtools
#SBATCH --nodes=1
#SBATCH --ntasks=30
#SBATCH --mem=100G
#SBATCH -A ACF-UTK0032
#SBATCH --partition=short
#SBATCH --qos=short
#SBATCH --time=03:00:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH [email protected]

eval "$(conda shell.bash hook)"
conda activate bedtools

# Step 1: Create genome windows (10 kb) based on genome.txt
echo "Generating genome windows..."
bedtools makewindows -g genome.txt -w 10000 > genome_windows.bed

# Step 2: Create BED file for transposable elements (TEs)
echo "Creating BED file for transposable elements..."
awk '$3 == "dispersed_repeat" {print $1"\t"$4"\t"$5}' dd_as111_100x_final_nu_final_mt_combined.fasta.out.gff > te_coordinates.bed

# Step 3: Calculate TE density in each window
echo "Counting TEs in each window..."
bedtools intersect -a genome_windows.bed -b te_coordinates.bed -c > te_density.bed

# Step 4: Get final TE density values (count/window size of 10kb)
echo "Getting final TE density values..."
awk '{print $1, $2, $3, $4, $4/10000}' te_density.bed > te_density.txt

echo "TE calculations complete!"
sbatch 03_run_bedtools_te_density.qsh
  • This should produce a te_density.txt file that has the following format <scaffold_name> <start_window_position> <end_window_position> <te_count> <te_density>:
scaffold_1 0 10000 12 0.0012
scaffold_1 10000 20000 9 0.0009
scaffold_1 20000 30000 6 0.0006
scaffold_1 30000 40000 3 0.0003
scaffold_1 40000 50000 10 0.001
scaffold_1 50000 60000 9 0.0009
scaffold_1 60000 70000 15 0.0015
etc............................
⚠️ **GitHub.com Fallback** ⚠️