repeats masking and density ideogram - statonlab/blueberry-berry-development GitHub Wiki

Each haplotype genome assembly was masked with RepeatModeler and RepeatMasker following these instructions: https://github.com/mestato/statonlabprivate/wiki/V.darrowii-genome-annotation

Haplotype 1 results are stored in /staton/projects/blueberry_fruit_development/annotation_20May/1_repeatMask.
Haplotype 2 results are stored in /staton/projects/blueberry_fruit_development/annotation_20July/repeatMask.

RepeatMasker outputs a masked genome FASTA file, a .out file, a GFF version 2 (.gff) file, and a .tbl summary table.
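Before moving on, it can help to confirm that all four outputs were written. This is a minimal sketch, assuming RepeatMasker's default naming (the input file name plus .masked, .out, .out.gff and .tbl) and that RepeatMasker was run with -gff; `check_rm_outputs` is a hypothetical helper, not part of RepeatMasker.

```shell
# Hypothetical helper: check that the RepeatMasker outputs exist and are non-empty.
# Assumes default naming: <genome>.masked, <genome>.out, <genome>.out.gff (-gff), <genome>.tbl
check_rm_outputs() {
  local genome="$1" missing=0 suffix
  for suffix in masked out out.gff tbl; do
    if [ -s "${genome}.${suffix}" ]; then
      echo "found:   ${genome}.${suffix}"
    else
      echo "missing: ${genome}.${suffix}" >&2
      missing=1
    fi
  done
  return "$missing"   # 0 only if every expected file is present
}

# Usage (inside the RepeatMasker output directory):
# check_rm_outputs Vdarrowii_genome.v1.1.fasta
```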

Next, we use the repeat annotation to plot a repeat density map.
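To make "repeat density" concrete, here is a minimal sketch (not DensityMap's implementation) that sums repeat-annotated bases per fixed-size window for each chromosome in a GFF3 file. The function name `repeat_density` and the default 1 Mb window are our own assumptions. Overlapping features are counted twice, so a window's total can exceed the window size.

```shell
# Sketch: repeat bases per window from a GFF3 file (hypothetical helper).
# Output columns: seqid, window start, window end, repeat bp (1-based, inclusive).
repeat_density() {
  local gff3="$1" win="${2:-1000000}"
  awk -F'\t' -v w="$win" '
    /^#/ || NF < 9 { next }              # skip header/comment and malformed lines
    {
      s = $4; e = $5                     # GFF coordinates: 1-based, inclusive
      while (s <= e) {                   # split the feature across window borders
        bin = int((s - 1) / w)           # 0-based window index
        bin_end = (bin + 1) * w          # last base of this window
        stop = (e < bin_end) ? e : bin_end
        bp[$1 "\t" bin] += stop - s + 1
        s = stop + 1
      }
    }
    END {
      for (k in bp) {
        split(k, p, "\t")
        printf "%s\t%d\t%d\t%d\n", p[1], p[2] * w + 1, (p[2] + 1) * w, bp[k]
      }
    }' "$gff3" | sort -k1,1 -k2,2n
}

# Usage (output file name is hypothetical):
# repeat_density Vdarrowii_chr.repeats.gff3 1000000 > repeat_density_1Mb.tsv
```

The table this produces could be plotted with any tool; DensityMap does the windowing and drawing in one step.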

Repeat Density

The Perl program DensityMap takes a GFF3 file and plots feature density as an ideogram. It does not need to be installed: just clone the repository and the Perl script is ready to use.

cd /staton/software
git clone https://github.com/sguizard/DensityMap.git

DensityMap requires GFF version 3, but RepeatMasker produced GFF version 2, so we used rmOutToGFF3.pl to convert the RepeatMasker .out file into GFF3. rmOutToGFF3.pl has to be in the same directory as its dependency .pm files from this repository: https://github.com/rmhubley/RepeatMasker.

cd /staton/projects/blueberry_fruit_development/
git clone https://github.com/rmhubley/RepeatMasker.git
cd RepeatMasker
# move the script to the top of the RepeatMasker checkout, next to the .pm modules it loads
mv util/rmOutToGFF3.pl .

Convert the .out file into GFF3

perl /staton/projects/blueberry_fruit_development/RepeatMasker/rmOutToGFF3.pl Vdarrowii_genome.v1.1.fasta.out > Vdarrowii_genome.v1.1.fasta.out.gff3
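A quick sanity check of the converted file, as a sketch with hypothetical helper names. If the conversion worked, `gff_version` should report 3 (assuming rmOutToGFF3.pl writes the standard `##gff-version` pragma line), and `count_feature_types` should list `dispersed_repeat`, the feature type that the DensityMap command on this page draws via `-ty 'dispersed_repeat=fused'`.

```shell
# gff_version: print the version from the "##gff-version" pragma line,
# or "unknown" if the file has no such line.
gff_version() {
  awk '/^##gff-version/ { print $2; found = 1; exit }
       END { if (!found) print "unknown" }' "$1"
}

# count_feature_types: tally column 3 (feature type) over non-comment lines,
# most frequent type first.
count_feature_types() {
  awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ }
              END { for (t in n) print t "\t" n[t] }' "$1" | sort -k2,2nr
}

# Usage:
# gff_version Vdarrowii_genome.v1.1.fasta.out.gff3
# count_feature_types Vdarrowii_genome.v1.1.fasta.out.gff3
```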

Keep only the 12 main chromosomes (the pattern chr1 also matches chr10, chr11 and chr12)

grep 'chr1\|chr2\|chr3\|chr4\|chr5\|chr6\|chr7\|chr8\|chr9' Vdarrowii_genome.v1.1.fasta.out.gff3 > Vdarrowii_chr.repeats.gff3
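The grep above matches substrings anywhere on a line, so it would also keep features on any other sequence whose name contains chr1 through chr9 (for example a hypothetical scaffold named chr1_unplaced, if the assembly had one). A stricter sketch with awk keeps only features whose seqid (column 1) is exactly chr1 to chr12; like the grep, it drops the `##` header lines. `filter_main_chroms` is a hypothetical name, and this is not the command we actually ran.

```shell
# Keep only features whose seqid (column 1) is exactly chr1..chr12.
# Header lines ("##gff-version 3", etc.) have no such seqid, so they are dropped, as with grep.
filter_main_chroms() {
  awk -F'\t' '$1 ~ /^chr([1-9]|1[0-2])$/' "$1"
}

# Usage:
# filter_main_chroms Vdarrowii_genome.v1.1.fasta.out.gff3 > Vdarrowii_chr.repeats.gff3
```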

Then use DensityMap.pl to plot an SVG file. Its dependency, the GD::SVG Perl module, has to be installed in the conda environment.

conda install -c bioconda perl-gd-svg
perl /staton/software/DensityMap/DensityMap.pl -i Vdarrowii_chr.repeats.gff3 -ty 'dispersed_repeat=fused' -o repeats_hap1.svg -ba white -sc 20000 -sh 100
# DensityMap detects an optimal plot size and asks whether you accept it; the SVG plot is then written to repeats_hap1.svg.