Ecological Analysis TE distribution Mvim genome

### Number of TEs per chromosome

awk '{print $1}' JS_allchr23.fasta.repeatmasker.out.gff | sort -V | uniq -c

76820 Scaffold_1
74968 Scaffold_2
61781 Scaffold_3
82545 Scaffold_4
60729 Scaffold_5
85383 Scaffold_6
65567 Scaffold_7
67755 Scaffold_8
66804 Scaffold_9
55229 Scaffold_10
55343 Scaffold_11
54831 Scaffold_12
73672 Scaffold_13
82370 Scaffold_14
73810 Scaffold_15
68559 Scaffold_16
61090 Scaffold_17
60533 Scaffold_18
40266 Scaffold_19
42742 Scaffold_20
35541 Scaffold_21
39519 Scaffold_22
24172 Scaffold_23
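RepeatMasker GFF files typically start with `##` header lines, and `awk '{print $1}'` would count those as if they were sequence names. A minimal sketch of the same count that skips comment lines, run on a tiny made-up file (`demo.gff` and `count_tes` are hypothetical names for illustration, not part of the pipeline):

```shell
# Tiny made-up GFF for illustration; the real input is
# JS_allchr23.fasta.repeatmasker.out.gff.
printf '##gff-version 2\n' > demo.gff
printf 'Scaffold_1\tRepeatMasker\tsimilarity\t10\t90\t12.3\t+\t.\tdemo\n' >> demo.gff
printf 'Scaffold_1\tRepeatMasker\tsimilarity\t200\t350\t8.1\t-\t.\tdemo\n' >> demo.gff
printf 'Scaffold_2\tRepeatMasker\tsimilarity\t5\t60\t20.0\t+\t.\tdemo\n' >> demo.gff

count_tes() {
  # Drop lines starting with '#', then count records per sequence name (column 1).
  awk '!/^#/ {print $1}' "$1" | sort -V | uniq -c
}

count_tes demo.gff
```

If the real file has no header lines, this gives the same result as the original command.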

### Make non-overlapping windows for the Mvim genome based on chromosome length. bedtools intersect counts the TEs in each window, awk averages the TE count per window for each chromosome, and the result is sorted by chromosome number and written to average_TEs_per90Kbwindow_perChr.

bedtools makewindows -g ../Mvim_chr_len.txt -w 90000 | bedtools intersect -a - -b ../Mvim_repeatmasker.bed -c | awk '{seen[$1]+=$4; count[$1]++} END{for (x in seen)print x, seen[x]/count[x]}' | sort -V > average_TEs_per90Kbwindow_perChr
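To make the awk stage of this pipeline easier to follow, here is a small sketch that runs only the averaging step on made-up `bedtools intersect -c` output (columns: chromosome, start, end, TE count). The file `demo_counts.bed` and the function `avg_per_chr` are hypothetical, and the numbers are not real Mvim values.

```shell
# Made-up intersect -c output: chrom, start, end, number of overlapping TEs.
printf 'Scaffold_1\t0\t90000\t110\n'      > demo_counts.bed
printf 'Scaffold_1\t90000\t180000\t90\n' >> demo_counts.bed
printf 'Scaffold_2\t0\t90000\t80\n'      >> demo_counts.bed

avg_per_chr() {
  # Sum column 4 and count windows per chromosome, then print the mean.
  awk '{sum[$1] += $4; n[$1]++}
       END {for (chr in sum) print chr, sum[chr] / n[chr]}' "$1" | sort -V
}

avg_per_chr demo_counts.bed
```

Note that the last window on each chromosome is usually shorter than the requested size, so it tends to hold fewer TEs and pulls the average down slightly.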

### By trial and error, chose a custom non-overlapping window size for each chromosome so that each window holds about 100 TEs on average.
### Used the bedtools and awk one-liner above to check each candidate window size.
### First created one file per chromosome with its name and length (a 2-column file), then used bedtools makewindows to create windows of the chosen size for each chromosome.
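The window sizes here were found by trial and error. As a possible starting point for that search (not the method used on this page), the size can be estimated as target TEs per window × chromosome length / TE count. The sketch below assumes a hypothetical 3-column table (name, length in bp, TE count) with made-up values; `demo_te_density.txt` and `estimate_window` are illustrative names only.

```shell
# Made-up table: chrom, length (bp), TE count. Not real Mvim values.
printf 'chrA\t1000000\t1250\n'  > demo_te_density.txt
printf 'chrB\t2000000\t2000\n' >> demo_te_density.txt

estimate_window() {
  # $1 = table, $2 = target number of TEs per window.
  # window ~ target * length / TE count, rounded to the nearest 1 kb.
  awk -v target="$2" '{w = target * $2 / $3
                       printf "%s\t%d\n", $1, int(w / 1000 + 0.5) * 1000}' "$1"
}

estimate_window demo_te_density.txt 100
```

Because the final window of a chromosome is truncated, the estimate may still need a small adjustment after checking the per-window average.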

bedtools makewindows -g chr1-len -w 90000 > windows/chr1-90kb.bed

bedtools makewindows -g chr2-len -w 90000 > windows/chr2-90kb.bed

bedtools makewindows -g chr3-len -w 101000 > windows/chr3-101kb.bed

bedtools makewindows -g chr4-len -w 80000 > windows/chr4-80kb.bed

bedtools makewindows -g chr5-len -w 97000 > windows/chr5-97kb.bed

bedtools makewindows -g chr6-len -w 77000 > windows/chr6-77kb.bed

bedtools makewindows -g chr7-len -w 82000 > windows/chr7-82kb.bed

bedtools makewindows -g chr8-len -w 79000 > windows/chr8-79kb.bed

.......
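The repeated commands above could also be generated from a small table of chromosome numbers and window sizes. This is only a sketch: `demo_window_sizes.txt` and `make_window_cmds` are hypothetical names, the four rows are copied from the commands above, and the remaining chromosomes would need to be added by hand. The function prints the commands so they can be reviewed before running.

```shell
# Chromosome number and chosen window size (bp), taken from the commands above.
printf '1 90000\n2 90000\n3 101000\n4 80000\n' > demo_window_sizes.txt

make_window_cmds() {
  # Print one bedtools command per row: chrN-len -> windows/chrN-<size>kb.bed
  while read -r chr size; do
    printf 'bedtools makewindows -g chr%s-len -w %s > windows/chr%s-%skb.bed\n' \
      "$chr" "$size" "$chr" "$((size / 1000))"
  done < "$1"
}

make_window_cmds demo_window_sizes.txt          # review the generated commands
# make_window_cmds demo_window_sizes.txt | sh   # then run them (needs bedtools)
```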

### Concatenate the per-chromosome window files into one file containing all chromosomes with their respective window sizes

cat windows/chr*bed | sort -V > windows/Mvim_custom_non-overlapping_windows.bed
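After concatenating, it can be worth confirming that the windows still tile each chromosome without gaps or overlaps. A minimal sketch on made-up data follows; `demo_windows.bed` and `count_breaks` are hypothetical names, and the check assumes the file is sorted by chromosome and start position.

```shell
# Made-up windows file: chrom, start, end.
printf 'chrA\t0\t90000\n'        > demo_windows.bed
printf 'chrA\t90000\t180000\n'  >> demo_windows.bed
printf 'chrB\t0\t101000\n'      >> demo_windows.bed
printf 'chrB\t101000\t150000\n' >> demo_windows.bed

count_breaks() {
  # Count windows that do not start where the previous window on the same
  # chromosome ended; 0 means no gaps or overlaps were found.
  awk '$1 == chr && $2 != end {breaks++}
       {chr = $1; end = $3}
       END {print breaks + 0}' "$1"
}

count_breaks demo_windows.bed
```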