MCScanx - DR-genomics/Genomics-pipelines GitHub Wiki

# Inter-chromosomal synteny through MCScanX:

[Refer: http://crop.ahau.edu.cn/__local/D/8C/29/D8DC9C2A5658EE4DAAAB8612237_4091ADEC_7756.pdf?e=.pdf]

  1. Run blastp with the Mvim protein sequences against themselves (self-blast).
  2. For MCScanX to work well, limit the BLAST hits to about 5 per query sequence. The blastp option -max_target_seqs did not do this for me: the output still had >25 hits per query sequence with the option on. Hence, I filtered the hits with awk before using them as input for MCScanX.
awk '{if($3>=50) print}' Mvim/Mvim.blastp > Mvim/Mvim1.blastp #Filtered blast hits >=50% ID

awk 'NR==FNR{a[$1]++; next} a[$1]<10' Mvim1.blastp Mvim1.blastp > Mvim2.blastp #reads the file twice: the first pass counts the hits per query, the second keeps only the queries that have fewer than 10 hits in total (queries with 10 or more hits are dropped entirely)
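The two-pass awk command above removes every query that has 10 or more hits rather than trimming it to its best hits. If the aim is to cap the number of hits per query, a minimal sketch like the one below could be used instead. It assumes tabular BLAST output (-outfmt 6) in which the hits of each query are already listed best-first; `top_hits` and the output file name are hypothetical names used only for illustration.

```shell
# Keep at most N hits per query from tabular BLAST output (-outfmt 6).
# Assumption: the hits of each query appear best-first in the file.
# top_hits is a hypothetical helper name, not part of MCScanX or BLAST.
top_hits() {
    local n="$1" infile="$2"
    # seen[$1] counts how many lines this query ID has produced so far;
    # a line is printed only while that count is <= n.
    awk -v n="$n" '++seen[$1] <= n' "$infile"
}

# Example (file names are placeholders):
#   top_hits 5 Mvim1.blastp > Mvim2.top5.blastp
```

Unlike the command above, this keeps every query in the file and only limits how many of its hits are passed on.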

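Input files for MCScanX:

  1. BLAST file (tabular output).
  2. gff file with four columns: chromosome name (two letters followed by the chromosome number, e.g. Mv1), gene name, start position and end position.

Make sure both files are tab-separated. If not sure, use the following one-liner: <Mvim2.blastp tr ' ' '\t' > Mvim2.tab.blastp

MCScanX reads both inputs by prefix, so for the run below it looks for Mvim2.blast and Mvim2.gff. Rename or copy the filtered BLAST file accordingly.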
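Whether a file really is tab-separated can be checked by counting fields per line. The sketch below assumes 12 columns for the BLAST file (the default -outfmt 6 layout) and 4 columns for the MCScanX gff; `check_columns` is a hypothetical helper name.

```shell
# Report lines whose number of tab-separated fields differs from the expected one.
# Prints the offending line numbers and returns non-zero if any are found.
# check_columns is a hypothetical helper name.
check_columns() {
    local expected="$1" infile="$2"
    awk -F'\t' -v n="$expected" '
        NF != n { printf "line %d: %d fields\n", NR, NF; bad = 1 }
        END     { exit bad + 0 }
    ' "$infile"
}

# Example (assumes default -outfmt 6 for the BLAST file):
#   check_columns 12 Mvim2.blastp && check_columns 4 Mvim2.gff
```

A space-separated line shows up as a single field, so it is reported straight away.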

Remove any redundant genes from the gff file, keeping the first entry for each gene name: awk '!a[$2]++' Mvim1.gff > Mvim2.gff
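A quick way to see whether a gff still contains repeated gene names is to list the IDs in column 2 that occur more than once. This is only a sanity-check sketch; `dup_genes` is a hypothetical helper name.

```shell
# List gene IDs (column 2) that occur more than once in an MCScanX-style gff.
# dup_genes is a hypothetical helper name.
dup_genes() {
    cut -f2 "$1" | sort | uniq -d
}

# Example:
#   dup_genes Mvim1.gff   # IDs that are repeated in the input
#   dup_genes Mvim2.gff   # should print nothing once duplicates are removed
```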

cd to the MCScanX-master folder (myco server) and run

./MCScanX Mvim2

This generates Mvim2.collinearity and a Mvim2.html directory containing html files.
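To get a first impression of the result, the collinear blocks in the .collinearity file can be tallied per chromosome pair. The sketch below assumes that each block starts with a header line of the form "## Alignment 0: score=... e_value=... N=... Mv1&Mv2 plus", with the chromosome pair as the second-to-last field; `blocks_per_pair` is a hypothetical helper name.

```shell
# Count collinear blocks per chromosome pair in an MCScanX .collinearity file.
# Assumption: block headers look like
#   ## Alignment 0: score=... e_value=... N=... Mv1&Mv2 plus
# so the chromosome pair is the second-to-last whitespace-separated field.
# blocks_per_pair is a hypothetical helper name.
blocks_per_pair() {
    awk '/^## Alignment/ { count[$(NF-1)]++ }
         END { for (p in count) print p "\t" count[p] }' "$1" | sort
}

# Example:
#   blocks_per_pair Mvim2.collinearity
```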

./duplicate_gene_classifier Mvim2

Output:

Reading BLAST file and pre-processing
Generating BLAST list
23861 matches imported (37604 discarded)
435 pairwise comparisons
193 alignments generated

| Type of dup | Code | Number |
|---|---|---|
| Singleton | 0 | 15417 |
| Dispersed | 1 | 10910 |
| Proximal | 2 | 1062 |
| Tandem | 3 | 913 |
| WGD or segmental | 4 | 11302 |

## Plot inter-chromosomal synteny blocks in a circle plot

cd downstream_analyses
java circle_plotter -g Mvim2.gff -s Mvim2.collinearity -c circle.ctl -o Mvim2.png 

Format circle.ctl as follows

800     //plot width and height (in pixels)
Mv1,Mv2,Mv3,Mv4,Mv5,Mv6,Mv7,Mv8,Mv9,Mv10,Mv11,Mv12,Mv13,Mv14,Mv15,Mv16,Mv17,Mv18,Mv19,Mv20,Mv21,Mv22,Mv23       //chromosomes in the circle
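The chromosome list for the second line of circle.ctl does not have to be typed by hand; it can be derived from column 1 of the gff. This is a minimal sketch, assuming GNU sort (for the -V version sort, which puts Mv2 before Mv10); `chrom_list` is a hypothetical helper name.

```shell
# Print the chromosome names found in column 1 of the gff as one
# comma-separated line, in natural order (Mv1, Mv2, ..., Mv10, ...).
# Assumption: GNU sort is available (-V is a GNU extension).
# chrom_list is a hypothetical helper name.
chrom_list() {
    cut -f1 "$1" | sort -u -V | paste -sd, -
}

# Example:
#   chrom_list Mvim2.gff
```

The printed line can then be pasted into circle.ctl in front of the "//chromosomes in the circle" comment.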

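Calculate the Ka and Ks values of each collinear gene pair listed in the MCScanX output (.collinearity file). ClustalW and BioPerl are needed to run this script. The -d option takes a FASTA file of coding (nucleotide) sequences whose IDs match the gene names in the .collinearity file.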

sed 's/-RA$//g' /data/dhanu/JS_assembly_files/RNA-seq_annotation/PO1735_Microstegium_vimineum.protein.fasta > ../Mvim.cds
perl add_ka_and_ks_to_collinearity.pl -i ../Mvim2.collinearity -d ../Mvim.cds -o Mvim2.collinearity.kaks
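Since the script has to look up every gene of the .collinearity file in the FASTA file, it may help to confirm beforehand that the sequence IDs match the gene names. The sketch below lists gene names from the gff (column 2) that have no FASTA header with the same ID; it assumes the ID is the first word after ">" and `missing_in_fasta` is a hypothetical helper name.

```shell
# List gene names in the gff (column 2) that have no matching FASTA header.
# Assumption: the sequence ID is the first word after ">" in each header.
# missing_in_fasta is a hypothetical helper name.
missing_in_fasta() {
    local gff="$1" fasta="$2"
    awk '
        # First file (FASTA): remember the ID of every header line.
        NR == FNR {
            if ($0 ~ /^>/) {
                sub(/^>/, "")
                sub(/[ \t].*/, "")
                ids[$0] = 1
            }
            next
        }
        # Second file (gff): print gene names that were never seen.
        !($2 in ids) { print $2 }
    ' "$fasta" "$gff"
}

# Example:
#   missing_in_fasta ../Mvim2.gff ../Mvim.cds
```

An empty result means every gene in the gff has a sequence under the same ID.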