Whole paranome and ortholog Ks distributions using ksrates tool

This pipeline uses the tool "ksrates" to position whole-genome duplication (WGD) events relative to speciation events through a mixed paralog-ortholog Ks analysis.

#### Performed on the myco server using Docker

Mixed paralog-ortholog analysis on five species: M. vimineum (focal species), S. bicolor, C. lacryma-jobi, Z. mays and O. sativa (outgroup).

Required files:
- configuration file
- protein-coding sequences of each species in FASTA format
- gene annotation file (GFF) for the focal species

Note: Keep all required files in the working directory (here, /path/to/ksrates/Mvim_analysis/).

Generate the configuration file:

sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates generate-config config_mvim.txt
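Every ksrates command on this page repeats the same `docker run` prefix. A minimal sketch of a shell wrapper that factors it out is given here; the function name `ksr` and the `KSRATES_RUNNER` variable are illustrative assumptions, not part of ksrates or Docker.

```shell
# Hypothetical helper: run any ksrates subcommand inside the vibpsb/ksrates container.
#   --rm       remove the container once the command exits
#   -v ...     mount the current directory at /temp inside the container
#   -w /temp   use that mount as the container's working directory
# KSRATES_RUNNER (illustrative) overrides "sudo docker", e.g. "echo docker"
# to print the command instead of running it.
ksr() {
    ${KSRATES_RUNNER:-sudo docker} run --rm -v "$PWD:/temp" -w /temp \
        vibpsb/ksrates ksrates "$@"
}
```

With this defined, `ksr init config_mvim.txt` should be equivalent to typing the full `sudo docker run ... ksrates init config_mvim.txt` command.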

In the config file, define the species used in the analysis, their phylogenetic relationships, and the FASTA file names. If needed, adjust the parameters given in the second section of the config file.

[SPECIES]
focal_species = Mvimineum
# informal name of the focal species from the input tree

newick_tree = ((((Mvimineum, Sbicolor), Clacryma-jobi), Zmays), Osativa);
# input phylogenetic tree in newick format; use the informal names

latin_names = Mvimineum: Microstegium vimineum, Sbicolor: Sorghum bicolor, Clacryma-jobi: Coix lacryma-jobi, Zmays: Zea mays, Osativa: Oryza sativa
# informal names associated to their latin name through a colon and separated by comma

fasta_filenames =  Mvimineum: Mvim.fasta, Sbicolor: Sbi.fasta, Clacryma-jobi: Coix-lacryma.fasta, Zmays: Zmays.fasta, Osativa: oryza.fasta
gff_filename = Mvim.gff3
# informal names associated to their filename/path through a colon and separated by comma

peak_database_path = ortholog_peak_db.tsv
ks_list_database_path = ortholog_ks_list_db.tsv
# filenames/paths of the ortholog data databases


[ANALYSIS SETTING]
paranome = yes
collinearity = yes
# analysis type for paralog data; allowed values: 'yes' or 'no'

gff_feature = mrna
# keyword to parse the sequence type from the gff file (column 3); can be 'gene', 'mrna'...

gff_attribute = id
# keyword to parse gene id from the gff file (column 9); can be 'id', 'name'...

max_number_outgroups = 4
# maximum number of outspecies/trios selected to correct each divergent species pair (default: 4)

consensus_mode_for_multiple_outgroups = mean among outgroups
# allowed values: 'mean among outgroups' or 'best outgroup' (default: 'mean among outgroups')

[PARAMETERS]
x_axis_max_limit_paralogs_plot = 2
# highest value of the x axis in the mixed distribution plot (default: 5)

bin_width_paralogs = 0.1
# bin width in paralog ks histograms (default: 0.1, ten bins per unit)

y_axis_max_limit_paralogs_plot = None
# highest value of the y axis in the mixed distribution plot  (default: none)

num_bootstrap_iterations = 200
# number of bootstrap iterations for ortholog peak estimate

divergence_colors =  Red, MediumBlue, Goldenrod, Crimson, ForestGreen, Gray, SaddleBrown, Black
# color of the divergence lines drawn in correspondence of the ortholog peaks
# use color names/codes separated by comma and use at least as many colors as the number of divergence nodes

x_axis_max_limit_orthologs_plots = 2
# highest value of the x axis in the ortholog distribution plots (default: 5)

bin_width_orthologs = 0.1
# bin width in ortholog ks histograms (default: 0.1, ten bins per unit)

max_ks_paralogs = 5
# maximum paralog ks value accepted from ks data table (default: 5)

max_ks_orthologs = 10
# maximum ortholog ks value accepted from ks data table (default: 10)
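Since all input files must sit in the working directory, it can help to confirm that every file named in the config exists before initializing the analysis. The sketch below assumes that `fasta_filenames` and `gff_filename` each occupy a single line, as in the config above, and that file names contain no spaces; `config_input_files` and `check_inputs` are hypothetical helper names.

```shell
# Print the files named by fasta_filenames and gff_filename, one per line.
config_input_files() {
    config=$1
    # "Name: file, Name: file, ..." -> one "file" per line
    sed -n 's/^fasta_filenames *= *//p' "$config" | tr ',' '\n' |
        sed 's/^[^:]*: *//; s/ *$//'
    sed -n 's/^gff_filename *= *//p' "$config" | sed 's/ *$//'
}

# Return non-zero if any listed file is missing or empty.
check_inputs() {
    config=$1
    missing=0
    for f in $(config_input_files "$config"); do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Run from the working directory, `check_inputs config_mvim.txt` reports any file that is absent or empty and returns a non-zero status in that case.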

Run the initialization script to obtain the ortholog trios for the rate adjustment (rate_adjustment/Mvimineum/ortholog_trios_Mvimineum.tsv) and the species pairs to be run through the wgd ortholog Ks analysis (rate_adjustment/Mvimineum/ortholog_pairs_Mvimineum.tsv):

sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates init config_mvim.txt

Ortholog trios:

| Node | Focal_Species | Sister_Species | Out_Species |
| --- | --- | --- | --- |
| 1 | Mvimineum | Sbicolor | Clacryma-jobi |
| 1 | Mvimineum | Sbicolor | Zmays |
| 1 | Mvimineum | Sbicolor | Osativa |
| 2 | Mvimineum | Clacryma-jobi | Zmays |
| 2 | Mvimineum | Clacryma-jobi | Osativa |
| 3 | Mvimineum | Zmays | Osativa |
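Each divergence node is adjusted using the outgroups listed for it in the trios table (at most `max_number_outgroups`, set to 4 in the config). As a rough sketch, the number of outgroups per node can be tallied from the trios file; this assumes a whitespace-separated file with the four columns shown above, and `outgroups_per_node` is a hypothetical helper name.

```shell
# Count the outgroup species available for each divergence node.
#   $1  trios table with the header "Node Focal_Species Sister_Species Out_Species"
# Prints: node <TAB> sister species <TAB> number of outgroups
outgroups_per_node() {
    awk 'NR > 1 && NF >= 4 {
             key = $1 "\t" $3
             if (!(key in count)) order[++n] = key   # remember first-seen order
             count[key]++
         }
         END {
             for (i = 1; i <= n; i++) print order[i] "\t" count[order[i]]
         }' "$1"
}
```

For the trios listed above, this would report three outgroups for node 1 (Sbicolor), two for node 2 (Clacryma-jobi) and one for node 3 (Zmays).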

Ortholog pairs:

| Species1 | Species2 |
| --- | --- |
| Mvimineum | Sbicolor |
| Clacryma-jobi | Mvimineum |
| Clacryma-jobi | Sbicolor |
| Mvimineum | Zmays |
| Sbicolor | Zmays |
| Mvimineum | Osativa |
| Osativa | Sbicolor |
| Clacryma-jobi | Zmays |
| Clacryma-jobi | Osativa |
| Osativa | Zmays |
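In this five-species analysis the pairs list covers every unordered combination of species, 5 × 4 / 2 = 10 in total. A small sketch to enumerate those combinations for cross-checking follows; `species_pairs` is a hypothetical helper, and sorting the names first is an assumption made so that each pair prints in alphabetical order, as the pairs above are written.

```shell
# Print every unordered pair of the given species names, tab-separated.
species_pairs() {
    printf '%s\n' "$@" | LC_ALL=C sort | awk '
        { name[NR] = $0 }
        END {
            # i < j skips self-pairs and mirrored duplicates
            for (i = 1; i < NR; i++)
                for (j = i + 1; j <= NR; j++)
                    print name[i] "\t" name[j]
        }'
}
```

For example, `species_pairs Mvimineum Sbicolor Clacryma-jobi Zmays Osativa` prints ten lines, which can be compared against the pairs generated by the initialization step.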

wgd paralog Ks analysis: estimate the whole-paranome Ks values (paralog_distributions/wgd_Mvimineum/Mvimineum.ks.tsv) and the anchor pair Ks values (paralog_distributions/wgd_Mvimineum/Mvimineum.ks_anchors.tsv):

sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates paralogs-ks config_mvim.txt --n-threads 4

wgd ortholog Ks analysis: estimate the ortholog Ks values for each species pair listed in path/to/ksrates/Mvim_analysis/rate_adjustment/Mvimineum/ortholog_pairs_Mvimineum.tsv:

sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates orthologs-ks config_mvim.txt Mvimineum Sbicolor --n-threads 4 &
sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates orthologs-ks config_mvim.txt Mvimineum Clacryma-jobi --n-threads 4 &
sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates orthologs-ks config_mvim.txt Mvimineum Osativa --n-threads 2
sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates orthologs-ks config_mvim.txt Mvimineum Zmays --n-threads 2 &
sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates orthologs-ks config_mvim.txt Clacryma-jobi Sbicolor --n-threads 2 &
sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates orthologs-ks config_mvim.txt Clacryma-jobi Zmays --n-threads 4 &
sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates orthologs-ks config_mvim.txt Clacryma-jobi Osativa --n-threads 4 &
sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates orthologs-ks config_mvim.txt Sbicolor Zmays --n-threads 4 &
sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates orthologs-ks config_mvim.txt Sbicolor Osativa --n-threads 4 &
sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates orthologs-ks config_mvim.txt Osativa Zmays --n-threads 4 &
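Instead of typing one command per pair, the same runs can be driven from the pairs table. The sketch below runs the pairs one after another rather than in the background, a deliberate simplification that keeps thread usage bounded; `run_orthologs_ks` and `KSRATES_RUNNER` are hypothetical names, and the pairs file is assumed to be whitespace-separated with a single header line.

```shell
# Run one orthologs-ks job per row of a species-pairs table, sequentially.
#   $1  pairs table with a "Species1 Species2" header line
#   $2  ksrates config file
#   $3  threads per job (default 4)
# KSRATES_RUNNER (illustrative) overrides "sudo docker", e.g. "echo docker"
# to preview the commands without running them.
run_orthologs_ks() {
    pairs_file=$1
    config=$2
    threads=${3:-4}
    header_skipped=0
    while read -r sp1 sp2 rest; do
        # The first line is the column header, not a species pair.
        if [ "$header_skipped" -eq 0 ]; then header_skipped=1; continue; fi
        [ -n "$sp2" ] || continue          # ignore blank or malformed rows
        ${KSRATES_RUNNER:-sudo docker} run --rm -v "$PWD:/temp" -w /temp \
            vibpsb/ksrates ksrates orthologs-ks "$config" "$sp1" "$sp2" \
            --n-threads "$threads" < /dev/null || return 1
    done < "$pairs_file"
}
```

Setting `KSRATES_RUNNER="echo docker"` before calling `run_orthologs_ks <pairs file> config_mvim.txt 4` prints the commands so they can be reviewed first.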

Output files (in path/to/ksrates/Mvim_analysis/ortholog_distributions/wgd_Mvimineum_Sbicolor):
- Mvimineum_Sbicolor.blast.tsv
- Mvimineum_Sbicolor.ks.tsv
- Mvimineum_Sbicolor.orthologs.tsv

Note: Similar files are generated for each spp. pair.
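Because the background jobs finish at different times, a quick count of the `.ks.tsv` tables shows whether every pair has completed. This is a minimal sketch; `check_ks_tables` is a hypothetical helper, and it assumes that each finished pair leaves exactly one `*.ks.tsv` file somewhere under ortholog_distributions.

```shell
# Compare the number of ortholog Ks tables found with the number expected.
#   $1  expected number of species pairs
#   $2  directory to search (default: ortholog_distributions)
check_ks_tables() {
    expected=$1
    dir=${2:-ortholog_distributions}
    found=$(find "$dir" -type f -name '*.ks.tsv' | wc -l)
    echo "found $((found)) of $expected ortholog Ks tables in $dir"
    [ "$found" -eq "$expected" ]
}
```

With the ten pairs of this analysis, `check_ks_tables 10` returns success only once all ten tables are present.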

Estimate the mode and standard deviation of each ortholog Ks distribution:

sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates orthologs-analysis config_mvim.txt

Output file: /path/to/ksrates/Mvim_analysis/ortholog_peak_db.tsv

Plot the ortholog Ks distributions for the focal species paired with each other species (and for each of their trios):

sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates plot-orthologs config_mvim.txt

From the documentation: the command generates a PDF file for each species pair with the three ortholog Ks distributions obtained from each of the species trios the species pair is involved in.

Output files: path/to/ksrates/Mvim_analysis/rate_adjustment/Mvimineum/*.pdf

Perform the rate adjustment, plot the adjusted mixed paralog-ortholog Ks distribution, plot the tree, and run the paralog analyses:

sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates orthologs-adjustment config_mvim.txt
sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates plot-paralogs config_mvim.txt
sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates plot-tree config_mvim.txt
sudo docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates paralogs-analyses config_mvim.txt
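These four commands can be chained so that a failure stops the sequence instead of letting later steps run on incomplete results. A minimal sketch, keeping the order given above; `run_adjustment_steps` and `KSRATES_RUNNER` are hypothetical names.

```shell
# Run the rate adjustment and plotting steps in order, stopping at the first failure.
#   $1  ksrates config file
# KSRATES_RUNNER (illustrative) overrides "sudo docker", e.g. "echo docker"
# to preview the commands without running them.
run_adjustment_steps() {
    config=$1
    for step in orthologs-adjustment plot-paralogs plot-tree paralogs-analyses; do
        echo ">> ksrates $step $config" >&2
        ${KSRATES_RUNNER:-sudo docker} run --rm -v "$PWD:/temp" -w /temp \
            vibpsb/ksrates ksrates "$step" "$config" || {
            echo "step failed: $step" >&2
            return 1
        }
    done
}
```

Calling `run_adjustment_steps config_mvim.txt` then replaces the four separate invocations.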

Output files:
- path/to/ksrates/Mvim_analysis/rate_adjustment/Mvimineum/adjustment_table_Mvimineum.tsv
- path/to/ksrates/Mvim_analysis/rate_adjustment/Mvimineum/mixed_Mvimineum_adjusted.pdf
- path/to/ksrates/Mvim_analysis/rate_adjustment/Mvimineum/tree_Mvimineum_distances.pdf
- path/to/ksrates/Mvim_analysis/rate_adjustment/Mvimineum/mixed_Mvimineum_anchor_clusters.pdf => contains two WGD peaks: a recent peak specific to the focal species and an older shared peak

Note: Refer to the ksrates documentation for how to perform the analysis if collinearity = no is set in the config file.