7 Conflict Analysis - PhyloAI/Ortho2Web GitHub Wiki

We use PhyParts and Quartet Sampling (QS) to analyze conflict among gene trees. PhyParts maps rooted gene trees onto a reference species tree and reports, for each species-tree node, how many gene trees are concordant with it and how many conflict with it. QS evaluates each internal branch of the input tree by repeatedly sampling quartets of taxa around that branch and comparing the likelihoods of the three possible quartet topologies computed from the alignment. Used together, the two tools show where the data disagree and how strongly each branch is supported.

7.1 PhyParts

Step 1: Installation and environment setup

conda install -c conda-forge maven
# Install Maven, which is required to build PhyParts

conda install -c conda-forge git
# Install Git to clone the repository
git clone https://bitbucket.org/blackrim/phyparts.git
# Clone the PhyParts repository from Bitbucket

cd phyparts
./mvn_cmdline.sh
# Enter the cloned repository and run the Maven build script to build PhyParts

Step 2: Gene tree construction

for name in $(cat namelist.txt); do 
  raxmlHPC-PTHREADS-SSE3 -T 8 -f a -N 200 -p 12345 -x 12345 -s "$name" -n "$name" -m GTRGAMMA;
done
# Loop over each gene alignment file listed in namelist.txt to construct gene trees using RAxML
# -T: Number of threads.
# -f: Algorithm ('a' for rapid bootstrap analysis followed by best-scoring maximum likelihood tree search).
# -N: Number of bootstrap replicates.
# -p: Random seed for parsimony inferences.
# -x: Random seed for rapid bootstrap analysis.
# -s: Input sequence alignment file.
# -n: Output file suffix for the resulting tree.
# -m: Substitution model.
for file in *.tre; do 
  nw_reroot "$file" outgroup > "$file.rooted.newick";
done
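# Root each gene tree with nw_reroot from [Newick Utilities](https://github.com/tjunier/newick_utils) and save the rooted version
# Replace "outgroup" with the outgroup label(s) used in your trees
# The *.tre pattern assumes the RAxML output trees (written as RAxML_*.<name>) have been collected or renamed to end in .tre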

Step 3: PhyParts
We will run PhyParts twice with different parameters (-a 0 and -a 1) in the conflict analysis of phylogenetic trees:

  • Full concordance analysis (-a 1): This run compares each gene tree with the reference species tree and, through the -s option, considers only gene-tree nodes with bootstrap support above a chosen threshold (e.g., 50%). It highlights the well-supported relationships and shows which gene trees consistently agree with the species tree.

  • Concordance analysis without filtering (-a 0): This run omits the -s option, so every gene-tree node is counted regardless of its support value. It can reveal disagreement or weak support that the filtered run hides.

Combining the two runs shows both the well-supported relationships and those that remain uncertain or contested.
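To make the effect of the support threshold concrete, here is a minimal Python sketch of a rooted-clade comparison in the spirit of what PhyParts reports. It is not PhyParts' implementation: the function name `classify`, the representation of clades as sets of tip labels, and the toy trees are all assumptions made for illustration.

```python
def classify(species_clade, gene_clades, gene_taxa, min_support=None):
    """Classify one rooted gene tree against one species-tree clade.

    species_clade: frozenset of tip labels below the species-tree node
    gene_clades:   list of (frozenset_of_tips, support) for the gene tree
    gene_taxa:     set of all tip labels present in the gene tree
    min_support:   if given, gene-tree clades with support <= this are ignored
    (All names here are hypothetical, chosen for this sketch.)
    """
    # Restrict the species-tree clade to taxa the gene tree actually samples.
    target = species_clade & gene_taxa
    # With fewer than two ingroup tips, or no tip outside the clade,
    # the gene tree cannot say anything about this node.
    if len(target) < 2 or target == gene_taxa:
        return "uninformative"
    kept = [c for c, s in gene_clades if min_support is None or s > min_support]
    if any(c == target for c in kept):
        return "concordant"
    for c in kept:
        # Conflict: the clades overlap but neither is nested in the other.
        if (c & target) and not (c <= target or target <= c):
            return "conflicting"
    return "uninformative"


species_clade = frozenset({"A", "B"})
taxa = {"A", "B", "C", "D"}
gene1 = [(frozenset({"A", "B"}), 95), (frozenset({"C", "D"}), 60)]
gene2 = [(frozenset({"A", "C"}), 80), (frozenset({"B", "D"}), 40)]

print(classify(species_clade, gene1, taxa))                  # concordant
print(classify(species_clade, gene2, taxa))                  # conflicting
print(classify(species_clade, gene2, taxa, min_support=90))  # uninformative
```

In the last call the conflicting clade has support 80, below the hypothetical threshold of 90, so the gene tree no longer counts as conflicting. This is the kind of difference the filtered and unfiltered runs are meant to expose.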

Step 3-1: Full concordance analysis

java -jar phyparts-0.0.1-SNAPSHOT-jar-with-dependencies.jar -a 1 -v -s 50 -d directory/path/to/gene_trees -m RAxML.rooted.newick -o RESULT_a1
# Run the PhyParts analysis with support filtering
# -a: Analysis type (1 for full concordance analysis).
# -v: Verbose output for detailed logging.
# -s: Only keep nodes with support greater than 50%.
# -d: Directory containing gene trees.
# -m: Mapping tree file (species tree for comparison).
# -o: Output file prefix for results.

Step 3-2: Concordance analysis without filtering

java -jar phyparts-0.0.1-SNAPSHOT-jar-with-dependencies.jar -a 0 -v -d directory/path/to/gene_trees -m RAxML.rooted.newick -o RESULT_a0
# Run the PhyParts analysis without support filtering
# -a: Analysis type (0 for concordance analysis without filtering).
# -v: Verbose output for detailed logging.
# -d: Directory containing gene trees.
# -m: Mapping tree file (species tree for comparison).
# -o: Output file prefix for results.

Step 4: Combining results

install.packages("phytools")
library(phytools)
# Use R to combine the results from the two PhyParts runs

No_bs <- read.tree("RESULT_a0.concon.tre") # Trees from the concordance analysis without support filtering
bs_full_concordance <- read.tree("RESULT_a1.concon.tre") # Trees from the full concordance analysis with support filtering (e.g., bootstrap 50)
# Read the tree files generated by PhyParts

total_no_bs <- No_bs[[1]]
# Copy the first tree to hold the per-node totals

total_no_bs$node.label <- mapply("+", as.numeric(No_bs[[1]]$node.label), as.numeric(No_bs[[2]]$node.label))
# Sum the node labels of the first two trees (concordant plus conflicting gene trees) to get the total count at each node

total_no_bs$node.label[is.na(total_no_bs$node.label)] <- ""
total_no_bs$node.label[total_no_bs$node.label == "0"] <- ""
# Remove NA and zero values from node labels

full_concordance_and_total_nodes <- append(bs_full_concordance, total_no_bs, after = 2)
# Insert the totals tree after the second tree of the full concordance output

write.tree(full_concordance_and_total_nodes, file = "RESULT_a1.concon.tre")
# Overwrite RESULT_a1.concon.tre with the combined set of trees
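
For readers less familiar with R, the node-label arithmetic in this step can be sketched in Python on plain lists. This only mirrors the summing and blanking logic; it does not read or write tree files, and the function name and example labels are made up for illustration.

```python
def sum_node_labels(first, second):
    """Element-wise sum of two node-label lists.

    Labels that are missing or non-numeric in either list (NA in the R code)
    and sums equal to zero are returned as empty strings.
    """
    totals = []
    for a, b in zip(first, second):
        try:
            total = float(a) + float(b)
        except (TypeError, ValueError):
            totals.append("")  # corresponds to the NA case in R
            continue
        totals.append("" if total == 0 else f"{total:g}")
    return totals


# Hypothetical node labels from the first and second trees of a .concon.tre file
labels_tree1 = ["", "12", "0", "30"]
labels_tree2 = ["", "5", "0", "8"]
print(sum_node_labels(labels_tree1, labels_tree2))  # ['', '17', '', '38']
```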

Step 5: Visualization

python phypartspiecharts_missing_uninformative.py RAxML.rooted.newick RESULT_a1 538
# Generate pie charts to visualize conflict and concordance in the gene trees
# Arguments: mapping_tree, phyparts output file prefix (combined file), number of gene trees.
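
As a rough guide to reading the pie charts, the sketch below shows one plausible way to turn per-node counts into slice proportions. It is an assumption about the general approach, not a copy of the plotting script; the function name and all counts except the total of 538 gene trees are hypothetical.

```python
def pie_fractions(total_genes, concordant, conflicting, top_conflict):
    """Split the gene trees at one node into four pie slices.

    total_genes:  number of gene trees in the analysis
    concordant:   gene trees concordant with the node
    conflicting:  gene trees conflicting with the node (all alternatives)
    top_conflict: gene trees supporting the most common conflicting alternative
    """
    if min(concordant, conflicting, top_conflict) < 0:
        raise ValueError("counts must be non-negative")
    if top_conflict > conflicting or concordant + conflicting > total_genes:
        raise ValueError("counts are inconsistent")
    counts = {
        "concordant": concordant,
        "top_conflict": top_conflict,
        "other_conflict": conflicting - top_conflict,
        # Whatever is left neither supports nor contradicts the node.
        "missing_or_uninformative": total_genes - concordant - conflicting,
    }
    return {name: n / total_genes for name, n in counts.items()}


# 538 gene trees as in the command above; the other counts are made up.
for name, frac in pie_fractions(538, 300, 150, 100).items():
    print(f"{name}: {frac:.3f}")
```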

7.2 Quartet Sampling

Step 1: Installation

conda install bioconda::raxml-ng
# Install RAxML-NG, which is required for running some of the phylogenetic analyses

Step 2: Quartet Sampling analysis

python3 quartetsampling-master/pysrc/quartet_sampling.py --tree astral.tre --align concatenated.phy --reps 100 --threads 8 --ignore-errors --lnlike 2 --result-prefix RESULT
# Run Quartet Sampling to evaluate the phylogenetic tree
# --tree: Input tree file (in Newick format)
# --align: Input alignment file (in PHYLIP format)
# --reps: Number of replicates for the analysis
# --threads: Number of threads to use for parallel processing
# --ignore-errors: Ignore errors from RAxML and PAUP during execution
# --lnlike: Minimum log-likelihood difference by which the best quartet topology must exceed the second-best
# --result-prefix: Prefix for the output result files
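
Mismatched taxon labels between the tree and the alignment are a common source of trouble in this kind of analysis, so it may be worth comparing the two before starting a long run. The sketch below is a hypothetical helper, not part of Quartet Sampling; it assumes unquoted Newick labels and a PHYLIP file in which each taxon name is separated from its sequence by whitespace.

```python
import re


def newick_tips(newick):
    """Return the set of tip labels in a Newick string (unquoted labels only)."""
    # A tip label directly follows "(" or ","; internal labels follow ")".
    return set(re.findall(r"[(,]\s*([^(),:;\[\]\s]+)", newick))


def phylip_taxa(text):
    """Return the taxon names of a PHYLIP alignment given as a string."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_taxa = int(lines[0].split()[0])  # header line: "<n_taxa> <n_sites>"
    return {ln.split()[0] for ln in lines[1:1 + n_taxa]}


def compare_taxa(newick, phylip_text):
    """Return (labels only in the tree, labels only in the alignment)."""
    tips, taxa = newick_tips(newick), phylip_taxa(phylip_text)
    return tips - taxa, taxa - tips


# Toy data; for real files, read astral.tre and concatenated.phy as text first.
tree = "((A:1,B:1)0.9:1,(C:1,D:1):1);"
alignment = "3 4\nA ACGT\nB ACGT\nC ACGT\n"
only_tree, only_aln = compare_taxa(tree, alignment)
print(sorted(only_tree), sorted(only_aln))  # ['D'] []
```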

Step 3: Visualization

conda install -c bioconda bioconductor-ggtree
# Install ggtree, an R package for visualizing phylogenetic trees

Rscript plot_QC_ggtree.R
# Use an R script to plot the quartet sampling results