empirical - sungsik-kong/PhyNEST.jl GitHub Wiki

Evidence of hybridization among primates has accumulated with growing fossil evidence and genomic datasets. Vanderpool et al. (2020) explored hybridization among primates deeper in evolutionary time. Using the $\Delta$-test (similar to the D-statistic, but based on gene trees), the authors detected six introgression events, all of which occurred among species belonging to the primate tribe Papionini. The putative introgression events are shown below (Fig. 4 in the original study). The authors note that phylogenetic network analyses using PhyloNet and SNaQ were unsuccessful and gave ambiguous results.
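For readers unfamiliar with the D-statistic mentioned above: in its standard site-pattern form (the ABBA-BABA test), sequences are ordered as P1, P2, P3 and an outgroup O, and $D = (n_{ABBA} - n_{BABA}) / (n_{ABBA} + n_{BABA})$. Under incomplete lineage sorting alone, ABBA and BABA sites are expected in equal numbers, so D should be near zero. The minimal Julia sketch below computes D from four aligned sequences. The function name `d_statistic` is hypothetical. It is not part of PhyNEST, and it is not the gene-tree-based $\Delta$-test used by Vanderpool et al. (2020). It assumes uppercase nucleotides and skips any site containing a character other than A, C, G or T.

```julia
# Hypothetical helper (not part of PhyNEST): Patterson's D from four aligned
# sequences ordered as P1, P2, P3 and outgroup O.
function d_statistic(p1::AbstractString, p2::AbstractString,
                     p3::AbstractString, o::AbstractString)
    n = length(p1)
    all(length(s) == n for s in (p2, p3, o)) ||
        throw(ArgumentError("sequences must have equal length"))
    abba = 0
    baba = 0
    for (a, b, c, d) in zip(p1, p2, p3, o)
        # Skip gaps, ambiguity codes and lowercase characters.
        all(in("ACGT"), (a, b, c, d)) || continue
        if a == d && b == c && a != b
            abba += 1        # P1 = O = "A", P2 = P3 = "B"
        elseif b == d && a == c && a != b
            baba += 1        # P2 = O = "A", P1 = P3 = "B"
        end
    end
    total = abba + baba
    # D is undefined without informative sites; return 0.0 in that case.
    return total == 0 ? 0.0 : (abba - baba) / total
end
```

For example, `d_statistic("ACGA", "CATA", "CCTA", "AAGA")` finds two ABBA sites and one BABA site and returns 1/3.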

In this part of the tutorial, we are going to use PhyNEST to estimate a phylogenetic network from the data of Vanderpool et al. (2020). The PHYLIP sequence alignment, Vanderpool2020.phy, has been prepared for you in the folder PhyNEST.jl.wiki/example-data. This alignment contains eight sequences: one for each of the seven ingroup species in the unrooted tree above, plus the outgroup Callithrix jacchus. The outgroup was selected based on the larger species tree presented in the original study. The alignment is 1,761,114 bp long. Please work through the tasks below on your own, and let the instructor know if you have any questions or issues 😄.
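Before starting the tasks, you may want to confirm that the file's header agrees with the description above. Under the standard PHYLIP convention, the first line holds the number of sequences and the number of sites, so for this alignment it should read 8 and 1761114. The Julia sketch below reads only that line using Base functions. `phylip_header` is a hypothetical helper, not a PhyNEST function, and the relative path assumes you are already inside the example-data folder.

```julia
# Hypothetical helper (not part of PhyNEST): parse a PHYLIP header line,
# which by convention holds "<number of sequences> <number of sites>".
function phylip_header(line::AbstractString)
    fields = split(strip(line))           # split on runs of whitespace
    length(fields) >= 2 ||
        throw(ArgumentError("unexpected PHYLIP header: $(repr(line))"))
    return (ntax = parse(Int, fields[1]), nsites = parse(Int, fields[2]))
end

# Read just the first line; run this from inside the example-data folder.
path = "Vanderpool2020.phy"
if isfile(path)
    hdr = phylip_header(readline(path))
    println("sequences: ", hdr.ntax, ", sites: ", hdr.nsites)
else
    println("File not found: ", path)
end
```

Reading a single line keeps this check instantaneous, unlike parsing the full 1.76 Mb alignment.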

Tasks

Please do the following tasks:

  • Move to the directory that contains the sequence alignment.

    • The /example-data folder should be inside the directory where you cloned the PhyNEST GitHub wiki locally (see Input if you don't remember this step).
  • Open Julia and load PhyNEST.

    • See a suggestion
      $ julia
      julia> using PhyNEST
      
  • Parse the alignment file Vanderpool2020.phy and save a checkpoint (.ckp) file as well.

    • See a suggestion
      julia> data=readPhylip("Vanderpool2020.phy",checkpoint=true)
      

      Because we did not set the optional argument showProgress=false, you should see a progress bar like the one below.

      julia> data=readPhylip("Vanderpool2020.phy",checkpoint=true)
      Progress:  21%[==========>                       ]  ETA: 0:04:06
      

      It took less than 5 minutes to complete on my machine.

  • Set the starting topology as: (Callithrix_jacchus,(((Cercocebus_atys,Mandrillus_leucophaeus),(Papio_anubis,Theropithecus_gelada)),(Macaca_nemestrina,(Macaca_fascicularis,Macaca_mulatta))));. This Newick string represents the species topology shown above plus an outgroup.

    • See a suggestion
      julia> start_topology=readTopology("(Callithrix_jacchus,(((Cercocebus_atys,Mandrillus_leucophaeus),(Papio_anubis,Theropithecus_gelada)),(Macaca_nemestrina,(Macaca_fascicularis,Macaca_mulatta))));")
      

      You should be able to see something like:

      julia> start_topology=readTopology("(Callithrix_jacchus,(((Cercocebus_atys,Mandrillus_leucophaeus),(Papio_anubis,Theropithecus_gelada)),(Macaca_nemestrina,(Macaca_fascicularis,Macaca_mulatta))));")
      PhyloNetworks.HybridNetwork, Rooted Network
      14 edges
      15 nodes: 8 tips, 0 hybrid nodes, 7 internal tree nodes.
      tip labels: Callithrix_jacchus, Cercocebus_atys, Mandrillus_leucophaeus, Papio_anubis, ...
      (Callithrix_jacchus,(((Cercocebus_atys,Mandrillus_leucophaeus),(Papio_anubis,Theropithecus_gelada)),(Macaca_nemestrina,(Macaca_fascicularis,Macaca_mulatta))));
      
  • Compute the composite likelihood of the starting tree given the data.

    • See a suggestion
      julia> stats,start_topology_upd=do_optimization(start_topology,data)
      

      It took about 3 seconds to finish, and I got 1.2042195993374506e8 for the composite likelihood of the starting tree.

      julia> stats.minimum
      1.2042195993374506e8
      
  • Run a network analysis using the starting topology with the following conditions:

    • Set the outgroup to "Callithrix_jacchus"

    • Use the hill-climbing search strategy

    • Assume the number of hybridizations to be 1

    • Name the output file monkey_tree

    • Conduct a single independent search by setting the optional argument number_of_runs=1. This is almost never recommended for a phylogenetic analysis, and PhyNEST sets number_of_runs=10 by default. We set it to 1 here only so that the analysis finishes quickly for tutorial purposes.

    • See a suggestion
      julia> network=phyne!(start_topology,data,"Callithrix_jacchus",do_hill_climbing=true,hmax=1,filename="monkey_tree",number_of_runs=1)
      
  • The step above should take less than 10 minutes to complete. If you are short on time, the file monkey_tree_full.out in /example-data contains the final output of this analysis from a previous run. You can use this file to move on to the next task.

  • Visualize the best estimated network using Dendroscope.

    • See a suggestion
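As an optional check on the starting topology used in the tasks above, the summary printed by readTopology (8 tips, 7 internal tree nodes, 15 nodes, 14 edges) follows from simple counting: a rooted binary tree with n tips has n - 1 internal nodes and 2n - 2 edges, so n = 8 gives 7 internal nodes, 15 nodes and 14 edges. The Julia sketch below recovers these numbers directly from the Newick string. `newick_counts` is a hypothetical helper, not part of PhyNEST or PhyloNetworks, and it only handles plain trees like this one; it does not understand the extended Newick notation used for hybrid nodes in networks.

```julia
# Starting topology from the tutorial (copied verbatim).
newick = "(Callithrix_jacchus,(((Cercocebus_atys,Mandrillus_leucophaeus),(Papio_anubis,Theropithecus_gelada)),(Macaca_nemestrina,(Macaca_fascicularis,Macaca_mulatta))));"

# Hypothetical helper (not part of PhyNEST or PhyloNetworks): count tips,
# internal nodes and edges of a plain Newick tree.
function newick_counts(s::AbstractString)
    # Tip labels are names that directly follow "(" or ",".
    tips = [m.captures[1] for m in eachmatch(r"[(,]([A-Za-z0-9_]+)", s)]
    internal = count(==('('), s)   # each "(" opens one internal node
    nodes = length(tips) + internal
    edges = nodes - 1              # a tree has one fewer edge than nodes
    return (tips = tips, internal = internal, nodes = nodes, edges = edges)
end

c = newick_counts(newick)
println(length(c.tips), " tips, ", c.internal, " internal nodes, ",
        c.nodes, " nodes, ", c.edges, " edges")
# prints "8 tips, 7 internal nodes, 15 nodes, 14 edges"
```

Once a hybrid node is added, the network gains nodes and edges beyond these tree counts, which is one way to read the difference between the starting tree and the estimated network.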