empirical - sungsik-kong/PhyNEST.jl GitHub Wiki
Evidence of hybridization among primates has grown as fossil evidence and genomic datasets have accumulated. Vanderpool et al. (2020) explored hybridization among primates deeper in evolutionary time. Analyses using PhyloNet and SNaQ were unsuccessful and gave ambiguous results.
In this part of the tutorial, we are going to use PhyNEST to estimate a phylogenetic network from the data of Vanderpool et al. (2020). The PHYLIP sequence alignment is prepared for you in the folder PhyNEST.jl.wiki/example-data under the name Vanderpool2020.phy. The alignment contains eight sequences: one for each ingroup species in the unrooted tree above, plus the outgroup Callithrix jacchus. The outgroup was selected based on the larger species tree presented in the original study. The alignment length is 1,761,114 bp. Please work through the tasks below on your own, and let the instructor know if you have any questions or issues 😄.
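Before parsing a file of this size, it can help to confirm that it matches the description above. The following is a minimal sketch in plain Julia (no PhyNEST calls) that reads the two integers on the first line of a PHYLIP file. The function name `phylip_header` is hypothetical, and the sketch assumes the standard PHYLIP header of "number of sequences, alignment length"; the demonstration uses a tiny made-up alignment rather than the tutorial data.

```julia
# Read the first line of a PHYLIP file and return (number of sequences, alignment length).
# Assumption: the header is two whitespace-separated integers on the first line.
function phylip_header(path::AbstractString)
    header = open(readline, path)          # read only the first line
    fields = split(header)
    length(fields) >= 2 || error("unexpected PHYLIP header: '$header'")
    return parse(Int, fields[1]), parse(Int, fields[2])
end

# Self-contained demonstration on a tiny made-up alignment (not the tutorial data).
demo = joinpath(mktempdir(), "demo.phy")
write(demo, "3 8\nsp1 ACGTACGT\nsp2 ACGTACGA\nsp3 ACGAACGT\n")
ntaxa, nsites = phylip_header(demo)
println("sequences: ", ntaxa, ", sites: ", nsites)
```

If Vanderpool2020.phy follows the same header convention, calling `phylip_header("Vanderpool2020.phy")` from the example-data folder should report 8 sequences and 1761114 sites, in line with the description above.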
Please do the following tasks:
- Move to the directory that contains the sequence alignment.
  - The `/example-data` folder should be inside the directory where you cloned the `PhyNEST` GitHub wiki locally (see Input if you don't remember this step).
- Open `julia` and load `PhyNEST`.
  - See a suggestion

        $ julia
        julia> using PhyNEST

- Parse the alignment file `Vanderpool2020.phy` and save the `.ckp` file as well.
  - See a suggestion

        julia> data=readPhylip("Vanderpool2020.phy",checkpoint=true)

    You should see a progress bar like the one below, because we did not set the optional argument `showProgress=false`.

        julia> data=readPhylip("Vanderpool2020.phy",checkpoint=true)
        Progress: 21%[==========> ] ETA: 0:04:06

    It took less than 5 minutes to complete on my machine.

- Set the starting topology as `(Callithrix_jacchus,(((Cercocebus_atys,Mandrillus_leucophaeus),(Papio_anubis,Theropithecus_gelada)),(Macaca_nemestrina,(Macaca_fascicularis,Macaca_mulatta))));`. This Newick string represents the species topology shown above plus an outgroup.
  - See a suggestion

        julia> start_topology=readTopology("(Callithrix_jacchus,(((Cercocebus_atys,Mandrillus_leucophaeus),(Papio_anubis,Theropithecus_gelada)),(Macaca_nemestrina,(Macaca_fascicularis,Macaca_mulatta))));")

    You should see something like:

        julia> start_topology=readTopology("(Callithrix_jacchus,(((Cercocebus_atys,Mandrillus_leucophaeus),(Papio_anubis,Theropithecus_gelada)),(Macaca_nemestrina,(Macaca_fascicularis,Macaca_mulatta))));")
        PhyloNetworks.HybridNetwork, Rooted Network
        14 edges
        15 nodes: 8 tips, 0 hybrid nodes, 7 internal tree nodes.
        tip labels: Callithrix_jacchus, Cercocebus_atys, Mandrillus_leucophaeus, Papio_anubis, ...
        (Callithrix_jacchus,(((Cercocebus_atys,Mandrillus_leucophaeus),(Papio_anubis,Theropithecus_gelada)),(Macaca_nemestrina,(Macaca_fascicularis,Macaca_mulatta))));

- Compute the composite likelihood of the starting tree given the data.
  - See a suggestion

        julia> stats,start_topology_upd=do_optimization(start_topology,data)

    It took about 3 seconds to finish, and I got `1.2042195993374506e8` for the composite likelihood of the starting tree.

        julia> stats.minimum
        1.2042195993374506e8

- Run a network analysis using the starting topology with the following conditions:
  - Set the outgroup to "Callithrix_jacchus".
  - Use the hill-climbing search strategy.
  - Assume a single hybridization event.
  - Name the output file `monkey_tree`.
  - Conduct a single independent search by setting the optional argument `number_of_runs=1`. This is almost never recommended for a phylogenetic analysis, and `PhyNEST` sets `number_of_runs=10` by default. We set it to 1 here only so that the analysis finishes quickly for the tutorial.
  - See a suggestion

        julia> network=phyne!(start_topology,data,"Callithrix_jacchus",do_hill_climbing=true,hmax=1,filename="monkey_tree",number_of_runs=1)

- The step above should take less than 10 minutes to complete. If you are short of time, the file `monkey_tree_full.out` in `/example-data` contains the final output of this analysis from a previous run. You can use this file to move on to the next task.
- Visualize the best network estimated using `DendroScope`.
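
When adapting the tasks above to your own data, an easy mistake is a mismatch between the tip labels in the starting topology and the sequence names in the alignment. The following is a minimal sketch in plain Julia that compares the two. The helper names `newick_tip_labels` and `phylip_names` are hypothetical and not part of PhyNEST; the sketch assumes a Newick string without hybrid nodes and a sequential PHYLIP file with one sequence per line, and it runs on a small made-up example.

```julia
# Extract tip labels from a Newick tree string.
# A tip label is taken to be a run of characters other than ( ) , : ; and whitespace
# that directly follows '(' or ','. Labels after ')' (internal nodes) are ignored.
function newick_tip_labels(newick::AbstractString)
    return [String(m.captures[1]) for m in eachmatch(r"[(,]\s*([^(),:;\s]+)", newick)]
end

# Read sequence names from a sequential PHYLIP file:
# the first whitespace-separated field of every non-empty line after the header.
function phylip_names(path::AbstractString)
    lines = filter(!isempty, strip.(readlines(path)))
    return [String(first(split(line))) for line in lines[2:end]]
end

# Self-contained demonstration on made-up data (not the tutorial files).
newick = "(out,((a,b),c));"
demo = joinpath(mktempdir(), "demo.phy")
write(demo, "4 4\nout ACGT\na ACGA\nb ACGA\nc ACTT\n")

only_in_topology = setdiff(newick_tip_labels(newick), phylip_names(demo))
only_in_alignment = setdiff(phylip_names(demo), newick_tip_labels(newick))
println("in topology only: ", only_in_topology)
println("in alignment only: ", only_in_alignment)
```

Both lists being empty means every name appears on both sides. For the tutorial data, the same comparison could be run with the Newick string from the starting topology task and `Vanderpool2020.phy`, provided the file is in the sequential layout assumed here.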