searches - sungsik-kong/PhyNEST.jl GitHub Wiki

Network analysis using phyne!

PhyNEST can search the network space using one of two strategies: hill climbing and simulated annealing. In brief, hill climbing would complete the analysis quicker but has a risk of becoming stuck at the local optimum; whereas simulated annealing searches the network space much more exhaustively, but can take longer to finish.

As demonstrated in Prerequisite, phyne!(args) requires three mandatory arguments: the starting topology, data, and the outgroup. In the following section, we conduct the network analysis using one of the two strategies and go through how to interpret the output.

Hill-climbing

In phyne!(args), hill climbing is selected by default. A network analysis can be started using a command as simple as the one shown below. We are going to name the final network from this analysis as net1. Note random_starting_topology and phylip_data have been specified in the previous sections.

julia> net1=phyne!(random_starting_topology,phylip_data,"outgroup")

As the analysis begins, something similar to the following is printed on screen. This includes some useful information about the analysis (e.g., time of the analysis start, dataset size, selected searching strategy, specified starting topology, some preset settings, and so on).

julia> net1=phyne!(random_starting_topology,phylip_data,"outgroup")
PhyNEST: Phylogenetic Network Estimation using SiTe patterns
Analysis start: 2023-07-01 at 16:48:52
Input sequence file: sample_n5h1.phy
Number of sequences: 5
Sequence length: 1000000
Starting Topology: (outgroup,species_1,species_2,species_3,species_4);
Outgroup specified for rooting: outgroup
Number of maximum reticulation(s): 1
The maximum number of iterations for each optimization: 1000
Search algorithm selected: Hill climbing
The maximum number of steps during search: 250000
Output file store path: /Users/khaosan/Dropbox/PhyNEST.jl.wiki/example-data
The number of processors for this analysis: 1

At the end of each independent run, something similar to the following is printed on the screen. These tell us some specifics of the network search at that independent run. Roughly, it summarizes how many of each network 'moves' were made during the search, the reason for terminating the search (e.g., researched maximum number of steps, reached maximum number of consecutive failures, or else), and the final network estimated and its composite likelihood.

(1/10) Searching for the best network using the hill climbing algorithm, 2023-07-01 16:49:1.496
The search terminated at step 87 and at 75th consecutive failures.
Summary of each move:
Insertion of reticulation edge: 4 proposed, 1 accepted.
Tail move of reticulation edge: 25 proposed, 0 accepted.
Head move of reticulation edge: 25 proposed, 0 accepted.
Change the direction of reticulation edge: 5 proposed, 0 accepted.
Deletion of reticulation edge: 0 proposed, 0 accepted.
Nearest-neighbor interchange (NNI): 27 proposed, 3 accepted.
On the current topology, 96 moves were made, including 21 unsuccessful moves.
Terminated although it neither reached the maximum number of steps or failures,
possibly because there was no more move to make.
(1/10) Estimated topology in this run: (outgroup:4.099307748692147,((((species_3:0.6316119541666688,species_2:0.6316119541666688):0.31638660078291214,(species_4:1.0000000000000006e-10)#H6:0.947998554849581::2.1857476158833815e-15):6.947399849510649e-5,species_1:0.9480680289480761):0.5673270933421422,#H6:1.5153951221902182::0.9999999999999978):2.5839126264019288);
(1/10) Composite likelihood of the estimated topology in this run: 4.79908686476507e6

At the end of the analysis, phyne!(args) summarizes the network estimated at each independent run followed by the best network estimated throughout all runs. The best network is also stored in .out file that is created upon the completion of the analysis.

-----Summary of the networks found
Run   Composite Likelihood    Network
1	4.79908686477e6         (outgroup:4.099307748692147,((((species_3:0.6316119541666688,species_2:0.6316119541666688):0.31638660078291214,(species_4:1.0000000000000006e-10)#H6:0.947998554849581::2.1857476158833815e-15):6.947399849510649e-5,species_1:0.9480680289480761):0.5673270933421422,#H6:1.5153951221902182::0.9999999999999978):2.5839126264019288);
2	4.74134463406e6         (outgroup:9.873014355923681,(species_4:4.978557302744444,((species_3:1.5703960202935816,(species_2:1.4303350626712767)#H6:0.14006095762230486::0.5246318237325145):1.4274316184507447,(species_1:1.4303350626712767,#H6:0.0::0.4753681762674855):1.5674925760730496):1.9807296640001177):4.894457053179237);
3	4.74767656888e6         (outgroup:6.587547566712631,(species_4:3.16335468187287,(#H6:1.6950791623269237::0.2300024805998422,((species_2:2.1704607574918803e-15,(species_1:2.1704607574918803e-15)#H6:0.0::0.7699975194001578):1.2775233699426514,species_3:1.2775233699426536):0.4175557923842723):1.4682755195459443):3.424192884839761);
4	4.79934812694e6         (outgroup:4.588071773551179,(#H8:1.1223574091843962::0.4545680591271708,((species_3:1.1239958623063409,(species_4:1.1239958623063409)#H8:0.0::0.5454319408728292):0.0,(species_1:0.6784288545012462,species_2:0.6784288545012462):0.4455670078050946):1.1223574091843962):2.341718502060442);
5	4.74134463406e6         (outgroup:9.873014056142255,(((species_3:1.5703953920059508,(species_2:0.0)#H6:1.5703953920059508::0.5246316139942712):1.4274321357773632,(species_1:1.4303356774878893,#H6:1.4303356774878893::0.47536838600572884):1.5674918502954247):1.9807296102992824,species_4:4.978557138082596):4.894456918059658);
6	4.74134463406e6         (outgroup:9.873013935808101,(((species_1:1.430336353730211,(species_2:0.0)#H6:1.430336353730211::0.4753686041425541):1.56749113182852,(#H6:1.5703947913279592::0.5246313958574459,species_3:1.5703947913279592):1.4274326942307718):1.9807295865914147,species_4:4.978557072150146):4.894456863657956);
7	4.80893925052e6         (outgroup:4.071258357173914,((species_2:0.6378694203368405,(species_1:0.6378694203368405)#H8:0.0::1.0):0.697848040883944,(#H8:0.697848040883944::0.0,(species_3:1.3357174612207845,species_4:1.3357174612207845):0.0):0.0):2.7355408959531293);
8	4.74846998629e6         (outgroup:6.910915745453826,((species_4:3.3556875636861974,#H8:1.9927504539207754::0.8602778203425111):3.552713678800501e-15,((species_3:1.362937109765422)#H8:0.0::0.1397221796574889,(species_1:1.3629371097654168,species_2:1.3629371097654168):5.10702591327572e-15):1.992750453920779):3.555228181767625);
9	4.74134463406e6         (outgroup:9.87301380985743,(species_4:4.978556995305276,((species_1:1.4303351997406968,(species_2:1.4303351997406968)#H6:0.0::0.4753682702546477):1.5674922440975587,(species_3:1.570395656690376,#H6:0.14006045694967928::0.5246317297453523):1.4274317871478794):1.9807295514670202):4.894456814552154);
10	4.74134463406e6         (outgroup:9.873013935808101,(((species_1:1.430336353730211,(species_2:0.0)#H6:1.430336353730211::0.4753686041425541):1.56749113182852,(species_3:1.5703947913279592,#H6:1.5703947913279592::0.5246313958574459):1.4274326942307718):1.9807295865914147,species_4:4.978557072150146):4.894456863657956);

-----end of analysis
PhyloNetworks.HybridNetwork, Rooted Network
11 edges
11 nodes: 5 tips, 1 hybrid nodes, 5 internal tree nodes.
tip labels: outgroup, species_4, species_1, species_2, ...
(outgroup:9.873,(species_4:4.979,((species_1:1.43,(species_2:1.43)#H6:0.0::0.475):1.567,(species_3:1.57,#H6:0.14::0.525):1.427):1.981):4.894);

Simulated annealing

When running phyne!(args), simulated annealing can be selected as a searching strategy by setting the optional argument do_hill_climbing=false, as shown in the command below. Just like in hill climbing, three mandatory arguments are required: the starting topology, data, and the outgroup taxon. We are going to name the final network from this analysis as net1.

net1=phyne!(random_starting_topology,phylip_data,"outgroup",do_hill_climbing=false)

As the analysis begins, some information on the setting of the analysis will be printed on screen.

PhyNEST: Phylogenetic Network Estimation using SiTe patterns
Analysis start: 2023-07-01 at 16:56:26
Input sequence file: sample_n5h1.phy
Number of sequences: 5 
Sequence length: 1000000
Starting Topology: (outgroup,species_1,species_2,species_3,species_4);
Outgroup specified for rooting: outgroup
Number of maximum reticulation(s): 1
The maximum number of iterations for each optimization: 1000
Search algorithm selected: Simulated annealing
The maximum number of steps during search: 250000
Output file store path: /Users/khaosan/Dropbox/PhyNEST.jl.wiki/example-data
The number of processors for this analysis: 1

In contrast to the hill climbing analysis, the simulated annealing conducts initial burn-in to compute the maximum change in the composite likelihood that can be made in a single step. See Kong et al. (2022) for detail. The number of steps to be made in the burn-in period can be modified using the optional argument number_of_burn_in, which is set to 25 by default.

(1/10) Searching for the best network using the simulated annealing algorithm, 2023-07-01 16:56:26.756
Running 25 burn ins...
Burn in complete. At most -log likelihood value can change 334460.5637747077 at a single step.

Once the initial burn-in is complete, the network search begins. At the end of each independent run, a list of the best networks visited during the search will be printed. The number of best networks ($k$) to be printed is set as 10 by default, and can be modified using the optional argument k. At the end of each independent run, a summary of network search is printed.

Showing 10 best topologies found in this run:
Rank   Composite Likelihood    Network
1	4.74143033362e6         (outgroup:12.581048551217803,(#H6:4.897582014826698::0.7318362486587091,((species_3:3.039441659676444,(species_2:1.784703262508248,(species_1:1.784703262508248)#H6:0.0::0.2681637513412909):1.2547383971681962):3.3885162791746737,species_4:6.427957938851118):0.2543273384838285):5.898763273882857);
2	4.74143834385e6         (outgroup:12.562263849641377,((#H6:4.701532047553972::0.7159853236380884,species_4:6.4548749094151):0.0,(species_3:3.0369634510044548,(species_2:1.7533428618611278,(species_1:1.7533428618611278)#H6:0.0::0.2840146763619116):1.283620589143327):3.417911458410645):6.1073889402262775);
3	4.74783630554e6         (outgroup:6.93418009627459,((species_4:3.3535213465512714,(species_3:1.011515677716189)#H6:2.3420056688350823::0.16264030010942526):1.163069640597314e-12,(species_1:1.4032039735968698,(#H6:0.0::0.8373596998905748,species_2:1.011515677716189):0.39168829588068066):1.9503173729555647):3.580658749722155);
4	4.7483358255e6         (outgroup:6.674479853666094,(((species_3:1.5587295004045836,(#H6:0.0::0.8311735926847826,species_1:2.884290042173632e-14):1.5587295004045547):0.0,(species_2:2.884290042173632e-14)#H6:1.5587295004045547::0.16882640731521736):1.6670266315039588,species_4:3.2257561319085424):3.4487237217575517);
5	4.74834233974e6         (outgroup:6.382195807067809,((species_1:1.462374223568136,(species_2:1.199211130895848,(species_3:5.0289788337618055e-18,(species_4:5.0289788337618055e-18)#H6:0.0::0.03143170824941877):1.199211130895848):0.2631630926722881):1.6757597461900913,#H6:3.1381339697582273::0.9685682917505812):3.244061837309582);
6	4.74834233974e6         (outgroup:6.382195816283125,((species_1:1.4623742273143248,(species_2:1.1992111336023625,(#H6:0.0::0.9685682916757022,species_3:1.1841342157270892e-15):1.1992111336023614):0.2631630937119622):1.6757597476574464,(species_4:1.1841342157270892e-15)#H6:3.13813397497177::0.03143170832429776):3.2440618413113538);
7	4.74839208038e6         (outgroup:6.353513765761848,((species_3:1.4517680360040277,(species_2:1.1927430854265588,(#H6:0.0::0.9710549020455946,species_1:9.10487492168278e-17):1.1927430854265588):0.2590249505774689):1.664473063643019,(species_4:9.10487492168278e-17)#H6:3.1162410996470467::0.028945097954405363):3.2372726661148015);
8	4.74841907266e6         (outgroup:6.571509285591447,(((species_2:1.2595083606777986,(species_1:1.2595083606777986,#H6:0.0::0.23101588763540726):0.0):1.1538995889503987,(species_3:1.2595083606777986)#H6:1.1538995889503987::0.7689841123645927):0.7568762502156829,species_4:3.17028419984388):3.401225085747567);
9	4.74844505806e6         (outgroup:6.084643849201943,(((#H6:0.0::1.0,(species_3:1.1164900142307739,species_2:1.1164900142307739):0.19372362476703442):0.05006518983479147,species_1:1.3602788288325998):1.5429700541258395,(species_4:1.3102136389978083)#H6:1.593035243960631::0.0):3.181394966243504);
10	4.74844505806e6         (outgroup:6.084643822724603,((species_4:1.1487587724660282)#H6:1.7544900943312942::1.0,((species_3:1.1164900054260694,species_2:1.1164900054260694):0.2437888139878226,(#H6:0.0::0.0,species_1:1.1487587724660282):0.21152004694786375):1.5429700473834305):3.1813949559272805);
Speciation times for some newicks may not have updated if estimates are weird (e.g., NaN).
The search terminated at step 1000 and at 11th consecutive failures and (Cooling schedule)ci=2000.0279837711678.
Summary of each move:
Insertion of reticulation edge: 3 proposed, 0 accepted.
Tail move of reticulation edge: 252 proposed, 30 accepted.
Head move of reticulation edge: 258 proposed, 10 accepted.
Change the direction of reticulation edge: 250 proposed, 21 accepted.
Deletion of reticulation edge: 0 proposed, 0 accepted.
Nearest-neighbor interchange (NNI): 237 proposed, 178 accepted.
On the current topology, 11 moves were made, including 0 unsuccessful moves.
Terminated because it reached the maximum number of steps (current maximum_number_of_steps=1000).
It is recommended to increase the number of maximum_number_of_steps and rerun the analysis.
(1/10) Estimated topology in this run: (outgroup:12.581048551217803,((species_1:1.784703262508248)#H6:4.897582014826698::0.7318362486587091,((species_3:3.039441659676444,(species_2:1.784703262508248,#H6:0.0::0.2681637513412909):1.2547383971681962):3.3885162791746737,species_4:6.427957938851118):0.2543273384838285):5.898763273882857);
(1/10) Composite likelihood of the estimated topology in this run: 4.741430333620114e6

Once all independent runs are complete, phyne!(args) summarizes the network estimated at each independent run followed by the best network estimated throughout all runs. The final network is also stored in .out file that is created upon the completion of the analysis.

-----Summary of the networks found
Run   Composite Likelihood    Network
1	4.74143033362e6         (outgroup:12.581048551217803,((species_1:1.784703262508248)#H6:4.897582014826698::0.7318362486587091,((species_3:3.039441659676444,(species_2:1.784703262508248,#H6:0.0::0.2681637513412909):1.2547383971681962):3.3885162791746737,species_4:6.427957938851118):0.2543273384838285):5.898763273882857);
2	4.74134463406e6         (outgroup:9.873014273428936,(species_4:4.978557251913669,((species_1:1.4303363086011327,#H8:0.0::0.4753685600009919):1.5674912971343646,((species_2:1.4303363086011327)#H8:0.14005866006744672::0.5246314399990081,species_3:1.5703949686685794):1.427432637066918):1.9807296461781716):4.894457021515267);
3	4.74134463406e6         (outgroup:9.873014355923681,(species_4:4.978557302744444,((species_3:1.5703960202935816,(species_2:1.4303350626712767)#H6:0.14006095762230486::0.5246318237325145):1.4274316184507447,(#H6:0.0::0.4753681762674855,species_1:1.4303350626712767):1.5674925760730496):1.9807296640001177):4.894457053179237);
4	4.74134463406e6         (outgroup:9.873012622501722,(species_4:4.978556373146908,((species_3:1.5703946030438305,(species_2:1.4303358438581433)#H6:0.14005875918568722::0.5246314504223545):1.427432433976793,(#H6:0.0::0.4753685495776455,species_1:1.4303358438581433):1.5674911931624802):1.9807293361262848):4.894456249354814);
5	4.74134463406e6         (outgroup:9.87301380985743,(species_4:4.978556995305276,((species_1:1.4303351997406968,#H6:0.0::0.4753682702546477):1.5674922440975587,(species_3:1.570395656690376,(species_2:1.4303351997406968)#H6:0.14006045694967928::0.5246317297453523):1.4274317871478794):1.9807295514670202):4.894456814552154);
6	4.74847887999e6         (outgroup:6.080894029460262,(species_4:2.9011875495072474,((species_1:1.1168388994487088,species_2:1.1168388994487088):0.24145011464781096,species_3:1.3582890140965198):1.5428985354107276):3.179706479953014);
7	4.74136203165e6         (outgroup:10.279852508216186,(species_4:5.202396348480912,(#H8:3.2552607457770186::0.37946741741044854,(species_1:2.3633048477476857,(species_2:0.37557495918111433,(species_3:0.37557495918111433)#H8:0.0::0.6205325825895515):1.9877298885665713):1.2675308572104473):1.571560643522779):5.077456159735274);
8	4.74145585761e6         (outgroup:12.552567752938712,((species_4:6.4494518699325365,#H8:4.696467979191502::0.2831880000567513):2.3243629243552277e-12,((species_2:1.752983890741034,(species_3:1.752983890741034)#H8:0.0::0.7168119999432487):1.2841855789481549,species_1:3.037169469689189):3.412282400245672):6.103115883003851);
9	4.74134463406e6         (outgroup:9.87301380985743,(species_4:4.978556995305276,((species_1:1.4303351997406968,#H6:0.0::0.4753682702546477):1.5674922440975587,((species_2:1.4303351997406968)#H6:0.14006045694967928::0.5246317297453523,species_3:1.570395656690376):1.4274317871478794):1.9807295514670202):4.894456814552154);
10	4.74134463406e6         (outgroup:9.87301380985743,(species_4:4.978556995305276,((species_1:1.4303351997406968,#H6:0.0::0.4753682702546477):1.5674922440975587,((species_2:1.4303351997406968)#H6:0.14006045694967928::0.5246317297453523,species_3:1.570395656690376):1.4274317871478794):1.9807295514670202):4.894456814552154);

-----end of analysis
Best topology: (outgroup:9.873012622501722,(species_4:4.978556373146908,((species_3:1.5703946030438305,(species_2:1.4303358438581433)#H6:0.14005875918568722::0.5246314504223545):1.427432433976793,(#H6:0.0::0.4753685495776455,species_1:1.4303358438581433):1.5674911931624802):1.9807293361262848):4.894456249354814);)

Parallelization

Since simulated annealing traverses the network space much more exhaustively than hill climbing, it would require longer time to complete the analysis. We recommend, if possible, to parallelize each independent run when using the simulated annealing strategy (or hill climbing, too). This can be done very simply by using the julia package Distributed.

First, in order to obtain the number of cores available in julia in prior to the parallelization, the following can be done in julia, for example.

julia> length(Sys.cpu_info())
8

In the below example, we load the package Distributed (may need to install this package), launch 8 worker processors (this value would vary depending on machine), load PhyNEST on each processor, and conduct the network analysis. Note the .log file will be much simplified when the analysis is parallelized.

julia> using Distributed
julia> addprocs(8)
julia> @everywhere using PhyNEST
julia> net1=phyne!(random_starting_topology,phylip_data,"outgroup",do_hill_climbing=false)