hyde - sungsik-kong/PhyNEST.jl GitHub Wiki
HyDe
HyDe is a method originally proposed by Blischak et al. (2018) and implemented in a Python module called phyde (Pythonic Hybrid Detection). HyDe performs hypothesis tests on quartets of taxa (including an outgroup) using phylogenetic invariants. See the original documentation for more information.
The HyDe implementation in PhyNEST can be executed using the function HyDe. More specifically, run_hyde.py from the original module is replicated in the function HyDe. The mandatory input arguments are a Phylip object, which contains the site pattern frequencies of the alignment parsed by the function readPhylip, and the name of the outgroup taxon. By default, HyDe shows only the significant tests (display_all=false). Setting display_all=true makes HyDe display the results for every combination of four taxa (the outgroup plus three other taxa) in the alignment. See the example below.
julia> p=readPhylip("sample_n5h1.txt")
Progress:
0+---------------+100%
***************complete
Summary of Phylip File
Parsing the file [sample_n5h1.txt] took 23.399 seconds.
Number of taxa: 5
Species names: ["5", "4", "3", "1", "2"]
Alignment length (b.p): 1000000
Site patterns frequencies for 120 quartets computed and stored.
Try `show_sp()` function to see all quartet site patterns.
julia> df=HyDe(p,"5")
Tip: if neccessary, use function showallDF(df) to see all the rows.
2×11 DataFrame
Row │ outgroup P1 Hybrid P2 AABB ABAB ABBA Gamma Zscore Pvalue significance
│ String String String String Int64 Int64 Int64 Float64 Float64 Float64 String
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 5 3 2 1 8005 1991 8057 0.502152 47.6571 0.0 *
2 │ 5 1 2 3 8057 1991 8005 0.497848 47.6571 0.0 *
A data frame with 11 columns is displayed at the end of the analysis. The first four columns are the four taxa included in the test, in the order outgroup, parental taxon 1 (P1), putative hybrid, and parental taxon 2 (P2). They are followed by three columns that give the site pattern frequencies AABB, ABAB, and ABBA for the four taxa. The next three columns give the test results: the estimate of Gamma, the Z-score, and the P-value. Given the $\alpha$ level, which is 0.05 by default (optional argument pval=0.05), a significant test is marked with * in the last column.
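To make the relationship between the site pattern columns and the Gamma column concrete, here is a minimal sketch. The formula is an assumption inferred from the rows displayed on this page, not taken from the PhyNEST source, and the function name gamma_hat is hypothetical. It reproduces the displayed Gamma values, including the NaN that appears when AABB equals ABAB.

```julia
# Hypothetical helper, NOT part of PhyNEST: an assumed relationship between
# the AABB, ABAB and ABBA counts and the Gamma column, inferred from the
# displayed output only.
function gamma_hat(aabb::Integer, abab::Integer, abba::Integer)
    # Ratio of the two differences of site pattern counts.
    # Julia's `/` on integers returns a Float64, so 0 in the denominator
    # gives Inf (or NaN) instead of throwing an error.
    r = (abba - abab) / (aabb - abab)
    return r / (1 + r)
end

g1 = gamma_hat(8005, 1991, 8057)     # ≈ 0.502152 (significant row: hybrid "2")
g2 = gamma_hat(18168, 2573, 2611)    # ≈ 0.00243076
g3 = gamma_hat(23703, 2069, 2069)    # 0.0
g4 = gamma_hat(2069, 2069, 23703)    # NaN, because AABB - ABAB == 0
```

Under this assumed formula, swapping P1 and P2 swaps AABB and ABBA, which turns Gamma into 1 - Gamma. This agrees with the two significant rows above (0.502152 and 0.497848).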
julia> df=HyDe(p,"5",display_all=true)
Tip: if neccessary, use function showallDF(df) to see all the rows.
24×11 DataFrame
Row │ outgroup P1 Hybrid P2 AABB ABAB ABBA Gamma Zscore Pvalue significance
│ String String String String Int64 Int64 Int64 Float64 Float64 Float64 String
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 5 4 3 1 18168 2573 2611 0.00243076 0.528406 0.298609
2 │ 5 4 3 2 23742 2022 2064 0.00192997 0.657666 0.255376
3 │ 5 4 1 2 23703 2069 2069 0.0 -99999.9 1.0
4 │ 5 3 1 2 8005 8057 1991 0.9915 -0.412067 0.659855
5 │ 5 4 1 3 18168 2611 2573 -0.00244861 -99999.9 1.0
6 │ 5 3 4 1 2573 18168 2611 0.49939 -216.327 1.0
7 │ 5 3 1 4 2573 2611 18168 1.00245 -0.527118 0.700944
8 │ 5 1 4 3 2611 18168 2573 0.50061 -216.327 1.0
9 │ 5 1 3 4 2611 2573 18168 0.997569 0.528406 0.298609
10 │ 5 4 2 3 23742 2064 2022 -0.00194121 -99999.9 1.0
11 │ 5 3 4 2 2022 23742 2064 0.499516 -339.45 1.0
12 │ 5 3 2 4 2022 2064 23742 1.00194 -0.656395 0.744215
13 │ 5 2 4 3 2064 23742 2022 0.500484 -339.45 1.0
14 │ 5 2 3 4 2064 2022 23742 0.99807 0.657666 0.255376
15 │ 5 4 2 1 23703 2069 2069 0.0 -99999.9 1.0
16 │ 5 1 4 2 2069 23703 2069 0.5 -336.307 1.0
17 │ 5 1 2 4 2069 2069 23703 NaN 0.0 0.5
18 │ 5 2 4 1 2069 23703 2069 0.5 -336.307 1.0
19 │ 5 2 1 4 2069 2069 23703 NaN 0.0 0.5
20 │ 5 3 2 1 8005 1991 8057 0.502152 47.6571 0.0 *
21 │ 5 1 3 2 8057 8005 1991 1.00872 -99999.9 1.0
22 │ 5 1 2 3 8057 1991 8005 0.497848 47.6571 0.0 *
23 │ 5 2 3 1 1991 8005 8057 -0.00872191 -0.408534 0.658559
24 │ 5 2 1 3 1991 8057 8005 0.00849951 -0.412067 0.659855
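The 24 rows above are consistent with testing every ordered (P1, Hybrid, P2) triple of distinct ingroup taxa: with four ingroup taxa there are 4 × 3 × 2 = 24 such triples. The sketch below only illustrates that count. The function name ordered_triples is hypothetical, and the enumeration order is not claimed to match the row order that HyDe prints.

```julia
# Hypothetical helper, NOT part of PhyNEST: list every ordered
# (P1, Hybrid, P2) triple of distinct taxa, excluding the outgroup.
function ordered_triples(taxa::Vector{String}, outgroup::String)
    ingroup = filter(!=(outgroup), taxa)
    return [(p1, hyb, p2)
            for p1 in ingroup for hyb in ingroup for p2 in ingroup
            if p1 != hyb && p1 != p2 && hyb != p2]
end

# Five taxa with "5" as the outgroup, as in sample_n5h1.txt.
triples = ordered_triples(["5", "4", "3", "1", "2"], "5")
println(length(triples))   # prints 24
```

The same count explains why each putative hybrid appears twice per pair of parents: once with the parents in each order.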
HyDe can also conduct hybrid detection analysis with multiple individuals per population/species. In this case, a map file is required. A taxon map file is a simple text file with one individual per row, in which a tab separates the individual's name as it appears in the alignment from the name of the population/species it belongs to. Unlike the original Python implementation, our implementation does not require the individuals in the map file to be in the same order as in the DNA sequence data file, nor does it require all individuals of a particular taxon group to be listed together sequentially. An example of a map file is shown below.
shell> cat map.txt
5 sp5out
4 sp5out
3 sp3
1 sp1
2 sp2
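As an illustration of the map file format described above, the following sketch parses tab-separated lines into a lookup table and groups individuals by taxon, so the row order in the file does not matter. Both function names are hypothetical and this is not how PhyNEST reads the file internally. The sample text mirrors map.txt, assuming a single tab between the two fields.

```julia
# Hypothetical helpers, NOT part of PhyNEST.

# Parse "individual<TAB>taxon" lines into a Dict(individual => taxon).
function read_taxon_map(io::IO)
    taxon_of = Dict{String,String}()
    for line in eachline(io)
        isempty(strip(line)) && continue          # skip blank lines
        fields = split(strip(line), '\t')
        length(fields) == 2 ||
            error("expected 2 tab-separated fields, got: $line")
        individual, taxon = String.(fields)
        taxon_of[individual] = taxon
    end
    return taxon_of
end

# Group individuals by taxon; the order of rows in the file is irrelevant.
function individuals_by_taxon(taxon_of::Dict{String,String})
    groups = Dict{String,Vector{String}}()
    for (individual, taxon) in taxon_of
        push!(get!(groups, taxon, String[]), individual)
    end
    foreach(sort!, values(groups))                # deterministic ordering
    return groups
end

# Same content as map.txt, written inline so the sketch is self-contained.
map_text = "5\tsp5out\n4\tsp5out\n3\tsp3\n1\tsp1\n2\tsp2\n"
taxon_of = read_taxon_map(IOBuffer(map_text))
groups = individuals_by_taxon(taxon_of)
println(groups["sp5out"])   # prints ["4", "5"]
```

To read an actual file instead of the inline string, the same function can be called as open(read_taxon_map, "map.txt").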
To use the map file, specify it with the optional argument map. When multiple individuals are assigned to the outgroup population/species, specify any one of the outgroup individuals. An example is shown below.
julia> df=HyDe(p,"5",map="map.txt")
Map file [map.txt] provided.
Tip: if neccessary, use function showallDF(df) to see all the rows.
2×11 DataFrame
Row │ outgroup P1 Hybrid P2 AABB ABAB ABBA Gamma Zscore Pvalue significance
│ String String String String Int64 Int64 Int64 Float64 Float64 Float64 String
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────
1 │ sp5out sp3 sp2 sp1 15841 3418 15909 0.501365 49.4337 0.0 *
2 │ sp5out sp1 sp2 sp3 15909 3418 15841 0.498635 49.4337 0.0 *
julia> df=HyDe(p,"5",map="map.txt", display_all=true)
Map file [map.txt] provided.
Tip: if neccessary, use function showallDF(df) to see all the rows.
6×11 DataFrame
Row │ outgroup P1 Hybrid P2 AABB ABAB ABBA Gamma Zscore Pvalue significance
│ String String String String Int64 Int64 Int64 Float64 Float64 Float64 String
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ sp5out sp3 sp1 sp2 15841 15909 3418 0.994586 -0.270586 0.606645
2 │ sp5out sp3 sp2 sp1 15841 3418 15909 0.501365 49.4337 0.0 *
3 │ sp5out sp1 sp3 sp2 15909 15841 3418 1.0055 -99999.9 1.0
4 │ sp5out sp1 sp2 sp3 15909 3418 15841 0.498635 49.4337 0.0 *
5 │ sp5out sp2 sp3 sp1 3418 15841 15909 -0.00550384 -0.269113 0.606079
6 │ sp5out sp2 sp1 sp3 3418 15909 15841 0.00541444 -0.270586 0.606645
Setting the optional argument writecsv=true (by default, writecsv=false) stores the results locally in a .csv file. This file is named HyDe-out.csv by default; the name can be changed with the optional argument filename.
julia> df=HyDe(p,"5",map="map.txt", display_all=true, writecsv=true)
Map file [map.txt] provided.
The results are stored as HyDe-out.csv in the working directory.
Tip: if neccessary, use function showallDF(df) to see all the rows.
6×11 DataFrame
Row │ outgroup P1 Hybrid P2 AABB ABAB ABBA Gamma Zscore Pvalue significance
│ String String String String Int64 Int64 Int64 Float64 Float64 Float64 String
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ sp5out sp3 sp1 sp2 15841 15909 3418 0.994586 -0.270586 0.606645
2 │ sp5out sp3 sp2 sp1 15841 3418 15909 0.501365 49.4337 0.0 *
3 │ sp5out sp1 sp3 sp2 15909 15841 3418 1.0055 -99999.9 1.0
4 │ sp5out sp1 sp2 sp3 15909 3418 15841 0.498635 49.4337 0.0 *
5 │ sp5out sp2 sp3 sp1 3418 15841 15909 -0.00550384 -0.269113 0.606079
6 │ sp5out sp2 sp1 sp3 3418 15909 15841 0.00541444 -0.270586 0.606645
shell> ls
HyDe-out.csv map.txt sample_n5h1.txt