observed site patterns - sungsik-kong/PhyNEST.jl GitHub Wiki
The observed quartet site pattern frequencies can be displayed as a table using the function `show_sp(arg)`. Its only mandatory argument is the object created by `readPhylip(args)`. Using the name assigned to the data object on the previous page, `phylip_data`, `show_sp(arg)` can be executed with the command:
julia> df=show_sp(phylip_data)
The output is an n × m `DataFrame` object, where n = (k choose 4) × 24 and m = 19. Here k is the number of sequences, and 24 = 4! is the number of orderings of the four taxa in each quartet. The m columns consist of four columns naming the taxa in a quartet (i, j, k, l) plus the 15 possible quartet site patterns. The output of `df=show_sp(phylip_data)` shown below contains all quartet site pattern frequencies parsed from sample_n5h1.phy:
julia> df=show_sp(phylip_data)
120×19 DataFrame
Row │ i j k l AAAA AAAB AABA AABB AABC ABAA ABAB ABAC ABBA BAAA ABBC CABC BACA BCAA ABCD
│ Any Any Any Any Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ outgroup species_4 species_3 species_1 731034 29756 29697 18168 1757 45126 2573 1399 2611 122101 3466 1405 3447 7203 257
2 │ outgroup species_4 species_3 species_2 736589 24201 24366 23742 1514 45895 2022 1181 2064 123279 2835 1144 2828 8129 211
3 │ outgroup species_4 species_1 species_2 736477 24254 24478 23703 1500 45938 2069 1135 2069 123249 2803 1096 2811 8209 209
4 │ outgroup species_3 species_1 species_2 757451 18709 25033 8005 690 24964 8057 692 1991 143031 2450 619 4062 4125 121
5 │ species_4 species_3 species_1 species_2 832577 20558 27291 7836 706 27149 7852 716 1427 67905 1165 640 2009 2109 60
6 │ outgroup species_4 species_1 species_3 731034 29697 29756 18168 1757 45126 2611 1405 2573 122101 3447 1399 3466 7203 257
7 │ outgroup species_3 species_4 species_1 731034 29756 45126 2573 1399 29697 18168 1757 2611 122101 3466 1405 7203 3447 257
8 │ outgroup species_3 species_1 species_4 731034 45126 29756 2573 1399 29697 2611 1405 18168 122101 7203 1757 3466 3447 257
9 │ outgroup species_1 species_4 species_3 731034 29697 45126 2611 1405 29756 18168 1757 2573 122101 3447 1399 7203 3466 257
10 │ outgroup species_1 species_3 species_4 731034 45126 29697 2611 1405 29756 2573 1399 18168 122101 7203 1757 3447 3466 257
11 │ species_4 outgroup species_3 species_1 731034 29756 29697 18168 1757 122101 2611 3466 2573 45126 1399 3447 1405 7203 257
12 │ species_4 outgroup species_1 species_3 731034 29697 29756 18168 1757 122101 2573 3447 2611 45126 1405 3466 1399 7203 257
13 │ species_4 species_3 outgroup species_1 731034 29756 122101 2611 3466 29697 18168 1757 2573 45126 1399 3447 7203 1405 257
14 │ species_4 species_3 species_1 outgroup 731034 122101 29756 2611 3466 29697 2573 3447 18168 45126 7203 1757 1399 1405 257
15 │ species_4 species_1 outgroup species_3 731034 29697 122101 2573 3447 29756 18168 1757 2611 45126 1405 3466 7203 1399 257
16 │ species_4 species_1 species_3 outgroup 731034 122101 29697 2573 3447 29756 2611 3466 18168 45126 7203 1757 1405 1399 257
17 │ species_3 outgroup species_4 species_1 731034 29756 45126 2573 1399 122101 2611 3466 18168 29697 1757 7203 1405 3447 257
18 │ species_3 outgroup species_1 species_4 731034 45126 29756 2573 1399 122101 18168 7203 2611 29697 1405 3466 1757 3447 257
19 │ species_3 species_4 outgroup species_1 731034 29756 122101 2611 3466 45126 2573 1399 18168 29697 1757 7203 3447 1405 257
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
103 │ species_3 species_4 species_1 species_2 832577 20558 27291 7836 706 67905 1427 1165 7852 27149 716 2009 640 2109 60
104 │ species_3 species_4 species_2 species_1 832577 27291 20558 7836 706 67905 7852 2009 1427 27149 640 1165 716 2109 60
105 │ species_3 species_1 species_4 species_2 832577 20558 67905 1427 1165 27291 7836 706 7852 27149 716 2009 2109 640 60
106 │ species_3 species_1 species_2 species_4 832577 67905 20558 1427 1165 27291 7852 2009 7836 27149 2109 706 716 640 60
107 │ species_3 species_2 species_4 species_1 832577 27291 67905 7852 2009 20558 7836 706 1427 27149 640 1165 2109 716 60
108 │ species_3 species_2 species_1 species_4 832577 67905 27291 7852 2009 20558 1427 1165 7836 27149 2109 706 640 716 60
109 │ species_1 species_4 species_3 species_2 832577 20558 27149 7852 716 67905 1427 1165 7836 27291 706 2109 640 2009 60
110 │ species_1 species_4 species_2 species_3 832577 27149 20558 7852 716 67905 7836 2109 1427 27291 640 1165 706 2009 60
111 │ species_1 species_3 species_4 species_2 832577 20558 67905 1427 1165 27149 7852 716 7836 27291 706 2109 2009 640 60
112 │ species_1 species_3 species_2 species_4 832577 67905 20558 1427 1165 27149 7836 2109 7852 27291 2009 716 706 640 60
113 │ species_1 species_2 species_4 species_3 832577 27149 67905 7836 2109 20558 7852 716 1427 27291 640 1165 2009 706 60
114 │ species_1 species_2 species_3 species_4 832577 67905 27149 7836 2109 20558 1427 1165 7852 27291 2009 716 640 706 60
115 │ species_2 species_4 species_3 species_1 832577 27291 27149 1427 640 67905 7852 2009 7836 20558 706 2109 716 1165 60
116 │ species_2 species_4 species_1 species_3 832577 27149 27291 1427 640 67905 7836 2109 7852 20558 716 2009 706 1165 60
117 │ species_2 species_3 species_4 species_1 832577 27291 67905 7852 2009 27149 1427 640 7836 20558 706 2109 1165 716 60
118 │ species_2 species_3 species_1 species_4 832577 67905 27291 7852 2009 27149 7836 2109 1427 20558 1165 640 706 716 60
119 │ species_2 species_1 species_4 species_3 832577 27149 67905 7836 2109 27291 1427 640 7852 20558 716 2009 1165 706 60
120 │ species_2 species_1 species_3 species_4 832577 67905 27149 7836 2109 27291 7852 2009 1427 20558 1165 640 716 706 60
83 rows omitted
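The 120×19 size reported above can be checked against the formula n = (k choose 4) × 24, m = 19. The sketch below is only a sanity check; `expected_rows` is a hypothetical helper and not part of PhyNEST, and k = 5 is read off the five taxon names that appear in the table.

```julia
# Expected number of rows in the show_sp table for k sequences.
# expected_rows is a hypothetical helper, not a PhyNEST function.
expected_rows(k::Integer) = binomial(k, 4) * factorial(4)  # quartets × orderings of 4 taxa

k = 5                  # outgroup plus species_1 to species_4
n = expected_rows(k)   # 5 quartets × 24 orderings
m = 4 + 15             # four taxon columns + 15 site pattern columns

println("expected size: ", n, "×", m)
```

If the table was obtained with `show_sp`, the same numbers should match `size(df)`.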
Typically, n is large, so `show_sp(arg)` prints a truncated table. The truncated table is often sufficient for quickly checking whether `readPhylip(args)` has parsed the alignment properly. However, the full table may be useful if a user plans to conduct other analyses with the observed quartet site pattern frequencies, such as Patterson's D-statistic.
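As an illustration of such a downstream analysis, the sketch below computes Patterson's D from a single row of the table. It rests on several assumptions that are not stated on this page: the common definition D = (ABBA − BABA) / (ABBA + BABA), the outgroup placed in the fourth position (column l), and the ABAB column standing in for BABA, since both labels describe sites where the first and third taxa share one state and the second and fourth share another. The function `patterson_d` is hypothetical and not part of PhyNEST; the two counts are copied from row 14 of the output above.

```julia
# Patterson's D from two site pattern counts.
# patterson_d is a hypothetical helper, not a PhyNEST function.
# Assumes taxon order (P1, P2, P3, outgroup), so that
#   ABBA counts sites where P2 and P3 share a state, and
#   ABAB counts sites where P1 and P3 share a state (the "BABA" pattern).
function patterson_d(abba::Integer, abab::Integer)
    total = abba + abab
    total == 0 && return NaN   # undefined when neither pattern is observed
    return (abba - abab) / total
end

# Row 14 of the table: species_4, species_3, species_1, outgroup
abba = 18168   # value in column ABBA
abab = 2573    # value in column ABAB
d = patterson_d(abba, abab)

println("D = ", round(d, digits=4))
```

A point estimate like this says nothing about significance; assessing it would require a resampling scheme such as a block jackknife, which is outside the scope of this page.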
Each site pattern column is headed by the pattern it counts (e.g., AAAA, AAAB, AABA, ...). At a given site in the alignment, each of the four taxa can carry any one of the four nucleotide states. This results in 4^4 = 256 possible quartet site patterns, which collapse into 15 patterns under the Jukes-Cantor model, as shown in Chifman and Kubatko (2015).
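The reduction from 256 to 15 can be reproduced by relabeling each pattern by the order in which its nucleotides first appear, so that only the grouping of taxa matters. The sketch below is an illustration of that idea, not PhyNEST's own code; `canonical_pattern` is a hypothetical helper. Note that the labels it produces name the same 15 classes as the table, but a few column headings in the table use an equivalent spelling (for example, BAAA and ABBB describe the same class).

```julia
# Relabel a quartet pattern by order of first appearance, e.g. "CCTC" -> "AABA".
# canonical_pattern is a hypothetical helper, not a PhyNEST function.
function canonical_pattern(site::AbstractString)
    labels = Dict{Char,Char}()
    out = Char[]
    for c in site
        # The first time a nucleotide is seen, give it the next unused letter
        label = get!(labels, c, 'A' + length(labels))
        push!(out, label)
    end
    return join(out)
end

nucleotides = ['A', 'C', 'G', 'T']

# All 4^4 raw patterns for a quartet
all_sites = vec([string(a, b, c, d) for a in nucleotides, b in nucleotides,
                                        c in nucleotides, d in nucleotides])

# Distinct classes after relabeling
classes = sort(unique(canonical_pattern.(all_sites)))

println(length(all_sites), " raw patterns collapse into ", length(classes), " classes")
println(classes)
```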
For example, the integer under the column AAAA is the number of sites where all four taxa in the quartet have the same nucleotide state (i.e., AAAA, CCCC, GGGG, or TTTT). Similarly, the integer under the column AAAB is the number of sites where the first three taxa in the quartet share a nucleotide that differs from the one observed in the fourth taxon (e.g., AAAC, AAAT, AAAG, CCCA, CCCT, CCCG, ...). Because every site falls into exactly one of the 15 patterns, the 15 integers in a row sum to the length of the alignment.
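To make the counting concrete, the sketch below tallies site patterns for one quartet in a made-up eight-site alignment and confirms that the counts add up to the alignment length. The sequences and the helper `site_pattern` are hypothetical and only illustrate the bookkeeping; they are not taken from sample_n5h1.phy or from PhyNEST.

```julia
# Toy alignment for one quartet (made-up sequences)
quartet = ["ACGTAACG",   # taxon i
           "ACGTACCG",   # taxon j
           "ACGACACT",   # taxon k
           "ACTACATG"]   # taxon l

# Relabel a site column by order of first appearance, e.g. ['G','G','G','T'] -> "AAAB".
# site_pattern is a hypothetical helper, not a PhyNEST function.
site_pattern(col) = join('A' + (findfirst(==(c), unique(col)) - 1) for c in col)

# Tally the pattern observed at every site
counts = Dict{String,Int}()
for s in 1:length(quartet[1])
    col = [seq[s] for seq in quartet]
    p = site_pattern(col)
    counts[p] = get(counts, p, 0) + 1
end

for p in sort(collect(keys(counts)))
    println(p, " => ", counts[p])
end

# Every site belongs to exactly one pattern, so the counts sum to the alignment length
println("total = ", sum(values(counts)), ", alignment length = ", length(quartet[1]))
```

In this toy example the first two sites are constant across taxa and are both counted under AAAA, while sites 3 and 7 (GGGT and CCCT) are both counted under AAAB.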
The full table can be stored locally if the optional argument `writecsv=true` was specified when calling `readPhylip(args)`. The full table can also be printed on screen using `show(df, allrows=true)` after loading the Julia package CSV.
Next: Checkpointing