observed site patterns - sungsik-kong/PhyNEST.jl GitHub Wiki

Observed quartet site pattern frequencies

The observed quartet site pattern frequencies can be displayed as a table using the function show_sp(arg). Its only mandatory argument is the object created by readPhylip(args). Using phylip_data, the name assigned to that object on the previous page, show_sp(arg) is called as follows:

julia> df=show_sp(phylip_data)

The output is an n × m DataFrame, where n = (k choose 4) × 24, m = 19, and k is the number of sequences in the alignment. The factor of 24 arises because each quartet appears once for every one of the 4! = 24 orderings of its four taxa. The 19 columns comprise four columns (i, j, k, l) naming the taxa in the quartet, followed by the 15 possible quartet site patterns. The output of df=show_sp(phylip_data) below contains all quartet site pattern frequencies parsed from sample_n5h1.phy:

julia> df=show_sp(phylip_data)
120×19 DataFrame
 Row │ i          j          k          l          AAAA    AAAB    AABA    AABB   AABC   ABAA    ABAB   ABAC   ABBA   BAAA    ABBC   CABC   BACA   BCAA   ABCD
     │ Any        Any        Any        Any        Int64   Int64   Int64   Int64  Int64  Int64   Int64  Int64  Int64  Int64   Int64  Int64  Int64  Int64  Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ outgroup   species_4  species_3  species_1  731034   29756   29697  18168   1757   45126   2573   1399   2611  122101   3466   1405   3447   7203    257
   2 │ outgroup   species_4  species_3  species_2  736589   24201   24366  23742   1514   45895   2022   1181   2064  123279   2835   1144   2828   8129    211
   3 │ outgroup   species_4  species_1  species_2  736477   24254   24478  23703   1500   45938   2069   1135   2069  123249   2803   1096   2811   8209    209
   4 │ outgroup   species_3  species_1  species_2  757451   18709   25033   8005    690   24964   8057    692   1991  143031   2450    619   4062   4125    121
   5 │ species_4  species_3  species_1  species_2  832577   20558   27291   7836    706   27149   7852    716   1427   67905   1165    640   2009   2109     60
   6 │ outgroup   species_4  species_1  species_3  731034   29697   29756  18168   1757   45126   2611   1405   2573  122101   3447   1399   3466   7203    257
   7 │ outgroup   species_3  species_4  species_1  731034   29756   45126   2573   1399   29697  18168   1757   2611  122101   3466   1405   7203   3447    257
   8 │ outgroup   species_3  species_1  species_4  731034   45126   29756   2573   1399   29697   2611   1405  18168  122101   7203   1757   3466   3447    257
   9 │ outgroup   species_1  species_4  species_3  731034   29697   45126   2611   1405   29756  18168   1757   2573  122101   3447   1399   7203   3466    257
  10 │ outgroup   species_1  species_3  species_4  731034   45126   29697   2611   1405   29756   2573   1399  18168  122101   7203   1757   3447   3466    257
  11 │ species_4  outgroup   species_3  species_1  731034   29756   29697  18168   1757  122101   2611   3466   2573   45126   1399   3447   1405   7203    257
  12 │ species_4  outgroup   species_1  species_3  731034   29697   29756  18168   1757  122101   2573   3447   2611   45126   1405   3466   1399   7203    257
  13 │ species_4  species_3  outgroup   species_1  731034   29756  122101   2611   3466   29697  18168   1757   2573   45126   1399   3447   7203   1405    257
  14 │ species_4  species_3  species_1  outgroup   731034  122101   29756   2611   3466   29697   2573   3447  18168   45126   7203   1757   1399   1405    257
  15 │ species_4  species_1  outgroup   species_3  731034   29697  122101   2573   3447   29756  18168   1757   2611   45126   1405   3466   7203   1399    257
  16 │ species_4  species_1  species_3  outgroup   731034  122101   29697   2573   3447   29756   2611   3466  18168   45126   7203   1757   1405   1399    257
  17 │ species_3  outgroup   species_4  species_1  731034   29756   45126   2573   1399  122101   2611   3466  18168   29697   1757   7203   1405   3447    257
  18 │ species_3  outgroup   species_1  species_4  731034   45126   29756   2573   1399  122101  18168   7203   2611   29697   1405   3466   1757   3447    257
  19 │ species_3  species_4  outgroup   species_1  731034   29756  122101   2611   3466   45126   2573   1399  18168   29697   1757   7203   3447   1405    257
  ⋮  │     ⋮          ⋮          ⋮          ⋮        ⋮       ⋮       ⋮       ⋮      ⋮      ⋮       ⋮      ⋮      ⋮      ⋮       ⋮      ⋮      ⋮      ⋮      ⋮
 103 │ species_3  species_4  species_1  species_2  832577   20558   27291   7836    706   67905   1427   1165   7852   27149    716   2009    640   2109     60
 104 │ species_3  species_4  species_2  species_1  832577   27291   20558   7836    706   67905   7852   2009   1427   27149    640   1165    716   2109     60
 105 │ species_3  species_1  species_4  species_2  832577   20558   67905   1427   1165   27291   7836    706   7852   27149    716   2009   2109    640     60
 106 │ species_3  species_1  species_2  species_4  832577   67905   20558   1427   1165   27291   7852   2009   7836   27149   2109    706    716    640     60
 107 │ species_3  species_2  species_4  species_1  832577   27291   67905   7852   2009   20558   7836    706   1427   27149    640   1165   2109    716     60
 108 │ species_3  species_2  species_1  species_4  832577   67905   27291   7852   2009   20558   1427   1165   7836   27149   2109    706    640    716     60
 109 │ species_1  species_4  species_3  species_2  832577   20558   27149   7852    716   67905   1427   1165   7836   27291    706   2109    640   2009     60
 110 │ species_1  species_4  species_2  species_3  832577   27149   20558   7852    716   67905   7836   2109   1427   27291    640   1165    706   2009     60
 111 │ species_1  species_3  species_4  species_2  832577   20558   67905   1427   1165   27149   7852    716   7836   27291    706   2109   2009    640     60
 112 │ species_1  species_3  species_2  species_4  832577   67905   20558   1427   1165   27149   7836   2109   7852   27291   2009    716    706    640     60
 113 │ species_1  species_2  species_4  species_3  832577   27149   67905   7836   2109   20558   7852    716   1427   27291    640   1165   2009    706     60
 114 │ species_1  species_2  species_3  species_4  832577   67905   27149   7836   2109   20558   1427   1165   7852   27291   2009    716    640    706     60
 115 │ species_2  species_4  species_3  species_1  832577   27291   27149   1427    640   67905   7852   2009   7836   20558    706   2109    716   1165     60
 116 │ species_2  species_4  species_1  species_3  832577   27149   27291   1427    640   67905   7836   2109   7852   20558    716   2009    706   1165     60
 117 │ species_2  species_3  species_4  species_1  832577   27291   67905   7852   2009   27149   1427    640   7836   20558    706   2109   1165    716     60
 118 │ species_2  species_3  species_1  species_4  832577   67905   27291   7852   2009   27149   7836   2109   1427   20558   1165    640    706    716     60
 119 │ species_2  species_1  species_4  species_3  832577   27149   67905   7836   2109   27291   1427    640   7852   20558    716   2009   1165    706     60
 120 │ species_2  species_1  species_3  species_4  832577   67905   27149   7836   2109   27291   7852   2009   1427   20558   1165    640    716    706     60
                                                                                                                                                 83 rows omitted
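
The 120 rows reported in the table header can be checked against the formula n = (k choose 4) × 24 with k = 5 sequences. The following is a minimal sketch using only Julia's built-in functions; the helper name expected_rows is hypothetical and not part of PhyNEST:

```julia
# Hypothetical helper (not part of PhyNEST): expected number of rows in the
# show_sp table for an alignment with k sequences. Each of the (k choose 4)
# quartets is listed once per ordering of its four taxa (4! = 24 orderings).
expected_rows(k::Integer) = binomial(k, 4) * factorial(4)

println(expected_rows(5))   # prints 120
```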

Typically, n is large, so show_sp(arg) prints a truncated table. The truncated table is often sufficient for quickly checking whether readPhylip(args) has parsed the alignment properly. The full table is useful when the observed quartet site pattern frequencies feed into other analyses, such as Patterson's D-statistic.
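
As an illustration of such a downstream analysis, Patterson's D can be sketched from two columns of a single row. This is a rough sketch under stated assumptions, not a PhyNEST function: the helper patterson_d is hypothetical, and it assumes the row lists the quartet in the order (P1, P2, P3, outgroup), so that column ABBA counts sites where P2 and P3 share a state and column ABAB plays the role of the usual "BABA" pattern. A NamedTuple stands in for a table row so that the example runs without any packages.

```julia
# Hypothetical helper (not part of PhyNEST). Assumes the quartet is ordered
# (P1, P2, P3, outgroup): ABBA = P2 and P3 share a state that differs from the
# one shared by P1 and the outgroup; ABAB = P1 and P3 share a state instead.
function patterson_d(row)
    abba = row.ABBA
    baba = row.ABAB
    return (abba - baba) / (abba + baba)
end

# Stand-in for one table row; counts copied from row 14 of the output above
# (species_4, species_3, species_1, outgroup).
row14 = (ABBA = 18168, ABAB = 2573)
d = patterson_d(row14)
println(d)
```

Whether this taxon ordering matches the species relationships in sample_n5h1.phy is not established here, so the resulting value is illustrative only. Assessing significance (for example, with a block jackknife) is outside the scope of this sketch.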

Each site pattern column has a heading (AAAA, AAAB, AABA, ...). At a given site in the alignment, each of the four taxa can carry any one of the four nucleotide states, which gives 4^4 = 256 possible quartet site patterns. Under the Jukes-Cantor model these collapse into 15 classes, as shown in Chifman and Kubatko (2015).
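
The reduction from 256 patterns to 15 classes can be reproduced by brute force. The sketch below assumes the collapsing rule is "relabel the nucleotides by order of first appearance", so that only the pattern of equalities among the four taxa matters; the function canonical is a hypothetical name and not part of PhyNEST.

```julia
# Relabel a quartet site pattern by order of first appearance, e.g.
# ('C','C','C','T') -> "AAAB" and ('G','A','G','T') -> "ABAC".
# Hypothetical helper, not part of PhyNEST.
function canonical(pattern)
    labels = Dict{Char,Char}()
    out = Char[]
    for c in pattern
        if !haskey(labels, c)
            labels[c] = 'A' + length(labels)   # next unused letter
        end
        push!(out, labels[c])
    end
    return String(out)
end

# All 4^4 assignments of nucleotides to the four taxa.
all_patterns = Iterators.product("ACGT", "ACGT", "ACGT", "ACGT")
classes = Set(canonical(p) for p in all_patterns)

println(length(all_patterns))   # prints 256
println(length(classes))        # prints 15
```

Note that the labels produced by this sketch differ from some of the column names in the table: the columns BAAA, CABC, BACA, and BCAA correspond to ABBB, ABCA, ABCB, and ABCC here, which describe the same equality patterns.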

For example, the integer under the column AAAA is the number of sites where all four taxa in the quartet have the same nucleotide state (i.e., AAAA, CCCC, GGGG, or TTTT). Similarly, the integer under the column AAAB is the number of sites where the first three taxa in the quartet share a nucleotide that differs from the one observed in the fourth taxon (e.g., AAAC, AAAT, AAAG, CCCA, CCCT, CCCG, and so on). Because every site falls into exactly one of the 15 patterns, the sum of the 15 integers in a row is equal to the length of the alignment.
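
This property gives a quick sanity check on a parsed alignment. As a small sketch, the 15 counts from row 1 of the output above are copied into a plain vector and summed; the variable name row1 is only for illustration.

```julia
# Counts copied from row 1 of the output above, in column order AAAA ... ABCD.
row1 = [731034, 29756, 29697, 18168, 1757, 45126, 2573, 1399, 2611,
        122101, 3466, 1405, 3447, 7203, 257]

println(sum(row1))   # prints 1000000
```

The total of 1,000,000 is therefore the number of sites counted for this quartet in sample_n5h1.phy.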

The full table can be saved locally by passing the optional argument writecsv=true to readPhylip(args). The full table can also be printed on screen with show(df, allrows=true); the allrows keyword is provided by the Julia package DataFrames, and the package CSV can be used to read the saved file back into a DataFrame.

Next: Checkpointing
