dstat - sungsik-kong/PhyNEST.jl GitHub Wiki
Patterson's D-statistic
Patterson's D-statistic, also known as the ABBA-BABA test, is a commonly used statistical test to detect introgression. In short, it considers ancestral (A
) and derived (B
) alleles across the genomes of four taxa. Two particular allelic patterns ABBA
and BABA
(or ABAB
depending on the order of the four taxa) should occur equally frequent in the absence of introgression. Significantly different frequencies of the two patterns represents the presence of introgression, assuming equal substitution rates and unlinked loci. See Green et al. (2010) and Durand et al. (2011) for more information.
In PhyNEST
, the D-statistic can be executed using the function Dstat
. The mandatory input arguments are the Phylip
object that contains the site pattern frequency information of the alignment parsed using the function readPhylip
and the name of the outgroup taxa as appears in the sequence alignment. By default (the optional argument display_all=false
), Dstat
will only show significant tests; whereas by setting display_all=true
, Dstat
will display the results for every combination of four taxa in the alignment.
julia> p=readPhylip("sample_n5h1.txt")
Progress:
0+---------------+100%
***************complete
Summary of Phylip File
Parsing the file [sample_n5h1.txt] took 27.512 seconds.
Number of taxa: 5
Species names: ["5", "4", "3", "1", "2"]
Alignment length (b.p): 1000000
Site patterns frequencies for 120 quartets computed and stored.
Try `show_sp()` function to see all quartet site patterns.
julia> Dstat(p,"5")
Tip: if neccessary, use function showallDF(df) to see all the rows.
8×10 DataFrame
Row │ outgroup taxa1 taxa2 taxa3 ABAB ABBA Dstat Zscore Pvalue significance
│ String String String String Int64 Int64 Float64 Float64 Float64 String
─────┼───────────────────────────────────────────────────────────────────────────────────────────
1 │ 5 3 1 4 2611 18168 0.748689 107.923 0.0 *
2 │ 5 1 3 4 2573 18168 0.751892 108.286 0.0 *
3 │ 5 3 2 4 2064 23742 0.840037 134.946 0.0 *
4 │ 5 2 3 4 2022 23742 0.843037 135.317 0.0 *
5 │ 5 1 2 4 2069 23703 0.839438 134.761 0.0 *
6 │ 5 2 1 4 2069 23703 0.839438 134.761 0.0 *
7 │ 5 3 2 1 1991 8057 0.603702 60.5149 0.0 *
8 │ 5 1 2 3 1991 8005 0.601641 60.152 0.0 *
julia> Dstat(p,"5",display_all=true)
Tip: if neccessary, use function showallDF(df) to see all the rows.
24×10 DataFrame
Row │ outgroup taxa1 taxa2 taxa3 ABAB ABBA Dstat Zscore Pvalue significance
│ String String String String Int64 Int64 Float64 Float64 Float64 String
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 5 4 3 1 2573 2611 0.00733025 0.527778 0.298827
2 │ 5 4 3 2 2022 2064 0.010279 0.657053 0.255574
3 │ 5 4 1 2 2069 2069 0.0 0.0 0.5
4 │ 5 3 1 2 8057 1991 -0.603702 -60.5149 1.0
5 │ 5 4 1 3 2611 2573 -0.00733025 -0.527778 0.701173
6 │ 5 3 4 1 18168 2611 -0.748689 -107.923 1.0
7 │ 5 3 1 4 2611 18168 0.748689 107.923 0.0 *
8 │ 5 1 4 3 18168 2573 -0.751892 -108.286 1.0
9 │ 5 1 3 4 2573 18168 0.751892 108.286 0.0 *
10 │ 5 4 2 3 2064 2022 -0.010279 -0.657053 0.744426
11 │ 5 3 4 2 23742 2064 -0.840037 -134.946 1.0
12 │ 5 3 2 4 2064 23742 0.840037 134.946 0.0 *
13 │ 5 2 4 3 23742 2022 -0.843037 -135.317 1.0
14 │ 5 2 3 4 2022 23742 0.843037 135.317 0.0 *
15 │ 5 4 2 1 2069 2069 0.0 0.0 0.5
16 │ 5 1 4 2 23703 2069 -0.839438 -134.761 1.0
17 │ 5 1 2 4 2069 23703 0.839438 134.761 0.0 *
18 │ 5 2 4 1 23703 2069 -0.839438 -134.761 1.0
19 │ 5 2 1 4 2069 23703 0.839438 134.761 0.0 *
20 │ 5 3 2 1 1991 8057 0.603702 60.5149 0.0 *
21 │ 5 1 3 2 8005 1991 -0.601641 -60.152 1.0
22 │ 5 1 2 3 1991 8005 0.601641 60.152 0.0 *
23 │ 5 2 3 1 8005 8057 0.00323745 0.410302 0.340792
24 │ 5 2 1 3 8057 8005 -0.00323745 -0.410302 0.659208
A data frame that has 10 columns is displayed at the end of each analysis. First four columns are the four taxa included in the test, followed by two columns that represents the frequencies of ABAB
and ABBA
site patterns for the four taxa. The subsequent three columns represents test results, in the order of D-statistic
, Z-scare
, and P-value
. Using the $\alpha$ level that is set as 0.05 by default (optional argument pval=0.05
), the significant test will have *
at the last column.
By setting the optional argument writecsv=true
, the results can be locally stored in a .csv
format. By default, writecsv=false
. This .csv
file will be named as Dstat-out.csv
by default, but can be modified by a user using the optional argument filename
.
julia> Dstat(p,"5",display_all=true,writecsv=true,filename="D-stat-test")
The results are stored as Dstat-out.csv in the working directory.
Tip: if neccessary, use function showallDF(df) to see all the rows.
24×10 DataFrame
Row │ outgroup taxa1 taxa2 taxa3 ABAB ABBA Dstat Zscore Pvalue significance
│ String String String String Int64 Int64 Float64 Float64 Float64 String
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 5 4 3 1 2573 2611 0.00733025 0.527778 0.298827
2 │ 5 4 3 2 2022 2064 0.010279 0.657053 0.255574
3 │ 5 4 1 2 2069 2069 0.0 0.0 0.5
4 │ 5 3 1 2 8057 1991 -0.603702 -60.5149 1.0
5 │ 5 4 1 3 2611 2573 -0.00733025 -0.527778 0.701173
6 │ 5 3 4 1 18168 2611 -0.748689 -107.923 1.0
7 │ 5 3 1 4 2611 18168 0.748689 107.923 0.0 *
8 │ 5 1 4 3 18168 2573 -0.751892 -108.286 1.0
9 │ 5 1 3 4 2573 18168 0.751892 108.286 0.0 *
10 │ 5 4 2 3 2064 2022 -0.010279 -0.657053 0.744426
11 │ 5 3 4 2 23742 2064 -0.840037 -134.946 1.0
12 │ 5 3 2 4 2064 23742 0.840037 134.946 0.0 *
13 │ 5 2 4 3 23742 2022 -0.843037 -135.317 1.0
14 │ 5 2 3 4 2022 23742 0.843037 135.317 0.0 *
15 │ 5 4 2 1 2069 2069 0.0 0.0 0.5
16 │ 5 1 4 2 23703 2069 -0.839438 -134.761 1.0
17 │ 5 1 2 4 2069 23703 0.839438 134.761 0.0 *
18 │ 5 2 4 1 23703 2069 -0.839438 -134.761 1.0
19 │ 5 2 1 4 2069 23703 0.839438 134.761 0.0 *
20 │ 5 3 2 1 1991 8057 0.603702 60.5149 0.0 *
21 │ 5 1 3 2 8005 1991 -0.601641 -60.152 1.0
22 │ 5 1 2 3 1991 8005 0.601641 60.152 0.0 *
23 │ 5 2 3 1 8005 8057 0.00323745 0.410302 0.340792
24 │ 5 2 1 3 8057 8005 -0.00323745 -0.410302 0.659208```
shell> ls
D-stat-test.csv