1. POP‐GWAS - qlu-lab/POP-TOOLS GitHub Wiki
POP‐GWAS takes three sets of GWAS summary statistics as input to conduct valid and powerful ML-assisted GWAS. It can be used for both quantitative and binary phenotypes.
TL;DR
You can use POP-GWAS to perform ML-assisted GWAS on a quantitative phenotype (using head bone mineral density as an example; the data are in the test folder) by running:
cd POP-TOOLS
trait=Head_BMD
python3 ./POP-GWAS.py \
--gwas-yhat-unlab ./test/data/${trait}_yhat_unlab.txt.gz \
--gwas-y-lab ./test/data/${trait}_y_lab.txt.gz \
--gwas-yhat-lab ./test/data/${trait}_yhat_lab.txt.gz \
--out ./test/results/${trait}
Here, yhat denotes the imputed phenotype, y the observed phenotype, lab the labeled dataset, and unlab the unlabeled dataset. Combining these gives the three required GWAS:
- --gwas-yhat-unlab: GWAS on the imputed phenotype in unlabeled data
- --gwas-y-lab: GWAS on the observed phenotype in labeled data
- --gwas-yhat-lab: GWAS on the imputed phenotype in labeled data
The output is the POP-GWAS result:
head ./test/results/Head_BMD_POP-GWAS.txt
CHR BP SNP A1 A2 EAF BETA SE Z P N_eff
22 16495833 rs79847867 A C 0.07798 0.01157 0.01283 0.901 3.673e-01 42228
22 16496170 rs560288282 A G 0.07798 0.01157 0.01283 0.901 3.673e-01 42228
22 16870108 rs131528 T C 0.31422 -0.00305 0.00742 -0.411 6.809e-01 42179
22 16870162 rs131529 A G 0.31429 -0.00278 0.00742 -0.375 7.078e-01 42184
22 16870214 rs131530 A G 0.31426 -0.00289 0.00742 -0.389 6.972e-01 42182
There are several things to note:
- The required input data format for POP-GWAS can be found on this page.
- Only SNPs on chromosome 22 are included in the example data, for demonstration purposes. If you want to use POP-GWAS to estimate r (the phenotypic correlation), please use the full GWAS summary statistics containing SNPs on chromosomes 1-22 as input. In our test, it takes only about 3 minutes to produce results for a GWAS with 10 million SNPs.
- BETA in the POP-GWAS summary statistics is the per-allele increase in the phenotype, in standard deviation units. SE is on the same scale as BETA. N_eff is the effective sample size of the ML-assisted GWAS.
- We recommend applying the sample overlap correction in POP-GWAS if there are overlapping samples or residual correlations between the input GWAS in labeled and unlabeled data. Such residual correlations can be quantified by the intercept of bivariate LD score regression.
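As a quick sanity check on the output columns, the sketch below recomputes Z and P from BETA and SE for two rows of the example output shown earlier. It assumes that Z is BETA divided by SE and that P is a two-sided p-value from a standard normal distribution; this matches the printed rows up to rounding, but it is an assumption rather than a documented property of POP-GWAS. The function name z_and_p is hypothetical.

```python
import math


def z_and_p(beta, se):
    """Return (Z, two-sided normal P) for an effect estimate and its SE.

    Assumption (not confirmed by the POP-GWAS docs): Z = BETA / SE and
    P = 2 * (1 - Phi(|Z|)), computed here via the complementary error function.
    """
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# (SNP, BETA, SE) copied from the first and third rows of the example output.
rows = [
    ("rs79847867", 0.01157, 0.01283),
    ("rs131528", -0.00305, 0.00742),
]

for snp, beta, se in rows:
    z, p = z_and_p(beta, se)
    print(f"{snp}\tZ={z:.3f}\tP={p:.3e}")
```

Because BETA and SE are rounded in the text file, the recomputed values can differ from the reported Z and P in the last digit.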
Useful examples
Here we provide a few useful examples:
Binary phenotype
You can apply POP-GWAS to binary phenotypes by simply adding --bt to the script for the quantitative phenotype. Below is the script that applies POP-GWAS to a binary phenotype, using type-2 diabetes as an example.
cd POP-TOOLS
trait=T2D
python3 ./POP-GWAS.py \
--gwas-yhat-unlab ./test/data/${trait}_yhat_unlab.txt.gz \
--gwas-y-lab ./test/data/${trait}_y_lab.txt.gz \
--gwas-yhat-lab ./test/data/${trait}_yhat_lab.txt.gz \
--bt \
--out ./test/results/${trait}
The output is the POP-GWAS result:
head ./test/results/T2D_POP-GWAS.txt
CHR BP SNP A1 A2 EAF OR SE Z P N_eff N_eff_case N_eff_control
22 16495833 rs79847867 A C 0.07836 0.01304 0.16857 0.077 9.383e-01 136416 6032 130384
22 16496170 rs560288282 A G 0.07836 0.01304 0.16857 0.077 9.383e-01 136416 6032 130384
22 16870108 rs131528 T C 0.31340 0.04334 0.09778 0.443 6.576e-01 136269 6021 130248
22 16870162 rs131529 A G 0.31346 0.04984 0.09771 0.510 6.100e-01 136284 6025 130259
22 16870214 rs131530 A G 0.31342 0.04828 0.09770 0.494 6.212e-01 136287 6026 130260
There are a few things to note:
- When using --bt, the --gwas-y-lab summary statistics must come from a GWAS on a binary trait. Otherwise, please run POP-GWAS as for a quantitative phenotype.
- The format for the other input summary statistics depends on whether the imputed phenotype is quantitative or binary. Use the format for quantitative phenotypes if it is quantitative and the format for binary phenotypes if it is binary.
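In the example rows above, Z equals the value in the OR column divided by SE (for instance 0.04334 / 0.09778 ≈ 0.443), which suggests that this column is reported on the log-odds scale. The sketch below works under that assumption, which is inferred from the printed numbers and not confirmed by the POP-GWAS documentation, and converts a row to an odds ratio with an approximate 95% confidence interval. The function name summarize_bt_row is hypothetical.

```python
import math


def summarize_bt_row(log_or, se):
    """Return (Z, odds ratio, (ci_low, ci_high)) for one --bt output row.

    Assumption: the value in the OR column is a log odds ratio, so the
    odds ratio is exp(value) and a Wald-type 95% interval is
    exp(value +/- 1.96 * SE).
    """
    z = log_or / se
    odds_ratio = math.exp(log_or)
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return z, odds_ratio, ci


# OR column and SE copied from the rs131528 row of the T2D example output.
z, odds_ratio, (lo, hi) = summarize_bt_row(0.04334, 0.09778)
print(f"Z={z:.3f}  OR={odds_ratio:.4f}  95% CI=({lo:.4f}, {hi:.4f})")
```

If the column turns out to hold odds ratios directly, skip the exponentiation and treat this check as not applicable.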
Available flags
The available flags for POP-GWAS to conduct ML-assisted GWAS are
python3 ./POP-GWAS.py \
--gwas-yhat-unlab <Path to GWAS summary statistics file of imputed phenotype in unlabeled data> \
--gwas-y-lab <Path to GWAS summary statistics file of observed phenotype in labeled data> \
--gwas-yhat-lab <Path to GWAS summary statistics file of imputed phenotype in labeled data> \
--out <The prefix of path to output summary statistics> \
--bt  # Optional: include this flag only if the phenotype is binary
where the flags, in order, are:
- --gwas-yhat-unlab (required): Full path to the GWAS summary statistics on the imputed phenotype in unlabeled data, in the required format
- --gwas-y-lab (required): Full path to the GWAS summary statistics on the observed phenotype in labeled data, in the required format
- --gwas-yhat-lab (required): Full path to the GWAS summary statistics on the imputed phenotype in labeled data, in the required format
- --out (required): The prefix of the path for the output summary statistics. The output contains the text file <Prefix to the output file>_POP-GWAS.txt with the POP-GWAS summary statistics and <Prefix to the output file>_POP-GWAS.log for debugging.
- --bt (optional): Flag indicating that the phenotype is binary; omit it for a quantitative phenotype.
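When running POP-GWAS for several traits, it can help to assemble the command programmatically. The following is a minimal sketch that uses only the flags listed above and the file-naming pattern from the test data (<trait>_yhat_unlab.txt.gz, <trait>_y_lab.txt.gz, <trait>_yhat_lab.txt.gz). The helper name build_pop_gwas_cmd and its parameters are hypothetical and not part of POP-TOOLS.

```python
import shlex
from pathlib import Path


def build_pop_gwas_cmd(trait, data_dir="./test/data", out_dir="./test/results",
                       binary=False, script="./POP-GWAS.py"):
    """Build the argument list for one POP-GWAS run.

    Assumes the three input files follow the naming pattern used in the
    test folder; adjust the paths for your own data.
    """
    data = Path(data_dir)
    cmd = [
        "python3", script,
        "--gwas-yhat-unlab", str(data / f"{trait}_yhat_unlab.txt.gz"),
        "--gwas-y-lab", str(data / f"{trait}_y_lab.txt.gz"),
        "--gwas-yhat-lab", str(data / f"{trait}_yhat_lab.txt.gz"),
        "--out", str(Path(out_dir) / trait),
    ]
    if binary:
        cmd.append("--bt")  # bare flag, only for binary phenotypes
    return cmd


# Print the commands for the two example traits without executing them.
for trait, is_binary in [("Head_BMD", False), ("T2D", True)]:
    print(shlex.join(build_pop_gwas_cmd(trait, binary=is_binary)))
```

To execute a command, pass the returned list to subprocess.run(cmd, check=True) from inside the POP-TOOLS directory.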