2. Input file formats - FinucaneLab/fine-mapping-inf GitHub Wiki
Sample summary statistics file
rsid chromosome position allele1 allele2 maf beta se p
rs139709627 01 152918258 A C 0.005139 -0.00732174 0.029975 0.81
rs11576457 01 152918556 T C 0.419853 6.43001e-05 0.00421589 0.99
rs11576882 01 152918785 T G 0.492955 -0.00125987 0.00415113 0.76
rs55719571 01 152918837 G A 0.038157 0.0054003 0.0108642 0.62
rs140228053 01 152920070 T C 0.003884 0.00362918 0.0354218 0.92
The algorithm requires either a z-score column or both the marginal beta and se columns. All other columns are optional. An SNP ID column is not required but is highly recommended.
This file needs to be readable with pandas.read_csv.
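As a rough illustration of the requirement above, the sketch below reads the first rows of the sample file with pandas.read_csv and derives a z score from beta and se when no z column is present. The whitespace delimiter, the column name z, and the choice to keep the chromosome as a string are assumptions made for this example, not documented behavior of the tool.

```python
import io

import pandas as pd

# First rows of the sample file, held in memory so the sketch is self-contained.
sample = """rsid chromosome position allele1 allele2 maf beta se p
rs139709627 01 152918258 A C 0.005139 -0.00732174 0.029975 0.81
rs11576457 01 152918556 T C 0.419853 6.43001e-05 0.00421589 0.99
rs11576882 01 152918785 T G 0.492955 -0.00125987 0.00415113 0.76
"""

# Assumption: columns are whitespace-delimited.
# dtype keeps the chromosome label "01" from being parsed as the integer 1.
sumstats = pd.read_csv(io.StringIO(sample), sep=r"\s+", dtype={"chromosome": str})

# Either a z column, or both beta and se, must be present.
# Assumption: the z-score column would be named "z".
if "z" not in sumstats.columns:
    missing = {"beta", "se"} - set(sumstats.columns)
    if missing:
        raise ValueError(f"need a z column or both beta and se; missing {sorted(missing)}")
    sumstats["z"] = sumstats["beta"] / sumstats["se"]

print(sumstats[["rsid", "chromosome", "z"]])
```

For a file on disk, the same call would take the file path in place of the in-memory buffer.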
Sample LD file
array([[ 1. , 0.0629755 , 0.0726322 , -0.0172111 , -0.00607431],
[ 0.0629755 , 1. , 0.855799 , 0.168526 , -0.0796643 ],
[ 0.0726322 , 0.855799 , 1. , 0.197715 , -0.0679788 ],
[-0.0172111 , 0.168526 , 0.197715 , 1. , -0.0150502 ],
[-0.00607431, -0.0796643 , -0.0679788 , -0.0150502 , 1. ]])
This file can be a text file with .gz or .bgz compression, or in .npy or .npz format. It is recommended to use .npz format for space/reading/writing efficiency.
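A minimal sketch of writing the sample LD matrix in two of the formats listed above and reading it back with NumPy. The file names and the archive key "ld" are hypothetical; this page does not say which key, if any, the tool expects inside an .npz archive, so that should be checked against the repository before relying on it.

```python
import os
import tempfile

import numpy as np

# The 5 x 5 sample LD matrix shown above.
ld = np.array([
    [ 1.0,         0.0629755,  0.0726322, -0.0172111, -0.00607431],
    [ 0.0629755,   1.0,        0.855799,   0.168526,  -0.0796643 ],
    [ 0.0726322,   0.855799,   1.0,        0.197715,  -0.0679788 ],
    [-0.0172111,   0.168526,   0.197715,   1.0,       -0.0150502 ],
    [-0.00607431, -0.0796643, -0.0679788, -0.0150502,  1.0       ],
])

with tempfile.TemporaryDirectory() as tmp:
    npz_path = os.path.join(tmp, "ld.npz")    # hypothetical file name
    gz_path = os.path.join(tmp, "ld.txt.gz")  # hypothetical file name

    # Assumption: the matrix is stored under the key "ld".
    np.savez_compressed(npz_path, ld=ld)
    # np.savetxt gzips automatically when the file name ends in .gz.
    np.savetxt(gz_path, ld)

    with np.load(npz_path) as archive:
        ld_from_npz = archive["ld"]
    # np.loadtxt reads gzipped text transparently.
    ld_from_gz = np.loadtxt(gz_path)

print(ld_from_npz.shape, ld_from_gz.shape)
```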
Sample V and Dsq files
A sample V file is similar to the LD file:
array([[-1.31506067e-04, 7.71340148e-05, -1.04195915e-04, 1.08580830e-04, -1.43979041e-04],
[-2.97993987e-04, -8.78223493e-05, -2.47383738e-04, -4.95550013e-04, 3.78502259e-04],
[-6.60906576e-05, -8.96847201e-06, -2.88163020e-04, 4.29199398e-04, -5.05586938e-05],
[ 2.29689385e-05, 3.38290026e-04, 1.44517316e-04, 3.12355088e-05, -3.14269711e-05],
[-1.93112073e-05, -1.06344138e-05, -4.22312755e-05, 1.23265418e-05, 3.01481916e-05]])
A sample Dsq file is a 1-D array:
array([0.15874036e+00, ..., 3.78185262e+07, 3.84594498e+07, 4.81299259e+07])
It is recommended that both files are in .npz format.
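This page does not define V and Dsq. One reading, consistent with Dsq being listed in ascending order, is that V holds eigenvectors (as columns) and Dsq holds the eigenvalues of n times the LD matrix, where n is the sample size. The sketch below computes such a pair under that assumption only; the value of n is made up, and the repository's own documentation should be checked before using this to prepare input.

```python
import numpy as np

# The 5 x 5 sample LD matrix from the LD section.
ld = np.array([
    [ 1.0,         0.0629755,  0.0726322, -0.0172111, -0.00607431],
    [ 0.0629755,   1.0,        0.855799,   0.168526,  -0.0796643 ],
    [ 0.0726322,   0.855799,   1.0,        0.197715,  -0.0679788 ],
    [-0.0172111,   0.168526,   0.197715,   1.0,       -0.0150502 ],
    [-0.00607431, -0.0796643, -0.0679788, -0.0150502,  1.0       ],
])

n = 1000  # hypothetical sample size, for illustration only

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending
# order and the matching eigenvectors as the columns of V.
eigvals, V = np.linalg.eigh(ld)

# Assumption: Dsq is the eigenvalue vector of n * LD, a 1-D array of length p.
Dsq = n * eigvals

# Sanity check: V diag(Dsq) V^T should rebuild n * LD.
reconstructed = (V * Dsq) @ V.T
print(np.allclose(reconstructed, n * ld))
```

Each array could then be written with np.savez_compressed, matching the .npz recommendation above.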
Sample prior file
SNPVAR
1.846300e-08
1.846300e-08
3.213400e-08
1.846300e-08
1.846300e-08
Users can specify a one-column file (with a header) that stores the prior causal probabilities of SNPs.
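A minimal sketch of producing a one-column file with a header, using the values from the sample above. The header name SNPVAR is copied from the sample; whether the tool requires that exact name, and whether rows must follow the order of the summary statistics file, are not stated on this page and are treated here as assumptions.

```python
import io

import pandas as pd

# Values copied from the sample prior file.
# Assumption: one row per SNP, in the same order as the summary statistics.
prior = pd.DataFrame(
    {"SNPVAR": [1.8463e-08, 1.8463e-08, 3.2134e-08, 1.8463e-08, 1.8463e-08]}
)

# Write to an in-memory buffer; pass a file path instead to write to disk.
buffer = io.StringIO()
prior.to_csv(buffer, index=False, float_format="%.6e")
text = buffer.getvalue()
print(text)

# Read it back to confirm the header and row count survive the round trip.
reloaded = pd.read_csv(io.StringIO(text))
```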