2. Input file formats - FinucaneLab/fine-mapping-inf GitHub Wiki

Sample summary statistics file

rsid         chromosome  position   allele1  allele2  maf       beta         se          p
rs139709627  01          152918258  A        C        0.005139  -0.00732174  0.029975    0.81
rs11576457   01          152918556  T        C        0.419853  6.43001e-05  0.00421589  0.99
rs11576882   01          152918785  T        G        0.492955  -0.00125987  0.00415113  0.76
rs55719571   01          152918837  G        A        0.038157  0.0054003    0.0108642   0.62
rs140228053  01          152920070  T        C        0.003884  0.00362918   0.0354218   0.92

The algorithm requires either a z-score column or both a marginal beta column and an se column. All other columns are optional. A SNP ID column is not required but is highly recommended.

This file needs to be readable with pandas.read_csv.
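As a minimal sketch of the requirement above, the snippet below reads two rows of the sample file with pandas.read_csv and derives a z score from beta and se. The whitespace delimiter, the `dtype` override, and the new column name `z` are assumptions made for illustration; use whatever delimiter and column names your own file has.

```python
import io

import pandas as pd

# Two rows of the sample file above, inlined so the sketch runs as-is.
sample = io.StringIO(
    "rsid chromosome position allele1 allele2 maf beta se p\n"
    "rs139709627 01 152918258 A C 0.005139 -0.00732174 0.029975 0.81\n"
    "rs11576457 01 152918556 T C 0.419853 6.43001e-05 0.00421589 0.99\n"
)

# sep=r"\s+" treats any run of whitespace as the delimiter (assumed layout).
# Reading chromosome as str keeps the leading zero in codes such as "01".
sumstats = pd.read_csv(sample, sep=r"\s+", dtype={"chromosome": str})

# Derive a z score from the marginal effect size and its standard error.
# "z" is a hypothetical column name chosen for this sketch.
sumstats["z"] = sumstats["beta"] / sumstats["se"]

print(sumstats[["rsid", "beta", "se", "z"]])
```

Without the `dtype` override, pandas would parse the chromosome column as integers and `01` would become `1`.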

Sample LD file

array([[ 1.        ,  0.0629755 ,  0.0726322 , -0.0172111 , -0.00607431],
       [ 0.0629755 ,  1.        ,  0.855799  ,  0.168526  , -0.0796643 ],
       [ 0.0726322 ,  0.855799  ,  1.        ,  0.197715  , -0.0679788 ],
       [-0.0172111 ,  0.168526  ,  0.197715  ,  1.        , -0.0150502 ],
       [-0.00607431, -0.0796643 , -0.0679788 , -0.0150502 ,  1.        ]])

This file can be a text file compressed as .gz or .bgz, or a binary file in .npy or .npz format. The .npz format is recommended because it takes less space and is faster to read and write.
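The following sketch checks that the sample matrix looks like a valid LD (correlation) matrix and writes it in both binary formats with NumPy. The file names are hypothetical, and this page does not say which key or internal layout the tool expects inside an .npz archive, so the default `arr_0` key used here is an assumption to verify against the tool itself.

```python
import os
import tempfile

import numpy as np

# The 5 x 5 sample LD matrix shown above.
ld = np.array([
    [ 1.        ,  0.0629755 ,  0.0726322 , -0.0172111 , -0.00607431],
    [ 0.0629755 ,  1.        ,  0.855799  ,  0.168526  , -0.0796643 ],
    [ 0.0726322 ,  0.855799  ,  1.        ,  0.197715  , -0.0679788 ],
    [-0.0172111 ,  0.168526  ,  0.197715  ,  1.        , -0.0150502 ],
    [-0.00607431, -0.0796643 , -0.0679788 , -0.0150502 ,  1.        ],
])

# Sanity checks for a correlation matrix: square, symmetric, unit diagonal.
is_square = ld.ndim == 2 and ld.shape[0] == ld.shape[1]
is_symmetric = np.allclose(ld, ld.T)
has_unit_diagonal = np.allclose(np.diag(ld), 1.0)

out_dir = tempfile.mkdtemp()
npy_path = os.path.join(out_dir, "ld.npy")  # hypothetical file names
npz_path = os.path.join(out_dir, "ld.npz")

np.save(npy_path, ld)              # single array, no key involved
np.savez_compressed(npz_path, ld)  # compressed archive, default key "arr_0" (assumed)

# Read both files back to confirm the round trip.
ld_from_npy = np.load(npy_path)
with np.load(npz_path) as archive:
    ld_from_npz = archive["arr_0"]
```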

Sample V and Dsq files

The sample V file is a 2-D array, laid out like the LD file:

array([[-1.31506067e-04,  7.71340148e-05, -1.04195915e-04, 1.08580830e-04, -1.43979041e-04],
       [-2.97993987e-04, -8.78223493e-05, -2.47383738e-04, -4.95550013e-04,  3.78502259e-04],
       [-6.60906576e-05, -8.96847201e-06, -2.88163020e-04, 4.29199398e-04, -5.05586938e-05],
       [ 2.29689385e-05,  3.38290026e-04,  1.44517316e-04, 3.12355088e-05, -3.14269711e-05],
       [-1.93112073e-05, -1.06344138e-05, -4.22312755e-05, 1.23265418e-05,  3.01481916e-05]])

The sample Dsq file is a 1-D array:

array([0.15874036e+00, ..., 3.78185262e+07,  3.84594498e+07,  4.81299259e+07])

It is recommended that both files are in .npz format.
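This page does not define V and Dsq. The sketch below assumes, as the names suggest, that V holds the eigenvectors of the LD matrix and Dsq holds the eigenvalues of X^T X, that is, the LD eigenvalues multiplied by the GWAS sample size n. Treat this as an illustration of the expected shapes under that assumption, not as the tool's own procedure; the value of `n` is hypothetical.

```python
import numpy as np

n = 1000  # hypothetical GWAS sample size

# The 5 x 5 sample LD matrix from the LD section.
ld = np.array([
    [ 1.        ,  0.0629755 ,  0.0726322 , -0.0172111 , -0.00607431],
    [ 0.0629755 ,  1.        ,  0.855799  ,  0.168526  , -0.0796643 ],
    [ 0.0726322 ,  0.855799  ,  1.        ,  0.197715  , -0.0679788 ],
    [-0.0172111 ,  0.168526  ,  0.197715  ,  1.        , -0.0150502 ],
    [-0.00607431, -0.0796643 , -0.0679788 , -0.0150502 ,  1.        ],
])

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending
# order and the matching eigenvectors as the columns of V.
eigvals, V = np.linalg.eigh(ld)

# Assumed definition: Dsq = n * eigenvalues of LD, clipped at zero to remove
# tiny negative values caused by round-off.
Dsq = np.maximum(n * eigvals, 0.0)

# V is 2-D with the same shape as the LD matrix; Dsq is 1-D, one value per SNP.
print(V.shape, Dsq.shape)

# Under this assumption, V diag(Dsq) V^T reproduces n * LD.
reconstructed = (V * Dsq) @ V.T
```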

Sample prior file

SNPVAR
1.846300e-08
1.846300e-08
3.213400e-08
1.846300e-08
1.846300e-08

Users can specify a one-column file (with a header) that stores the prior causal probabilities of the SNPs.
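A one-column file like the sample above can be written with pandas, as in this minimal sketch. The header name `SNPVAR` and the values are copied from the sample; whether rows must follow the same SNP order as the summary statistics file is not stated here, so the sketch only assumes one value per SNP.

```python
import io

import pandas as pd

# One prior value per SNP, copied from the sample file above.
prior = pd.DataFrame(
    {"SNPVAR": [1.8463e-08, 1.8463e-08, 3.2134e-08, 1.8463e-08, 1.8463e-08]}
)

# Write a single column with a header and no index column. An in-memory
# buffer stands in for a real path so the sketch runs as-is.
buffer = io.StringIO()
prior.to_csv(buffer, index=False, float_format="%.6e")
lines = buffer.getvalue().splitlines()

# Reading the text back gives the same single column.
prior_loaded = pd.read_csv(io.StringIO(buffer.getvalue()))
```

To write to disk instead, pass a file path to `to_csv` in place of the buffer.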