1. Basic workflow - FinucaneLab/fine-mapping-inf GitHub Wiki

A sample command that runs both SuSiE-inf and FINEMAP-inf looks like this:

python3.9 run_fine_mapping.py \
    --sumstats Height.chr1.0-1000.z \
    --beta-col-name beta \
    --se-col-name se \
    --ld-file chr1.0-1000.ld.bgz \
    --n 100000 \
    --save-npz \
    --save-tsv \
    --eigen-decomp-prefix chr1.0-1000 \
    --output-prefix Height.chr1.0-1000

In general, to run fine-mapping-inf, four categories of information are needed:

  1. Summary statistics related information
  2. LD related information
  3. Algorithm parameters
  4. Output parameters

1. Summary statistics

In the above example,

--sumstats Height.chr1.0-1000.z \
--beta-col-name beta \
--se-col-name se

specifies the summary statistics file, the column name for the marginal beta, and the column name for the marginal standard error. If z-scores have already been computed and are stored as a column in the summary statistics file, --beta-col-name and --se-col-name can be replaced by a single flag that gives the name of the z-score column:

--z-col-name
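
If your file has beta and se columns but you would rather pass --z-col-name, the z-score column can be added beforehand as z = beta / se. The sketch below is a minimal stdlib example, assuming a whitespace-delimited file whose first line is a header; the function add_z_column and the output column name z are hypothetical and are not part of fine-mapping-inf.

```python
def add_z_column(text, beta_col="beta", se_col="se", z_col="z"):
    """Append a z-score column (beta / se) to whitespace-delimited summary statistics.

    Hypothetical helper. Assumes the first line is a header and fields are
    separated by whitespace.
    """
    lines = text.strip().splitlines()
    header = lines[0].split()
    b_idx = header.index(beta_col)
    s_idx = header.index(se_col)
    out = [" ".join(header + [z_col])]
    for line in lines[1:]:
        fields = line.split()
        # Wald z-score: marginal effect divided by its standard error.
        z = float(fields[b_idx]) / float(fields[s_idx])
        out.append(" ".join(fields + [f"{z:.6g}"]))
    return "\n".join(out) + "\n"


# Made-up toy input, for illustration only.
example = """rsid beta se
rs1 0.02 0.01
rs2 -0.015 0.005
"""
print(add_z_column(example))
```

The new column can then be passed with --z-col-name z instead of --beta-col-name and --se-col-name.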

2. LD information

In the above example,

--ld-file

specifies the file containing the LD matrix.

Our method requires the eigenvectors and eigenvalues of XtX to achieve computational efficiency. If --ld-file is given, we perform an eigen decomposition of the LD matrix using scipy.linalg.eigh. If you have already precomputed the eigenvectors and eigenvalues of XtX (note that for a standardized genotype matrix X, XtX = n * LD), --ld-file can be replaced by

--V-file \
--Dsq-file 
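
The relationship between the LD matrix and the --V-file/--Dsq-file inputs can be sketched as follows, assuming XtX = n * LD as noted above: the eigenvectors of XtX are those of the LD matrix, and its eigenvalues are the LD eigenvalues multiplied by n. This sketch uses numpy.linalg.eigh, which, like the scipy.linalg.eigh used by the tool, is meant for symmetric matrices; the helper eigendecompose_ld is hypothetical. It deliberately does not write files, because the on-disk format expected by --V-file and --Dsq-file is not described on this page.

```python
import numpy as np


def eigendecompose_ld(ld, n):
    """Return (V, Dsq) such that XtX = V @ diag(Dsq) @ V.T, assuming XtX = n * LD.

    Hypothetical helper.
    ld: symmetric p x p LD (correlation) matrix for the region.
    n:  GWAS sample size (the value passed to --n).
    """
    ld = np.asarray(ld, dtype=float)
    # eigh handles symmetric matrices; eigenvalues are returned in ascending order.
    eigvals, V = np.linalg.eigh(ld)
    # Scaling a matrix by n scales its eigenvalues by n and leaves eigenvectors unchanged.
    Dsq = n * eigvals
    return V, Dsq


# Made-up 3-variant LD matrix, for illustration only.
ld = np.array([[1.0, 0.8, 0.1],
               [0.8, 1.0, 0.2],
               [0.1, 0.2, 1.0]])
n = 100000
V, Dsq = eigendecompose_ld(ld, n)

# Sanity check: V diag(Dsq) V^T should reconstruct n * LD.
print(np.allclose(V @ np.diag(Dsq) @ V.T, n * ld))
```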

3. Algorithm parameters

In the above example,

--n 100000 \

provides parameters for the algorithms.

The sample size must be provided with --n; in the above example it is 100K. To run only one method, specify either --method susieinf or --method finemapinf. We recommend the default option, in which both SuSiE-inf and FINEMAP-inf run and FINEMAP-inf uses the tau-squared and sigma-squared estimated by SuSiE-inf to save runtime and improve accuracy. If you want FINEMAP-inf to estimate tau-squared separately by itself, provide the flag --est-finemapinf-tausq.
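
For orientation, tau-squared and sigma-squared are the variance parameters of the infinitesimal-effects model. Assuming the formulation described in the SuSiE-inf/FINEMAP-inf paper, a phenotype y is modeled with a sparse effect vector β (at most L nonzero entries, with L set by --num-sparse-effects), infinitesimal effects α with per-variant variance τ² (tau-squared), and residual noise with variance σ² (sigma-squared):

```latex
\mathbf{y} = X\boldsymbol{\beta} + X\boldsymbol{\alpha} + \boldsymbol{\varepsilon},
\qquad
\boldsymbol{\alpha} \sim \mathcal{N}\!\left(\mathbf{0}, \tau^{2} I_{p}\right),
\qquad
\boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0}, \sigma^{2} I_{n}\right),
\qquad
X^{\top}X = n\,\mathrm{LD}
```

This is a sketch of the model as we understand it, not a description of the tool's internals; the last identity is the one stated in the LD section for a standardized X.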

We also provide options to set the maximum number of sparse effects (--num-sparse-effects), prior causal probabilities (--prior), credible set coverage for SuSiE-inf (--coverage), and more.
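
To illustrate what --coverage controls, here is a minimal sketch of the standard SuSiE-style credible set construction for a single effect: sort that effect's posterior probabilities in decreasing order and keep the smallest prefix whose sum reaches the coverage level. The helper credible_set is hypothetical, and SuSiE-inf may apply additional steps (for example, purity filtering) that this sketch omits.

```python
import numpy as np


def credible_set(alpha, coverage=0.95):
    """Smallest set of variant indices whose posterior probabilities sum to >= coverage.

    Hypothetical helper.
    alpha: posterior probabilities for ONE single effect (should sum to 1).
    """
    alpha = np.asarray(alpha, dtype=float)
    order = np.argsort(alpha)[::-1]        # most probable variants first
    cumulative = np.cumsum(alpha[order])   # non-decreasing running total
    # First position where the running total reaches the coverage level.
    k = int(np.searchsorted(cumulative, coverage)) + 1
    return sorted(order[:k].tolist())


# Made-up posterior probabilities for one effect over four variants.
alpha = [0.03, 0.70, 0.26, 0.01]
print(credible_set(alpha, coverage=0.95))
```

With these probabilities the two leading variants already account for 0.96 of the posterior mass, so they alone form the 95% credible set.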

4. Output parameters

In the above example,

--save-npz \
--save-tsv \
--eigen-decomp-prefix chr1.0-1000 \
--output-prefix Height.chr1.0-1000

provides output parameters.

--output-prefix is required. It sets the file prefix for the log file as well as for the files that store fine-mapping results. To save fine-mapping results to files, provide --save-npz, --save-tsv, or both. With --save-npz, the results are saved as a dictionary in a .npz file; with --save-tsv, they are saved as a .tsv file. Both files contain the same output in different formats: .npz is usually more space-efficient to store, while .tsv is easier to inspect.
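
A minimal sketch of working with the two output formats, using only NumPy and the csv module: a dictionary of arrays is written to .npz with numpy.savez and to .tsv with one column per key, then both are read back. The keys rsid and pip are hypothetical placeholders; the actual keys and columns written by run_fine_mapping.py are not listed on this page, so inspect data.files (or the .tsv header) on a real output.

```python
import csv
import os
import tempfile

import numpy as np

# Hypothetical results; "rsid" and "pip" are placeholder key names.
results = {
    "rsid": np.array(["rs1", "rs2", "rs3"]),
    "pip": np.array([0.91, 0.05, 0.02]),
}

with tempfile.TemporaryDirectory() as tmp:
    npz_path = os.path.join(tmp, "example.npz")
    tsv_path = os.path.join(tmp, "example.tsv")

    # .npz: one named array per dictionary key.
    np.savez(npz_path, **results)

    # .tsv: a header row of keys, then one row per variant.
    # .tolist() converts NumPy scalars to plain Python values before csv writes them.
    with open(tsv_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(results.keys())
        writer.writerows(zip(*(arr.tolist() for arr in results.values())))

    # Read both back; the NpzFile returned by np.load acts like a read-only mapping.
    with np.load(npz_path) as data:
        print(data.files)  # names of the arrays stored in the archive
        npz_pip = data["pip"]
    with open(tsv_path, newline="") as fh:
        rows = list(csv.DictReader(fh, delimiter="\t"))
    tsv_pip = np.array([float(row["pip"]) for row in rows])

# Same values, different containers.
print(np.allclose(npz_pip, tsv_pip))
```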

If you have provided an LD matrix and want to save the results of the eigen decomposition to file, provide a file prefix via --eigen-decomp-prefix. This is separate from --output-prefix because LD is specific to a region, so you may wish to label these files with only the region name and not, e.g., the phenotype name, which is needed to label most fine-mapping result files.
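
One way to use --eigen-decomp-prefix is to decompose a region's LD matrix once and reuse the result for several phenotypes. The sketch below only builds the argument lists and runs nothing; the phenotype BMI, the sumstats naming pattern, and the V/Dsq paths are hypothetical placeholders, since the names of the files written under --eigen-decomp-prefix are not given on this page. Because XtX = n * LD, saved eigenvalues that are already scaled by n would only be valid for runs with the same sample size, so the sketch uses a single n for all phenotypes.

```python
import shlex


def build_commands(region, ld_file, n, phenotypes, v_file, dsq_file):
    """Build argv lists: the first run decomposes the LD matrix and saves it under a
    region-only prefix; later runs reuse it via --V-file/--Dsq-file.

    Hypothetical helper. v_file and dsq_file are PLACEHOLDER paths: the actual names
    of the files written under --eigen-decomp-prefix are not documented here.
    """
    commands = []
    for i, pheno in enumerate(phenotypes):
        argv = ["python3.9", "run_fine_mapping.py",
                "--sumstats", f"{pheno}.{region}.z",  # assumed naming pattern
                "--beta-col-name", "beta",
                "--se-col-name", "se",
                "--n", str(n),
                "--save-tsv",
                "--output-prefix", f"{pheno}.{region}"]
        if i == 0:
            # First run: decompose the LD matrix and save V/Dsq under the region name.
            argv += ["--ld-file", ld_file, "--eigen-decomp-prefix", region]
        else:
            # Later runs: pass the precomputed eigen decomposition instead of the LD file.
            argv += ["--V-file", v_file, "--Dsq-file", dsq_file]
        commands.append(argv)
    return commands


cmds = build_commands("chr1.0-1000", "chr1.0-1000.ld.bgz", 100000,
                      ["Height", "BMI"], "V_FILE_PLACEHOLDER", "DSQ_FILE_PLACEHOLDER")
for argv in cmds:
    print(shlex.join(argv))
```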