Build a reference panel - zhenin/HDL GitHub Wiki
Requirements
OS
Linux or OSX
Download the repository
git clone https://github.com/zhenin/HDL.git
Compile Fortran functions
cd HDL
rm -f build_ld_ref/utils/bmult.o build_ld_ref/utils/ldscore.o \
build_ld_ref/utils/bmult.so build_ld_ref/utils/ldscore.so
R CMD SHLIB build_ld_ref/utils/bmult.f90
R CMD SHLIB build_ld_ref/utils/ldscore.f90
Install R packages
install.packages(c('tidyr', 'dplyr', 'data.table', 'RSpectra', 'argparser'))
Install HDL (required for demo example)
Rscript HDL.install.R
Guide
Demo
The PLINK files for the demo example were generated from 1000 Genomes Project data.
bash build_ld_ref/run_demo.sh
Step 1. Split chromosomes
CAUTION:
- The .bim files of ALL chromosomes of the LD reference panel must be merged (cat) into a SINGLE .bim file.
- Variant identifiers (rsids) in the .bim file MUST BE UNIQUE.
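Both requirements can be checked before running Step 1. The following is a minimal sketch, assuming per-chromosome files named panel_chr<N>.bim (a hypothetical naming scheme; the toy files are created only so the sketch runs on its own). It concatenates the files in chromosome order and reports any identifier in column 2 that appears more than once.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy per-chromosome .bim files; file names and contents are hypothetical.
workdir=$(mktemp -d)
printf '1\trs1\t0\t1000\tA\tG\n1\trs2\t0\t2000\tC\tT\n' > "$workdir/panel_chr1.bim"
printf '2\trs3\t0\t1500\tG\tA\n' > "$workdir/panel_chr2.bim"

# Concatenate in chromosome order into a single .bim file.
merged="$workdir/ALL_SNPS.bim"
: > "$merged"
for chr in 1 2; do   # use 1..22 for a full autosomal panel
  cat "$workdir/panel_chr${chr}.bim" >> "$merged"
done

# Column 2 of a .bim file holds the variant identifier; collect repeats.
dups=$(awk '{print $2}' "$merged" | sort | uniq -d)
n_variants=$(awk 'END {print NR}' "$merged")

if [ -n "$dups" ]; then
  echo "Duplicate variant identifiers:" >&2
  echo "$dups" >&2
  exit 1
fi
echo "OK: $n_variants variants, all identifiers unique"
```

If duplicates are reported, they need to be resolved in the PLINK data itself before continuing, since the merged .bim file must describe the same variants as the reference panel.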
Rscript build_ld_ref/1_split_chroms.R <ld_ref_path/ld_ref_name> <ALL_SNPS.bim> --min MIN_AVG_NUM_SNPs --max MAX_AVG_NUM_SNPs
The --min and --max options set the lower and upper bounds on the average number of variants per segment.
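When choosing values for these two options, it can help to know how many variants each chromosome contributes. The following is a minimal sketch, assuming a merged .bim file (a toy file stands in for it here). It only counts rows per chromosome and does not reproduce how 1_split_chroms.R forms segments.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy stand-in for the merged .bim file (contents are hypothetical).
bim=$(mktemp)
printf '1\trs1\t0\t1000\tA\tG\n1\trs2\t0\t2000\tC\tT\n2\trs3\t0\t1500\tG\tA\n' > "$bim"

# Column 1 of a .bim file is the chromosome code; count rows per chromosome
# and sort numerically by chromosome.
counts=$(awk '{n[$1]++} END {for (c in n) print c, n[c]}' "$bim" | sort -k1,1n)
echo "$counts"
```

Each output line has the form "chromosome count".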
Step 2. Calculate LD
Prepare the PLINK binary fileset: bfile.bed, bfile.bim, and bfile.fam.
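A quick check that all three files are present can save a failed run. The following is a minimal sketch; check_bfile is a hypothetical helper that is not part of the HDL repository, and the demo prefix is a placeholder created only so the sketch runs on its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Return 0 if <prefix>.bed, <prefix>.bim and <prefix>.fam all exist and are non-empty.
check_bfile() {
  local prefix=$1 ext
  for ext in bed bim fam; do
    if [ ! -s "${prefix}.${ext}" ]; then
      echo "Missing or empty: ${prefix}.${ext}" >&2
      return 1
    fi
  done
}

# Demo with a toy prefix (hypothetical path); replace with your own bfile prefix.
demo=$(mktemp -d)/bfile
for ext in bed bim fam; do
  echo "placeholder" > "${demo}.${ext}"
done
check_bfile "$demo" && echo "All three PLINK files found for $demo"
```

The check only confirms that the files exist and are non-empty; it does not validate their contents.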
bash build_ld_ref/2_cal_ld.sh <path/to/bfile> <ld_ref_path/ld_ref_name> [bandwidth [ld_window [chroms]]] | bash
Optional arguments:
bandwidth: bandwidth (number of SNPs) for LD calculation, default=500.
ld_window: window size (kb) for LD calculation, default=1000000 (whole segment).
chroms: selected chromosomes, separated by comma (,), default=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22.
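Because the optional arguments are positional and nested, setting a later one requires supplying the earlier ones as well. The following is a minimal sketch of that behaviour; show_args is a hypothetical stand-in that mirrors the documented defaults and is not the code of 2_cal_ld.sh.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in: resolves arguments the way the usage line documents them.
show_args() {
  local bfile=$1 ld_ref=$2
  local bandwidth=${3:-500}
  local ld_window=${4:-1000000}
  local chroms=${5:-1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22}
  echo "bfile=$bfile ld_ref=$ld_ref bandwidth=$bandwidth ld_window=$ld_window chroms=$chroms"
}

# No optional arguments: all defaults apply.
show_args my_bfile my_ref

# To restrict the chromosomes, bandwidth and ld_window must be given too,
# even if they only repeat the default values.
show_args my_bfile my_ref 500 1000000 21,22
```

Under this reading, a call that restricts the run to chromosomes 21 and 22 passes 500 and 1000000 explicitly before 21,22.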
- Or run in parallel using the parallel command
bash build_ld_ref/2_cal_ld.sh <path/to/bfile> <ld_ref_path/ld_ref_name> | parallel -j n_cores
- Or run in parallel by saving the commands to a file, then splitting that file and submitting the pieces to your cluster as appropriate
bash build_ld_ref/2_cal_ld.sh <path/to/bfile> <ld_ref_path/ld_ref_name> > jobs.sh
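Splitting the saved command file can be done with the standard split utility. The following is a minimal sketch, assuming each line of jobs.sh is one independent command (consistent with piping the output to bash or parallel). The toy jobs.sh, the chunk size of 2, and the jobs_part_ prefix are all placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy jobs.sh with five placeholder commands; in practice this is the file
# written by 2_cal_ld.sh.
workdir=$(mktemp -d)
cd "$workdir"
for i in 1 2 3 4 5; do
  echo "echo job $i"
done > jobs.sh

# Split into chunks of 2 commands each: jobs_part_aa, jobs_part_ab, ...
split -l 2 jobs.sh jobs_part_

n_chunks=$(ls jobs_part_* | wc -l | tr -d ' ')
echo "Created $n_chunks chunk files"

# Each chunk is a plain list of shell commands. Here they run locally, one
# after another; on a cluster, submit each chunk with your scheduler instead.
for chunk in jobs_part_*; do
  bash "$chunk"
done
```

How each chunk is submitted depends on the scheduler in use, so that part is left to the local setup.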
Step 3. Build LD reference
bash build_ld_ref/3_build_ld_ref.sh <ld_ref_path/ld_ref_name> | bash
Or run in parallel, as in Step 2.