00 Converting to plink format - WheelerLab/gwasqc_pipeline GitHub Wiki
The obvious first step of this pipeline is to convert your data to a format supported by plink. Genotype data comes in many forms. Although a variety of conversion tools are available both publicly and privately (your own lab mates may have several custom scripts), building a single tool that converts between every format is not trivial. Several scripts are provided in this pipeline for this purpose, but they are very unlikely to cover every case. I will walk through a few specific examples and provide scripts for each conversion, but I suggest that users also consult other resources, including the plink documentation on supported input formats and forums such as Biostars.
Example: Illumina format to lgen/map to ped/map to bed/bim/fam
When this pipeline was first developed, one of the first data sets it encountered was in Illumina format. In the Rscripts folder of this pipeline I have included a basic Illumina converter, illumina_to_lgen.R. It creates the plink-supported long-format lgen and map files, which plink can then read with its --lfile option and convert with --recode or --make-bed (note that --lfile also expects a .fam file with the same prefix). A basic description of how to run these steps is below; a more complete description of illumina_to_lgen.R is in the Rscripts section of this wiki.
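To show what such a conversion involves, here is a minimal bash/awk sketch. It is not illumina_to_lgen.R and does not reproduce its options. It assumes a tab-delimited FinalReport whose header block ends with a [Data] line, followed by a column-name line containing Sample ID, SNP Name, Chr, Position, Allele1 - Forward and Allele2 - Forward, with no-calls written as "-". Column names depend on how the report was exported, so check yours first. The function name illumina_to_lgen_sketch is hypothetical, and it writes a placeholder .fam (unknown parents, sex and phenotype) because --lfile needs one.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (NOT illumina_to_lgen.R): Illumina FinalReport ->
# plink long-format fileset <prefix>.lgen / <prefix>.map / <prefix>.fam.
# Assumptions: tab-delimited report; a "[Data]" line precedes the column
# header; columns "Sample ID", "SNP Name", "Chr", "Position",
# "Allele1 - Forward", "Allele2 - Forward"; no-calls written as "-".
illumina_to_lgen_sketch() {
  local report=$1 prefix=$2
  awk -F'\t' -v OFS='\t' -v prefix="$prefix" '
    # Strip Windows line endings if present
    { sub(/\r$/, "") }
    # Skip the report header block up to and including "[Data]",
    # instead of hard-coding how many lines to skip
    !in_data { if ($1 == "[Data]") in_data = 1; next }
    # First line after "[Data]": map column names to column numbers
    !have_cols {
      for (i = 1; i <= NF; i++) col[$i] = i
      n = split("Sample ID|SNP Name|Chr|Position|Allele1 - Forward|Allele2 - Forward", need, "|")
      for (k = 1; k <= n; k++)
        if (!(need[k] in col)) { print "missing column: " need[k] | "cat 1>&2"; exit 1 }
      have_cols = 1
      next
    }
    {
      sample = $(col["Sample ID"]); snp = $(col["SNP Name"])
      a1 = $(col["Allele1 - Forward"]); a2 = $(col["Allele2 - Forward"])
      # plink codes a missing allele as 0
      if (a1 == "-") a1 = "0"
      if (a2 == "-") a2 = "0"
      # .lgen: FID IID SNP allele1 allele2 (sample ID reused as FID and IID)
      print sample, sample, snp, a1, a2 > (prefix ".lgen")
      # .map: chr SNP cM bp, one line per SNP in first-seen order
      if (!(snp in seen_snp)) {
        seen_snp[snp] = 1
        print $(col["Chr"]), snp, 0, $(col["Position"]) > (prefix ".map")
      }
      # .fam: FID IID father mother sex phenotype, all unknown
      if (!(sample in seen_sample)) {
        seen_sample[sample] = 1
        print sample, sample, 0, 0, 0, -9 > (prefix ".fam")
      }
    }
  ' "$report"
}
```

For example, illumina_to_lgen_sketch FinalReport.txt example_out_prefix would produce files that plink --lfile example_out_prefix can read. Locating the [Data] marker rather than skipping a fixed number of lines is a design choice that tolerates reports with longer or shorter header blocks.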
Example Run
#convert illumina to lgen
Rscript illumina_to_lgen.R --illumina FinalReport.txt --out example_out_prefix --skip 9
#convert lgen to ped
plink --lfile example_out_prefix --out new_example_out --recode
#convert ped and map to bed/bim/fam
plink --file new_example_out --make-bed --out final_example_out
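After the conversion it is worth confirming that no samples or variants were lost along the way, for instance because header lines were read as data or an ID did not match. The hypothetical helper check_conversion_counts below is a minimal sketch: it assumes no filtering options were passed to plink, so the output .fam should contain every distinct sample in the .lgen and the output .bim every variant in the .map.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check for the lgen/map -> bed/bim/fam chain.
# $1 = long-format prefix (e.g. example_out_prefix)
# $2 = binary fileset prefix (e.g. final_example_out)
check_conversion_counts() {
  local in=$1 out=$2
  local n_lgen_samples n_fam n_map n_bim
  # Distinct samples (FID + IID) present in the long-format genotypes
  n_lgen_samples=$(awk '{ print $1 "\t" $2 }' "$in.lgen" | sort -u | wc -l)
  n_fam=$(wc -l < "$out.fam")
  n_map=$(wc -l < "$in.map")
  n_bim=$(wc -l < "$out.bim")
  # Arithmetic comparison ignores the padding some wc versions print
  if (( n_lgen_samples != n_fam )); then
    echo "sample count mismatch: $n_lgen_samples in $in.lgen vs $n_fam in $out.fam" >&2
    return 1
  fi
  if (( n_map != n_bim )); then
    echo "variant count mismatch: $n_map in $in.map vs $n_bim in $out.bim" >&2
    return 1
  fi
  echo "OK: $n_fam samples, $n_bim variants"
}
```

For the example run above this would be check_conversion_counts example_out_prefix final_example_out. If you deliberately apply filters during conversion, a mismatch is expected and this check does not apply.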
Example: Dosage format to bed/bim/fam
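Here, dosage format means the genotype input format used by PrediXcan: one row per variant with columns for chromosome, rsID, position, allele 1, allele 2 and MAF, followed by one dosage value per sample. This conversion requires plink2. Because the dosage file has no header line with sample IDs, you must also supply a sample information file with --fam or --psam; one can easily be spoofed from the PrediXcan samples file: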
awk -F'\t' '{print $1,$2,0,0,0,0}' samples.txt > spoofed.fam
plink2 --import-dosage example.dosage.txt.gz skip0=1 skip1=1 skip2=1 chr-col-num=1 pos-col-num=3 noheader format=1 --psam spoofed.fam --make-bed --out example_prefix_out
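The --import-dosage modifiers only count columns, so a layout that is off by one can shift every value or cause plink2 to reject the file. Per the plink2 documentation, skip0, skip1 and skip2 give the number of columns before the variant ID, between the variant ID and allele 1, and between allele 2 and the first dosage; for the PrediXcan layout each is 1 (chromosome, position and MAF respectively). The hypothetical helper check_dosage_columns below is a minimal pre-flight sketch, assuming that six-column prefix and exactly one dosage column per line of the .fam file. It reports the first row whose column count disagrees.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for a gzipped PrediXcan-style dosage file.
# Assumed row layout: chr rsid pos allele1 allele2 MAF dosage_1 ... dosage_n
# where n is the number of lines in the (spoofed) .fam file.
check_dosage_columns() {
  local dosage_gz=$1 fam=$2
  local n_samples expected
  n_samples=$(awk 'END { print NR }' "$fam")
  expected=$(( n_samples + 6 ))
  # Stream the decompressed file; stop at the first row with a wrong width
  gzip -dc "$dosage_gz" | awk -v expected="$expected" '
    NF != expected {
      printf "line %d has %d columns, expected %d\n", NR, NF, expected
      bad = 1
      exit
    }
    END { exit bad }
  '
}
```

For example, run check_dosage_columns example.dosage.txt.gz spoofed.fam before the plink2 command; a non-zero exit status means the file does not match the assumed layout or the sample count.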