Adding Covariates - declan93/PGS-LMM GitHub Wiki
Adding new covariates
To add a new covariate to the analysis, the covariate-generating scripts must be altered. Which lines change depends on the type of covariate being added, i.e. discrete or continuous (note that discrete variables with a large number of levels should be treated as continuous covariates). makeCovar.sh and makePgsCovars.sh rely on awk to match the sample IDs between two files (1, 2) and append column 2 of file 1 onto file 2, creating file 3. This process can be repeated for extra covariates. If we define an extra continuous covariate NEWCOVAR in the config.txt file, we need to alter makeCovar.sh and makePgsCovars.sh as in the examples below.
Original makeCovar.sh
#!/bin/bash
source "${PWD}/config.txt"
#continuous covariates
# hash age file using ID as key and covar as val. print evectors with age covar appended.
awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1] ? a[$1] : "NA"}' $age ${GT}/pca.eigenvec > ${traits}/qcovars2.txt
awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1] ? a[$1] : "NA"}' $batch ${traits}/qcovars2.txt > ${traits}/qcovars.txt
var="FID IID PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 age batch"
sed -i "1s/.*/$var/" ${traits}/qcovars.txt
# discrete covariates
awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1] ? a[$1] : "NA"}' $centre <(awk '{print $1, $1}' ${traits}/qcovars.txt) > ${traits}/fixed2.txt
awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1]}' $sex ${traits}/fixed2.txt > ${traits}/fixed.txt
rm ${traits}/fixed2.txt
rm ${traits}/qcovars2.txt
var2="FID IID centre sex"
sed -i "1s/.*/$var2/" ${traits}/fixed.txt
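The awk join that each line of the script relies on can be tried in isolation. The following is a minimal sketch on toy data; the file names age.txt and pcs.txt and their contents are made up for illustration and are not part of the pipeline.

```shell
# Toy demonstration of the two-file awk join (all file names here are hypothetical).
tmp=$(mktemp -d)
printf 'id1 34\nid2 51\n' > "$tmp/age.txt"                              # file 1: ID, covariate
printf 'id1 id1 0.01\nid2 id2 0.02\nid3 id3 0.03\n' > "$tmp/pcs.txt"    # file 2: FID, IID, PC1

# NR==FNR is only true while awk reads the first file: store the covariate keyed by ID.
# For every row of the second file, print the row plus the stored value, or NA if none.
joined=$(awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1] ? a[$1] : "NA"}' "$tmp/age.txt" "$tmp/pcs.txt")
printf '%s\n' "$joined"
```

Here id1 and id2 gain their age as a new last column, while id3, which is absent from the covariate file, gets NA.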
Including NEWCOVAR in makeCovar.sh
#!/bin/bash
source "${PWD}/config.txt"
#continuous covariates
# hash age file using ID as key and covar as val. print evectors with age covar appended.
awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1] ? a[$1] : "NA"}' $age ${GT}/pca.eigenvec > ${traits}/qcovars2.txt
awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1] ? a[$1] : "NA"}' $batch ${traits}/qcovars2.txt > ${traits}/qcovars3.txt
awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1] ? a[$1] : "NA"}' $NEWCOVAR ${traits}/qcovars3.txt > ${traits}/qcovars.txt
var="FID IID PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 age batch NEWCOVAR"
sed -i "1s/.*/$var/" ${traits}/qcovars.txt
#discrete covariates
awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1] ? a[$1] : "NA"}' $centre <(awk '{print $1, $1}' ${traits}/qcovars.txt) > ${traits}/fixed2.txt
awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1]}' $sex ${traits}/fixed2.txt > ${traits}/fixed.txt
rm ${traits}/fixed2.txt
rm ${traits}/qcovars2.txt
rm ${traits}/qcovars3.txt
var2="FID IID centre sex"
sed -i "1s/.*/$var2/" ${traits}/fixed.txt
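One point that may be worth checking before adding a covariate: the expression a[$1] ? a[$1] : "NA" tests whether the stored value is truthy, so a covariate value of 0 is printed as NA. If NEWCOVAR can legitimately be 0, a membership test is one possible alternative. The sketch below uses made-up toy files and is a suggestion, not the repository's code.

```shell
# Toy files (hypothetical names): id1 has a genuine covariate value of 0, id3 is missing.
tmp=$(mktemp -d)
printf 'id1 0\nid2 1\n' > "$tmp/newcovar.txt"
printf 'id1 id1\nid2 id2\nid3 id3\n' > "$tmp/ids.txt"

# Truthiness test, as in the scripts above: the stored 0 is falsy, so id1 also gets NA.
truthy=$(awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1] ? a[$1] : "NA"}' "$tmp/newcovar.txt" "$tmp/ids.txt")

# Membership test: NA is printed only when the ID is absent from the covariate file.
member=$(awk 'NR==FNR{a[$1]=$2;next}{print $0, (($1 in a) ? a[$1] : "NA")}' "$tmp/newcovar.txt" "$tmp/ids.txt")

printf 'truthiness test:\n%s\n' "$truthy"
printf 'membership test:\n%s\n' "$member"
```

With the membership test, id1 keeps its 0 and only id3 is reported as NA.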
Similarly, adding a continuous covariate also requires altering makePgsCovars.sh. Note that adding a discrete covariate does not require updating makePgsCovars.sh, as the same discrete covariate file is used as in the original GWAS.
Original makePgsCovars.sh
source "${PWD}/config.txt"
# hash age file using ID as key and covar as val. print evectors with age covar appended.
for i in {1..22}; do
awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1]}' $age ${GT}/pca.eigenvec > ${traits}/qcovars2_${i}.txt
awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1] ? a[$1] : "NA"}' $batch ${traits}/qcovars2_${i}.txt > ${traits}/qcovars3_${i}.txt
#awk 'NR==FNR{a[$1]=$5;next}{print $0,a[$1] ? a[$1] : "NA"}' ${PGS}/${i}.sscore ${traits}/qcovars3_${i}.txt > ${traits}/qcovars_${i} #SBayesfixed effect. If you run sbayesr uncomment this line and comment the next one.
awk 'NR==FNR{a[$1]=$3;next}{print $0,a[$1] ? a[$1] : "NA"}' ${PGS}/PGS.${i}.all.score ${traits}/qcovars3_${i}.txt > ${traits}/qcovars_${i} # This line include PRSice LOCO-PGS
var="FID IID PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 age batch PGS"
sed -i "1s/.*/$var/" ${traits}/qcovars_${i}
rm ${traits}/qcovars2_${i}.txt
rm ${traits}/qcovars3_${i}.txt
done
Including NEWCOVAR in makePgsCovars.sh
source "${PWD}/config.txt"
# hash age file using ID as key and covar as val. print evectors with age covar appended.
for i in {1..22}; do
awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1]}' $age ${GT}/pca.eigenvec > ${traits}/qcovars2_${i}.txt
awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1] ? a[$1] : "NA"}' $batch ${traits}/qcovars2_${i}.txt > ${traits}/qcovars3_${i}.txt
awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1] ? a[$1] : "NA"}' $NEWCOVAR ${traits}/qcovars3_${i}.txt > ${traits}/qcovars4_${i}.txt
#awk 'NR==FNR{a[$1]=$5;next}{print $0,a[$1] ? a[$1] : "NA"}' ${PGS}/${i}.sscore ${traits}/qcovars4_${i}.txt > ${traits}/qcovars_${i} #SBayesfixed effect. If you run sbayesr uncomment this line and comment the next one.
awk 'NR==FNR{a[$1]=$3;next}{print $0,a[$1] ? a[$1] : "NA"}' ${PGS}/PGS.${i}.all.score ${traits}/qcovars4_${i}.txt > ${traits}/qcovars_${i} # This line include PRSice LOCO-PGS
var="FID IID PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 age batch NEWCOVAR PGS"
sed -i "1s/.*/$var/" ${traits}/qcovars_${i}
rm ${traits}/qcovars2_${i}.txt
rm ${traits}/qcovars3_${i}.txt
rm ${traits}/qcovars4_${i}.txt
done
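Because every covariate is appended with the same awk line, the repeated steps could be wrapped in a small function so that a further covariate becomes one more entry in a list. The sketch below is only an illustration: append_covar is a hypothetical helper that does not exist in the repository, and the toy files stand in for the paths set in config.txt.

```shell
# append_covar COVAR_FILE TABLE: hypothetical helper, not part of the repository.
# Prints TABLE with column 2 of COVAR_FILE appended, matched on column 1.
append_covar() {
    awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1] ? a[$1] : "NA"}' "$1" "$2"
}

# Toy inputs standing in for pca.eigenvec, $age, $batch and $NEWCOVAR.
tmp=$(mktemp -d)
printf 'FID IID PC1\nid1 id1 0.01\nid2 id2 0.02\n' > "$tmp/pca.eigenvec"
printf 'id1 34\nid2 51\n' > "$tmp/age.txt"
printf 'id1 2\n' > "$tmp/batch.txt"
printf 'id1 7.5\nid2 8.1\n' > "$tmp/newcovar.txt"

# Append each covariate in turn; the order here must match the header written below.
cp "$tmp/pca.eigenvec" "$tmp/qcovars.txt"
for f in "$tmp/age.txt" "$tmp/batch.txt" "$tmp/newcovar.txt"; do
    append_covar "$f" "$tmp/qcovars.txt" > "$tmp/qcovars.tmp"
    mv "$tmp/qcovars.tmp" "$tmp/qcovars.txt"
done

# Overwrite the header line, as the original scripts do with sed.
var="FID IID PC1 age batch NEWCOVAR"
sed "1s/.*/$var/" "$tmp/qcovars.txt" > "$tmp/qcovars.tmp"
mv "$tmp/qcovars.tmp" "$tmp/qcovars.txt"
cat "$tmp/qcovars.txt"
```

Writing to a temporary file and moving it into place avoids the numbered intermediate files (qcovars2, qcovars3, ...) that otherwise have to be created and removed for each extra covariate.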