Adding Covariates - declan93/PGS-LMM GitHub Wiki

Adding new covariates

To add a new covariate to the analysis, the covariate-generating scripts must be altered. Which lines change depends on the type of covariate being added, i.e. discrete or continuous (note that discrete variables with a large number of levels should be treated as continuous covariates). makeCovar.sh and makePgsCovars.sh rely on awk to match sample IDs (column 1) between two files, file 1 and file 2, and append column 2 of file 1 to the matching row of file 2, writing the result to file 3. This step can be repeated for each extra covariate. If we define an extra continuous covariate NEWCOVAR in the config.txt file, we need to alter makeCovar.sh and makePgsCovars.sh as in the examples below.
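A toy run of the awk join the two scripts rely on may make the matching step concrete. It is a sketch with invented file names and values, assuming whitespace-separated files with the sample ID in column 1; the NEWCOVAR entry in config.txt would presumably just hold the path to such a file. The NA fallback here uses a membership test, ($1 in a), because a bare a[$1] ? a[$1] : "NA" test treats a genuine value of 0 as false and replaces it with NA.

```bash
#!/bin/bash
# Toy illustration of the ID-matching join used by makeCovar.sh and
# makePgsCovars.sh. File names and values are invented for the example.
# config.txt would then contain a line like NEWCOVAR=/path/to/newcovar.txt
# (hypothetical path).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# File 1: covariate file, sample ID in column 1 and value in column 2.
printf 'S1 52\nS2 0\n' > "$tmp/newcovar.txt"

# File 2: an eigenvec-like file (FID IID PCs).
printf 'S1 S1 0.01 -0.02\nS2 S2 0.03 0.00\nS3 S3 -0.01 0.04\n' > "$tmp/pcs.txt"

# While reading file 1 (NR==FNR) store a[ID]=value; for each row of file 2
# print the row plus a[ID], or NA when the ID never appeared in file 1.
# ($1 in a) keeps S2's value of 0, which a truthiness test would turn into NA.
awk 'NR==FNR{a[$1]=$2;next}{print $0, (($1 in a) ? a[$1] : "NA")}' \
    "$tmp/newcovar.txt" "$tmp/pcs.txt" > "$tmp/file3.txt"

cat "$tmp/file3.txt"
```

The three output rows end in 52, 0 and NA: S3 has no entry in file 1, and the 0 recorded for S2 is kept.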

Original makeCovar.sh

#!/bin/bash
source "${PWD}/config.txt"

#continuous covariates
# Hash the age file (ID as key, covariate as value), then print the eigenvectors with age appended; IDs absent from the lookup file get NA.
# ($1 in a) tests membership, so a genuine value of 0 is kept rather than replaced by NA.
awk 'NR==FNR{a[$1]=$2;next}{print $0, (($1 in a) ? a[$1] : "NA")}' $age ${GT}/pca.eigenvec > ${traits}/qcovars2.txt
awk 'NR==FNR{a[$1]=$2;next}{print $0, (($1 in a) ? a[$1] : "NA")}' $batch ${traits}/qcovars2.txt > ${traits}/qcovars.txt

var="FID IID PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 age batch"
sed -i "1s/.*/$var/" ${traits}/qcovars.txt

# discrete covariates
awk 'NR==FNR{a[$1]=$2;next}{print $0, (($1 in a) ? a[$1] : "NA")}' $centre <(awk '{print $1, $1}' ${traits}/qcovars.txt) > ${traits}/fixed2.txt
awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1]}' $sex ${traits}/fixed2.txt > ${traits}/fixed.txt
rm ${traits}/fixed2.txt
rm ${traits}/qcovars2.txt
var2="FID IID centre sex"
sed -i "1s/.*/$var2/" ${traits}/fixed.txt 
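Because the sex step above prints nothing (not NA) for an ID missing from the sex file, a sample without a sex entry ends up with one field fewer than the header. A quick check that every row has as many fields as its header can catch this before the model is fit. check_rectangular is a hypothetical helper, not part of PGS-LMM, sketched here on toy files.

```bash
#!/bin/bash
# check_rectangular FILE
# Hypothetical helper: report rows whose field count differs from the
# header's and return non-zero if any are found.
check_rectangular() {
    awk 'NR == 1 { n = NF; next }
         NF != n { bad++; printf "%s: line %d has %d fields, expected %d\n", FILENAME, FNR, NF, n > "/dev/stderr" }
         END { exit (bad > 0) }' "$1"
}

# Toy files standing in for ${traits}/fixed.txt; in the second one, S2 had
# no sex value appended.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'FID IID centre sex\nS1 S1 11010 1\nS2 S2 11011 0\n' > "$tmp/good.txt"
printf 'FID IID centre sex\nS1 S1 11010 1\nS2 S2 11011\n' > "$tmp/ragged.txt"

check_rectangular "$tmp/good.txt" && echo "good.txt: OK"
check_rectangular "$tmp/ragged.txt" || echo "ragged.txt: fix before running the LMM"
```

Against the real outputs it would be called as check_rectangular ${traits}/qcovars.txt and check_rectangular ${traits}/fixed.txt once makeCovar.sh has run.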

Including NEWCOVAR in makeCovar.sh

#!/bin/bash
source "${PWD}/config.txt"

#continuous covariates
# Hash each covariate file (ID as key, covariate as value) and append it to the eigenvectors; IDs absent from a lookup file get NA.
# ($1 in a) tests membership, so a genuine value of 0 is kept rather than replaced by NA.
awk 'NR==FNR{a[$1]=$2;next}{print $0, (($1 in a) ? a[$1] : "NA")}' $age ${GT}/pca.eigenvec > ${traits}/qcovars2.txt
awk 'NR==FNR{a[$1]=$2;next}{print $0, (($1 in a) ? a[$1] : "NA")}' $batch ${traits}/qcovars2.txt > ${traits}/qcovars3.txt
awk 'NR==FNR{a[$1]=$2;next}{print $0, (($1 in a) ? a[$1] : "NA")}' $NEWCOVAR ${traits}/qcovars3.txt > ${traits}/qcovars.txt

var="FID IID PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 age batch NEWCOVAR"
sed -i "1s/.*/$var/" ${traits}/qcovars.txt


#discrete covariates
awk 'NR==FNR{a[$1]=$2;next}{print $0, (($1 in a) ? a[$1] : "NA")}' $centre <(awk '{print $1, $1}' ${traits}/qcovars.txt) > ${traits}/fixed2.txt
awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1]}' $sex ${traits}/fixed2.txt > ${traits}/fixed.txt
rm ${traits}/fixed2.txt
rm ${traits}/qcovars2.txt
rm ${traits}/qcovars3.txt
var2="FID IID centre sex"
sed -i "1s/.*/$var2/" ${traits}/fixed.txt 
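Each extra covariate adds another numbered temporary file (qcovars2.txt, qcovars3.txt and so on) plus a matching rm line. A small function that chains the same join over a list of files is one way to avoid that bookkeeping. append_covars is a hypothetical name and this is only a sketch, assuming every covariate file is non-empty (with an empty first file, NR==FNR would stay true while reading the second) and keyed on the ID in column 1.

```bash
#!/bin/bash
# append_covars BASE OUT COVAR_FILE...
# Hypothetical helper (not part of PGS-LMM): append column 2 of each
# covariate file to BASE, matched on the ID in column 1, writing OUT.
# Unmatched IDs get NA. Assumes every covariate file is non-empty.
append_covars() {
    local base=$1 out=$2
    shift 2
    local cur next f
    cur=$(mktemp)
    cp "$base" "$cur"
    for f in "$@"; do
        next=$(mktemp)
        awk 'NR==FNR{a[$1]=$2;next}{print $0, (($1 in a) ? a[$1] : "NA")}' "$f" "$cur" > "$next"
        rm -f "$cur"
        cur=$next
    done
    cat "$cur" > "$out"   # cat keeps OUT's normal permissions (mktemp files are 0600)
    rm -f "$cur"
}

# Toy demo; in makeCovar.sh the call would look like
#   append_covars ${GT}/pca.eigenvec ${traits}/qcovars.txt $age $batch $NEWCOVAR
# followed by the same sed header replacement.
d=$(mktemp -d)
trap 'rm -rf "$d"' EXIT
printf 'S1 S1 0.1\nS2 S2 0.2\n' > "$d/pcs.txt"
printf 'S1 40\nS2 55\n' > "$d/age.txt"
printf 'S2 7\n' > "$d/newcovar.txt"
append_covars "$d/pcs.txt" "$d/out.txt" "$d/age.txt" "$d/newcovar.txt"
cat "$d/out.txt"
```

The header line still has to be set with sed, listing the covariates in the order they were passed to the function.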

Similarly, adding a continuous covariate also requires altering makePgsCovars.sh. Discrete covariates do not require any change to makePgsCovars.sh, as the same discrete covariate file (fixed.txt, written by makeCovar.sh) is used as in the original GWAS. The original makePgsCovars.sh is:

#!/bin/bash
source "${PWD}/config.txt"

# Hash each covariate file (ID as key, covariate as value) and append it to the eigenvectors; IDs absent from a lookup file get NA.
for i in {1..22}; do
	awk 'NR==FNR{a[$1]=$2;next}{print $0, (($1 in a) ? a[$1] : "NA")}' $age ${GT}/pca.eigenvec > ${traits}/qcovars2_${i}.txt
	awk 'NR==FNR{a[$1]=$2;next}{print $0, (($1 in a) ? a[$1] : "NA")}' $batch ${traits}/qcovars2_${i}.txt > ${traits}/qcovars3_${i}.txt
	#awk 'NR==FNR{a[$1]=$5;next}{print $0, (($1 in a) ? a[$1] : "NA")}' ${PGS}/${i}.sscore ${traits}/qcovars3_${i}.txt > ${traits}/qcovars_${i}   # SBayesR fixed effect: if you run SBayesR, uncomment this line and comment out the next one.
	awk 'NR==FNR{a[$1]=$3;next}{print $0, (($1 in a) ? a[$1] : "NA")}' ${PGS}/PGS.${i}.all.score ${traits}/qcovars3_${i}.txt > ${traits}/qcovars_${i}  # This line adds the PRSice LOCO-PGS.
	var="FID IID PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 age batch PGS"
	sed -i "1s/.*/$var/" ${traits}/qcovars_${i}
	rm ${traits}/qcovars2_${i}.txt
	rm ${traits}/qcovars3_${i}.txt
done
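Inside the loop, the age and batch joins do not depend on the chromosome i, so the same two joins are repeated 22 times. A sketch that runs them once and adds only the per-chromosome PGS column inside the loop follows, on invented toy files. join_col is a hypothetical helper, the PGS is read from column 3 as in the PRSice line of the script, and the loop covers two toy chromosomes instead of {1..22}.

```bash
#!/bin/bash
# Sketch: run the chromosome-independent joins (age, batch) once, then add
# only the LOCO PGS inside the loop. Toy inputs are invented; the real script
# reads $age, $batch, ${GT}, ${PGS} and ${traits} from config.txt.
d=$(mktemp -d)
trap 'rm -rf "$d"' EXIT

# join_col COL COVAR_FILE BASE_FILE  (hypothetical helper)
# Appends column COL of COVAR_FILE to BASE_FILE, matched on the ID in
# column 1; unmatched IDs get NA. "-" as BASE_FILE reads from stdin.
join_col() {
    awk -v c="$1" 'NR==FNR{a[$1]=$c;next}{print $0, (($1 in a) ? a[$1] : "NA")}' "$2" "$3"
}

printf '#FID IID PC1\nS1 S1 0.1\nS2 S2 0.2\n' > "$d/pca.eigenvec"
printf 'S1 40\nS2 55\n' > "$d/age.txt"
printf 'S1 3\nS2 4\n' > "$d/batch.txt"
for i in 1 2; do
    printf 'FID IID score\nS1 S1 0.%s\nS2 S2 -0.%s\n' "$i" "$i" > "$d/PGS.$i.all.score"
done

# Once, outside the loop.
join_col 2 "$d/age.txt" "$d/pca.eigenvec" | join_col 2 "$d/batch.txt" - > "$d/base.txt"

# Per chromosome: only the PGS join changes.
for i in 1 2; do
    join_col 3 "$d/PGS.$i.all.score" "$d/base.txt" > "$d/qcovars_$i"
    sed -i "1s/.*/FID IID PC1 age batch PGS/" "$d/qcovars_$i"   # GNU sed -i, as in the script
done
cat "$d/qcovars_1"
```

The header row of pca.eigenvec picks up NA values in the joins, but it is overwritten by the sed step, just as in the original loop.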

With NEWCOVAR included, makePgsCovars.sh becomes:

#!/bin/bash
source "${PWD}/config.txt"

# Hash each covariate file (ID as key, covariate as value) and append it to the eigenvectors; IDs absent from a lookup file get NA.
for i in {1..22}; do
	awk 'NR==FNR{a[$1]=$2;next}{print $0, (($1 in a) ? a[$1] : "NA")}' $age ${GT}/pca.eigenvec > ${traits}/qcovars2_${i}.txt
	awk 'NR==FNR{a[$1]=$2;next}{print $0, (($1 in a) ? a[$1] : "NA")}' $batch ${traits}/qcovars2_${i}.txt > ${traits}/qcovars3_${i}.txt
	awk 'NR==FNR{a[$1]=$2;next}{print $0, (($1 in a) ? a[$1] : "NA")}' $NEWCOVAR ${traits}/qcovars3_${i}.txt > ${traits}/qcovars4_${i}.txt
	#awk 'NR==FNR{a[$1]=$5;next}{print $0, (($1 in a) ? a[$1] : "NA")}' ${PGS}/${i}.sscore ${traits}/qcovars4_${i}.txt > ${traits}/qcovars_${i}   # SBayesR fixed effect: if you run SBayesR, uncomment this line and comment out the next one.
	awk 'NR==FNR{a[$1]=$3;next}{print $0, (($1 in a) ? a[$1] : "NA")}' ${PGS}/PGS.${i}.all.score ${traits}/qcovars4_${i}.txt > ${traits}/qcovars_${i}  # This line adds the PRSice LOCO-PGS.
	var="FID IID PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 age batch NEWCOVAR PGS"
	sed -i "1s/.*/$var/" ${traits}/qcovars_${i}
	rm ${traits}/qcovars2_${i}.txt
	rm ${traits}/qcovars3_${i}.txt
	rm ${traits}/qcovars4_${i}.txt
done