Copy number determination using ImmunoChip intensity data for the FCGR locus - molgenis/systemsgenetics GitHub Wiki
Lude Franke, 2014 - 2015
Overview:
- Export the raw intensity data of the ImmunoChip genotype data using GenomeStudio. Choose to export a final report file that includes X and Y (the intensities of the red and green channels). We use these to determine the overall intensity (i.e. the Euclidean distance = sqrt(X*X + Y*Y)).
- Randomly select a subset of 10,000 SNPs, but exclude all SNPs within 5Mb from the FCGR locus.
- Generate a tab-delimited matrix where the columns indicate the different samples and the rows contain the randomly selected subset of SNPs. The cells contain the overall intensities.
- Conduct a PCA on this matrix and calculate the eigenvalues and eigenvectors.
- Only use the components that have an eigenvalue > 2. Take the eigenvectors for these components. These will be the covariates that we use to correct the intensities of the FCGR locus.
- Now select only the SNPs within the FCGR locus.
- Generate a tab-delimited matrix where the columns indicate the different samples and the rows contain the SNPs within the FCGR locus. The cells contain the overall intensities.
- Correct the intensities of the FCGR SNPs for the covariates that we have identified in step 5.
- We now have the intensities for the SNPs within the FCGR locus and will summarise these intensities by conducting a PCA.
- We take the first two components. We make a scatterplot of these components and determine cut-offs that correspond to the different FCGR copy numbers.
Software code:
Steps 1 - 3: We do not provide code for performing steps 1 - 3, since this depends on how you have stored your raw intensity data.
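Although the details depend on your own data layout, steps 1 - 3 can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the in-memory dictionaries, the `SNP` header cell and the locus coordinates passed in are all hypothetical and are not part of the pipeline. Check the header layout against what the eQTL mapping pipeline expects before using such a file.

```python
import math
import random


def intensity(x, y):
    """Overall intensity: Euclidean distance of the X and Y channel values."""
    return math.sqrt(x * x + y * y)


def select_background_snps(snp_positions, locus_chrom, locus_start, locus_end,
                           n=10000, margin=5_000_000, seed=1):
    """Randomly pick up to n SNPs lying more than `margin` bp from the locus.

    snp_positions maps SNP id -> (chromosome, base-pair position).
    The locus coordinates are supplied by the caller (assumption: same
    genome build as the SNP positions).
    """
    eligible = sorted(
        snp for snp, (chrom, pos) in snp_positions.items()
        if chrom != locus_chrom
        or pos < locus_start - margin
        or pos > locus_end + margin
    )
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return rng.sample(eligible, min(n, len(eligible)))


def write_matrix(path, snps, samples, xy):
    """Write a tab-delimited matrix: rows are SNPs, columns are samples.

    xy maps (snp, sample) -> (X, Y) as exported in the final report.
    The "SNP" header cell is an assumption, not a documented requirement.
    """
    with open(path, "w") as out:
        out.write("SNP\t" + "\t".join(samples) + "\n")
        for snp in snps:
            row = [format(intensity(*xy[(snp, s)]), ".6f") for s in samples]
            out.write(snp + "\t" + "\t".join(row) + "\n")
```

The same `write_matrix` helper could be reused for steps 6 - 7 by passing the SNPs inside the FCGR locus instead of the random background set.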
Step 4 can be conducted quickly and conveniently using the eQTL Mapping pipeline:
- Download and unzip the latest version of the eQTL mapping pipeline: http://www.molgenis.org/jenkins/job/systemsgenetics/nl.systemsgenetics$eqtl-mapping-pipeline/lastSuccessfulBuild/artifact/nl.systemsgenetics/eqtl-mapping-pipeline/1.3.3-SNAPSHOT/eqtl-mapping-pipeline-1.3.3-SNAPSHOT-dist.zip
- Use java -Xmx4g -jar eqtl-mapping-pipeline.jar --mode normalize --adjustPCA --in fileIn --out dirOut
- Here fileIn is the filename of the tab-delimited file that you created in step 3. dirOut refers to the directory where the PCA results are being stored.
Step 5: Use the file whose name ends with eigenvalues.txt. It contains the eigenvalue of each component. Determine which components have an eigenvalue > 2. Now open the tab-delimited file that ends with eigenvectors.txt and extract the components with an eigenvalue > 2. Save the result as covariates.txt.
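The selection in step 5 could be scripted along the following lines. This is a minimal sketch, not part of the pipeline: the helper names are hypothetical, and it assumes that the eigenvector table is tab-delimited with a label in the first column and one column per component, in the same order as the eigenvalues. Inspect your own eigenvalues.txt and eigenvectors.txt first, since their exact layout is not described here.

```python
import csv


def components_to_keep(eigenvalues, threshold=2.0):
    """Return the 0-based indices of components whose eigenvalue exceeds threshold."""
    return [i for i, value in enumerate(eigenvalues) if value > threshold]


def subset_columns(rows, keep):
    """Keep the first column (row label) plus the selected component columns."""
    return [[row[0]] + [row[1 + i] for i in keep] for row in rows]


def read_table(path):
    """Read a tab-delimited file into a list of rows (lists of strings)."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))


def write_table(path, rows):
    """Write rows back out as a tab-delimited file."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)
```

With the eigenvalues parsed into a list of floats, `write_table("covariates.txt", subset_columns(read_table(eigenvector_file), components_to_keep(eigenvalues)))` would produce the covariate file, where `eigenvector_file` stands for the path of your own eigenvectors.txt output.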
Steps 6 - 7: We do not provide code for performing steps 6 - 7, since this depends on how you have stored your raw intensity data.
Step 8 can be conducted quickly and conveniently using the eQTL Mapping pipeline: java -Xmx4g -jar eqtl-mapping-pipeline.jar --mode normalize --adjustcovariates --cov covariates.txt --in fileIn --out dirOut
- Here fileIn is the filename of the tab-delimited file that you created in steps 6 - 7. dirOut refers to the directory where the covariate-corrected intensities are stored. covariates.txt refers to the file that you generated in step 5.
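To make the idea behind step 8 concrete: a common way to correct a measurement for covariates is to fit it on those covariates by least squares and keep the residuals. The sketch below does this for a single SNP in plain Python. It is meant only to illustrate the concept; whether the pipeline implements the correction in exactly this way is an assumption, and the function name `residualize` is hypothetical. Note that the residuals are also mean-centered, because an intercept is included in the fit.

```python
def _center(v):
    """Subtract the mean from a vector."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residualize(values, covariates, eps=1e-12):
    """Return values minus their least-squares fit on the covariates (plus intercept).

    values: intensities of one SNP across samples.
    covariates: list of covariate vectors, each across the same samples.
    """
    basis = []
    for cov in covariates:
        q = _center(cov)
        for b in basis:  # Gram-Schmidt: remove overlap with earlier covariates
            coef = _dot(q, b) / _dot(b, b)
            q = [a - coef * c for a, c in zip(q, b)]
        if _dot(q, q) > eps:  # skip covariates that add no new direction
            basis.append(q)
    resid = _center(values)
    for b in basis:  # the basis is orthogonal, so projections can be removed one by one
        coef = _dot(resid, b) / _dot(b, b)
        resid = [a - coef * c for a, c in zip(resid, b)]
    return resid
```

Applying `residualize` to every row of the FCGR intensity matrix, with the columns of covariates.txt as covariates, would give a matrix comparable in spirit to the covariate-corrected output of the pipeline.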
Step 9: We now have a file that ends with covariatescorrected.txt. We will perform one final PCA on this data:
- Use java -Xmx4g -jar eqtl-mapping-pipeline.jar --mode normalize --adjustPCA --in fileIn --out dirOut
- Here fileIn is the filename of the FCGR intensity file that you created and which has been corrected for technical batch effects (step 8). dirOut refers to the directory where the PCA results are being stored.
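Step 10: Open the final file that ends with eigenvectors.txt and plot the first two components in a scatterplot. The clusters in this plot correspond to the different FCGR copy numbers; determine cut-offs between them to assign each sample to a copy number.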
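Step 10 could be sketched as below. This is an illustration under assumptions, not the authors' code: it assumes matplotlib is installed, that the clusters separate along the first component, and that you choose the cut-offs by eye from the plot. The function names and the output filename are hypothetical, and which cluster corresponds to which copy number has to be judged from your own data.

```python
from bisect import bisect_right


def assign_cluster(pc1_values, cutoffs):
    """Map each component-1 score to a cluster index given cut-offs chosen by eye."""
    cutoffs = sorted(cutoffs)
    return [bisect_right(cutoffs, v) for v in pc1_values]


def plot_components(pc1, pc2, cutoffs=(), path="fcgr_components.png"):
    """Scatterplot of the first two components with vertical lines at the cut-offs."""
    import matplotlib  # imported lazily so assign_cluster works without it
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(pc1, pc2, s=8)
    for c in cutoffs:
        ax.axvline(c, linestyle="--")
    ax.set_xlabel("Component 1")
    ax.set_ylabel("Component 2")
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

A typical workflow would be to call `plot_components` once without cut-offs, read suitable cut-offs off the plot, and then call `assign_cluster` to label each sample with a cluster index.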