MetaPC - gc5k/GEAR GitHub Wiki


Meta-PCA for the inference of population structure

MetaPC

This function uses reported allele frequencies to infer the genetic origin of cohorts.

Citation: Euro J Hum Genet, 2017, 25:137-146


Data format for summary statistics (see HapMap example below)

A summary statistic file should have these columns: "SNP", "CHR", "A1", "A2", and "RAF". The keywords are case-insensitive, and the columns may appear in any order. "A1" is the reference allele, and "A2" is the other allele. "RAF" is the reference allele frequency. Other columns, such as the standard error of the allele frequency, can also be included.

SNP CHR A1 A2 RAF
snp1 1 G T 0.35
snp2 2 T A 0.03
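The column rules above (case-insensitive keywords, any column order, extra columns allowed) can be illustrated with a small reader. This is only a sketch of the format, not GEAR's own parser; it assumes whitespace-delimited files, and the name `read_summary` is made up for illustration.

```python
import io

REQUIRED = ("SNP", "CHR", "A1", "A2", "RAF")

def read_summary(handle):
    """Yield one dict per SNP, keyed by the upper-cased required column names."""
    header = handle.readline().split()
    # Map upper-cased column name -> position, so case and order do not matter.
    index = {name.upper(): i for i, name in enumerate(header)}
    missing = [name for name in REQUIRED if name not in index]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = {name: fields[index[name]] for name in REQUIRED}
        row["RAF"] = float(row["RAF"])
        yield row

# Lower-case header with A1/A2 swapped and an extra column (assumed layout).
example = io.StringIO("snp chr a2 a1 raf se\nsnp1 1 T G 0.35 0.01\nsnp2 2 A T 0.03 0.02\n")
rows = list(read_summary(example))
```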

NOTE: when using the --key option, the keywords are case-insensitive but must otherwise match the field names in your data exactly.

Ambiguous (palindromic) loci, that is, A/T and G/C loci, are eliminated automatically. In this example, the second SNP (snp2), whose alleles are T/A, is eliminated.
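To see why A/T and G/C loci are called ambiguous: their two alleles are complements of each other, so the pair reads the same on either strand. A minimal sketch of such a check follows; it illustrates the rule and is not taken from GEAR's source, and `is_palindromic` is a hypothetical name.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_palindromic(a1, a2):
    """True when the two alleles are complements of each other (A/T or G/C)."""
    return COMPLEMENT.get(a1.upper()) == a2.upper()

# The two SNPs from the example table: (name, A1, A2).
snps = [("snp1", "G", "T"), ("snp2", "T", "A")]
kept = [name for name, a1, a2 in snps if not is_palindromic(a1, a2)]
print(kept)  # ['snp1']
```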


Master command: mpc

Options

--meta-batch

Specify a file that lists names of all summary data files to be used in a meta-analysis, one file per line:

gwas1.txt

gwas2.txt

...
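When many cohorts are involved, the batch file can be generated rather than typed by hand. The sketch below writes one file name per line, as described above; the helper name `write_meta_batch` and the output file name are made up for illustration.

```python
import tempfile
from pathlib import Path

def write_meta_batch(summary_files, batch_path):
    """Write one summary-file name per line and return the batch file path."""
    batch_path = Path(batch_path)
    batch_path.write_text("".join(f"{name}\n" for name in summary_files))
    return batch_path

# Demonstrate in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    path = write_meta_batch(["gwas1.txt", "gwas2.txt"], Path(tmp) / "meta-batch.txt")
    content = path.read_text()
```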

--key

Summary statistic files may contain the required columns under different names. Use --key to give the actual field names. The first five parameters passed to --key must be in the order SNP, CHR, A1, A2, and allele frequency. For the field names in the example above, the option is "--key SNP CHR A1 A2 RAF".

--chr

Specify the chromosome for analysis. If this option is omitted, all autosomes are used.

--beta

When this option is switched on, the "RAF" field specified in --key is treated as a genetic effect.

--keep-atgc

When this option is switched on, the palindromic loci are also used in the analysis. This is not recommended unless you are confident that the palindromic alleles are aligned to the same strand across cohorts.
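The options above can be assembled into a command line programmatically. The following is a sketch only: it uses the flags listed on this page plus --out from the HapMap example, and it assumes that --chr takes a single chromosome value and that --beta and --keep-atgc are plain switches. The function name `build_mpc_command` is hypothetical.

```python
def build_mpc_command(meta_batch, out, key=None, chromosome=None,
                      beta=False, keep_atgc=False, jar="gear.jar"):
    """Return the argument list for a `java -jar gear.jar mpc ...` call."""
    cmd = ["java", "-jar", jar, "mpc", "--meta-batch", meta_batch]
    if key is not None:
        # The first five names must be SNP, CHR, A1, A2 and allele frequency.
        if len(key) < 5:
            raise ValueError("--key needs at least five field names")
        cmd += ["--key", *key]
    if chromosome is not None:
        cmd += ["--chr", str(chromosome)]  # assumed: one chromosome value
    if beta:
        cmd.append("--beta")
    if keep_atgc:
        cmd.append("--keep-atgc")
    cmd += ["--out", out]
    return cmd

cmd = build_mpc_command("meta-frq.txt", "hapmap_22",
                        key=["SNP", "CHR", "A1", "A2", "MAF"])
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once gear.jar and the input files are in place.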


HapMap example

The HapMap example file can be downloaded here. After downloading it, run the command below:

java -jar gear.jar mpc --meta-batch meta-frq.txt --key SNP CHR A1 A2 MAF --out hapmap_22

Given 25 cohorts, using chromosome 22 only, the meta-PCA visualization looks like the figure below, with the three major ethnicity cohorts highlighted.

If the allele frequencies are estimated by PLINK and the results are saved in *.frq format, the command can be simplified further to

java -jar gear.jar mpc --meta-batch meta-frq.txt --out hapmap_22

[Figure: HapMap_22]

The results will be written into:

'*.msnp' lists the SNPs chosen to generate the meta-PCA.

'*.crm' contains the correlation matrix used to generate the eigenvalues and eigenvectors.

'*.mval' contains the eigenvalues.

'*.mvec' contains the eigenvectors.
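A common follow-up is to express each eigenvalue as a share of the total, which indicates how much structure each component captures. The sketch below assumes that a '*.mval' file holds plain whitespace-separated numbers; that layout is not documented on this page, so check a real output file before relying on it. Both function names are hypothetical.

```python
def parse_numbers(text):
    """Parse whitespace-separated numbers (assumed layout of a *.mval file)."""
    return [float(token) for token in text.split()]

def variance_explained(eigenvalues):
    """Return each eigenvalue's share of the sum of all eigenvalues."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [value / total for value in eigenvalues]

# Made-up eigenvalues, used only to show the calculation.
shares = variance_explained(parse_numbers("6.0\n3.0\n1.0\n"))
```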


The latest GEAR package can be found here.

