Inputs - single-cell-genetics/limix_qtl GitHub Wiki

Input

Here we describe the files required for QTL mapping. Each file contains compulsory fields, which must use the exact field names given below, and optional fields (marked with []).

Feature annotation file

A tab separated text file

Column names:

 feature_id
 chromosome
 start
 end
 ensembl_gene_id*
 feature_strand*
 [gene_name]
 [superior_feature_id]

* Optional; only used for plotting.

[] Optional; not currently used, but carried along.

Example:
feature_id	chromosome	start	end	ensembl_gene_id	gene_name	feature_strand
H3BR00	16	28477974	28503333	ENSG00000261832	CLN3	-
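The column requirements above can be checked before starting a run. The following is a minimal sketch, not part of limix_qtl: the function name `check_feature_annotation` is hypothetical, and it assumes that `start` and `end` are integer coordinates.

```python
import csv

# Compulsory columns named in this section.
REQUIRED_COLUMNS = ("feature_id", "chromosome", "start", "end")


def check_feature_annotation(lines):
    """Validate annotation rows and return the feature IDs in file order.

    `lines` is any iterable of text lines, for example an open file handle.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError("missing compulsory column(s): " + ", ".join(missing))

    feature_ids = []
    for line_number, row in enumerate(reader, start=2):
        try:
            # Assumption: start and end are integer coordinates.
            int(row["start"])
            int(row["end"])
        except (TypeError, ValueError):
            raise ValueError(f"line {line_number}: start and end must be integers")
        feature_ids.append(row["feature_id"])
    return feature_ids


example = [
    "feature_id\tchromosome\tstart\tend\tensembl_gene_id\tgene_name\tfeature_strand",
    "H3BR00\t16\t28477974\t28503333\tENSG00000261832\tCLN3\t-",
]
print(check_feature_annotation(example))  # ['H3BR00']
```

Passing an open file handle instead of the `example` list works the same way, since `csv.DictReader` accepts any iterable of lines.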

Phenotype file

A tab separated text file

 The first column contains `feature_id`
 The first line (header) contains the sample IDs (`sample_id`)

  Example:
  feature_id	sample1	sample2	sample3	sample4	sample5
  H3BR00	83.8	2198.4	2035.8	2678.2	5266.1

Genotype file:

  • Binary Plink files.

Genotype-Harmonizer can convert a large number of genotype formats into binary Plink; the output option is: -O PLINK_BED. See the Genotype-Harmonizer documentation for more information.

Link file

Contains the sample_id from the Genotype file and the matching sample_id from the Phenotype file. The layout is described under "Sample mapping file".

Covariate matrix file

A tab separated text file

 The first column contains `sample_id`
 The first line (header) contains the covariate names

  Example:
  sample_id	covariate1	covariate2	covariate3
  sample1	1	218	0
  sample2	1	-32.4	1
  sample3	1	0.4	1
  sample4	1	28.4	0

Kinship matrix file

A tab separated text file

 The first column contains `sample_id`
 The first line contains `sample_id`

  Example:
  sample_id	sample1	sample2	sample3	sample4
  sample1	1	0.2	0.002	-0.3
  sample2	0.2	1.08	0.55	0.1
  sample3	0.002	0.55	1	0
  sample4	-0.3	0.1	0	1

The kinship matrix can be calculated with Plink2 by following the steps below:

Ideally, start with non-imputed genotypes. (If these are not available, apply a stringent QC filter on call rate, "--inputProb 0.6" "-cr 1.0", using Genotype-Harmonizer to keep only the highest-quality variants.) (NB: the steps below use Plink, but they can also be done with, for instance, Genotype-Harmonizer.)

Remove SNPs that have a low minor allele frequency (MAF) or that are out of Hardy-Weinberg equilibrium (HWE):
  plink2 --bfile {raw_genotype} --maf 0.05 --hwe 1e-6 --make-bed --out {raw_genotype_filtered}

Prune variants (window of 250 variants, window shift of 50 variants, independence at an r² threshold of 0.2):
  plink2 --bfile {raw_genotype_filtered} --indep-pairwise 250 50 0.2 --bad-ld --out {out_pruning_info}

Make the KING IBD matrix:
  plink2 --bfile {raw_genotype_filtered} --extract {out_pruning_info}.prune.in --make-king square --out {king_ibd_out}

After running this command, the output files *.king and *.king.id can be combined into the kinship matrix for QTL mapping. Make sure to multiply the KING values by 2 to bring them into the usual 0-1 range. The kinship matrix must use the IDs from *.king.id as both its row and column labels.
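The conversion from the KING output to the kinship matrix layout could look roughly like the sketch below. It is not part of limix_qtl and the function names are hypothetical. It assumes that *.king is a square, whitespace-separated matrix without a header, that *.king.id has a header line starting with '#', and that the individual ID is the last column of *.king.id; check these against your own Plink2 output.

```python
def king_to_kinship(king_lines, id_lines):
    """Combine *.king and *.king.id content into (sample_ids, kinship rows).

    Both arguments are iterables of text lines, for example open file handles.
    """
    # Assumption: header lines start with '#', and the individual ID is the
    # last column of every remaining line.
    sample_ids = [
        line.split()[-1]
        for line in id_lines
        if line.strip() and not line.startswith("#")
    ]
    # Multiply every KING value by 2, as described in the text.
    matrix = [
        [2.0 * float(value) for value in line.split()]
        for line in king_lines
        if line.strip()
    ]
    n = len(sample_ids)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("the KING matrix and the ID list have different sizes")
    return sample_ids, matrix


def format_kinship(sample_ids, matrix):
    """Return tab-separated text with sample IDs as row and column labels."""
    lines = ["\t".join(["sample_id"] + sample_ids)]
    for sample_id, row in zip(sample_ids, matrix):
        lines.append("\t".join([sample_id] + [f"{value:.6g}" for value in row]))
    return "\n".join(lines) + "\n"
```

A typical use would be to open the two Plink2 output files, pass the handles to `king_to_kinship`, and write the result of `format_kinship` to the kinship matrix file.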

Sample mapping file

A tab separated text file

Column names: genotype_individual_id phenotype_sample_id

  Example:

  name_genotype_sample1 name_phenotype_sample1
  name_genotype_sample2 name_phenotype_sample2.replica1
  name_genotype_sample2 name_phenotype_sample2.replica2
  name_genotype_sample3 name_phenotype_sample3
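A quick consistency check between the sample mapping file and the phenotype file can catch missing samples early. The sketch below is not part of limix_qtl; the function names and the sample IDs are hypothetical. It assumes that each phenotype sample maps to exactly one genotype individual (one individual may have several phenotype samples, such as replicates) and that a header row, if present, uses the column names listed above.

```python
def read_sample_mapping(lines):
    """Return {phenotype_sample_id: genotype_individual_id} from mapping lines."""
    mapping = {}
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            raise ValueError("expected two tab-separated columns: " + line.strip())
        genotype_id, phenotype_id = fields
        # Skip an optional header row.
        if (genotype_id, phenotype_id) == ("genotype_individual_id", "phenotype_sample_id"):
            continue
        # Assumption: a phenotype sample belongs to exactly one individual.
        if phenotype_id in mapping:
            raise ValueError("phenotype sample listed twice: " + phenotype_id)
        mapping[phenotype_id] = genotype_id
    return mapping


def unmapped_samples(mapping, phenotype_header_line):
    """Return phenotype samples from the header line that have no mapping."""
    samples = phenotype_header_line.rstrip("\n").split("\t")[1:]
    return [sample for sample in samples if sample not in mapping]


mapping = read_sample_mapping([
    "donorA\tdonorA.rep1",
    "donorA\tdonorA.rep2",
    "donorB\tdonorB",
])
print(unmapped_samples(mapping, "feature_id\tdonorA.rep1\tdonorB\tdonorC"))  # ['donorC']
```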

SNP / variant filter file

To filter down to a specific set of variants, use the 'variant_filter' option when running your analysis. The file you provide should have a header with the name 'snp_id' and one variant / SNP per row.

Feature filter file

To filter down to a specific set of features, use the 'feature_filter' option when running your analysis. The file you provide should have a header with the name 'feature' and one feature ID per row.

Combined feature variant filter

To filter down to specific combinations of SNPs and features, use the 'feature_variant_filter' option when running your analysis. The file you provide should be tab separated and start with the header 'snp_id feature'. Each subsequent line specifies one combination to be tested together: the SNP ID (column one) and the relevant feature ID (column two).
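Such a file can be generated from a list of SNP and feature pairs. The sketch below is only an illustration: the function name `feature_variant_filter_text` and the SNP IDs are hypothetical, and dropping repeated pairs is a choice made here, not a documented requirement.

```python
def feature_variant_filter_text(pairs):
    """Return the file content for an iterable of (snp_id, feature_id) pairs."""
    lines = ["snp_id\tfeature"]  # header as described in the text
    seen = set()
    for snp_id, feature_id in pairs:
        # Keep each combination once, in the order it was first given.
        if (snp_id, feature_id) in seen:
            continue
        seen.add((snp_id, feature_id))
        lines.append(f"{snp_id}\t{feature_id}")
    return "\n".join(lines) + "\n"


content = feature_variant_filter_text([
    ("snp_1", "H3BR00"),
    ("snp_2", "H3BR00"),
    ("snp_1", "H3BR00"),  # repeated pair, written only once
])
print(content, end="")
```

The returned text can be written to a file and passed with the 'feature_variant_filter' option.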

Feature specific covariates

To regress out SNP effects, either to increase power for trans-QTL mapping or to look for secondary eQTLs, you can provide a file specifying which SNPs to correct for per feature. To do so, use the 'feature_variant_covariate' flag. The file layout is the same as for the combined feature variant filter: a tab-separated file that starts with the header 'snp_id feature', followed by one line per combination giving the SNP ID (column one) and the feature ID (column two) for which that SNP should be corrected.

Output folder location