Simulating quantitative traits - gc5k/GEAR GitHub Wiki
Simulation of quantitative traits
Options
--sample-size/--n
Specify the sample size. 100 by default.
--marker/--m
Specify the total number of markers. 100 by default.
--null-marker
Specify how many of the markers are drawn from the null distribution (null markers). 0 by default.
--freq
Specify a single allele frequency applied to all markers. 0.5 by default.
--unif-freq
Draw the allele frequency of each marker from a uniform distribution between 0.01 and 0.5.
--freq-file
Specify a file containing the reference-allele frequency of each marker, one value per line.
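The frequency options can be illustrated with a small sketch. It assumes that --freq-file takes a plain-text file with one reference-allele frequency per line. The Python below draws frequencies uniformly between 0.01 and 0.5 (the range --unif-freq describes) and writes them in that layout. The helpers uniform_freqs and write_column are hypothetical and not part of GEAR, and GEAR's own sampling code may differ.

```python
import random


def uniform_freqs(m, lo=0.01, hi=0.5, seed=None):
    """Draw one allele frequency per marker from a uniform distribution on [lo, hi]."""
    rng = random.Random(seed)
    return [rng.uniform(lo, hi) for _ in range(m)]


def write_column(path, values):
    """Write one value per line, the layout described for --freq-file."""
    with open(path, "w") as fh:
        fh.write("".join(f"{v:.6f}\n" for v in values))
```

For example, write_column("frq.txt", uniform_freqs(1000, seed=1)) produces a 1000-line file of the kind passed as --freq-file frq.txt in the examples.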
--poly-effect
Draw the effect of each locus from the standard normal distribution.
--poly-effect-sort
Draw the effects from the standard normal distribution, as --poly-effect does, but sort them in ascending order so that the first marker has the smallest effect and the last marker the largest.
--effect
Specify a common effect applied to every locus. It defaults to 0.5.
--effect-file
Specify a file containing the effect of each locus, one value per line.
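A similar sketch applies to the effect options. It draws one effect per locus from the standard normal distribution, optionally sorts the effects in ascending order as described for --poly-effect-sort, and writes one value per line in the layout --effect-file expects. The helpers polygenic_effects and write_effect_file are hypothetical illustrations, not GEAR functions.

```python
import random


def polygenic_effects(m, sort=False, seed=None):
    """Draw one effect per locus from the standard normal distribution N(0, 1).

    With sort=True the effects are put in ascending order, matching the
    description of --poly-effect-sort (first marker smallest, last largest).
    """
    rng = random.Random(seed)
    effects = [rng.gauss(0.0, 1.0) for _ in range(m)]
    if sort:
        effects.sort()
    return effects


def write_effect_file(path, effects):
    """Write one effect per line, the layout described for --effect-file."""
    with open(path, "w") as fh:
        fh.write("".join(f"{e:.6f}\n" for e in effects))
```

Sorting only reorders the draws. With the same seed, polygenic_effects(m, sort=True, seed=s) therefore returns the same values as sorted(polygenic_effects(m, seed=s)).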
--ld
Specify LD as Lewontin's D', a value between -1 and 1. It defaults to 0, i.e. linkage equilibrium between markers.
--rand-ld
Draw Lewontin's D' from a uniform distribution between -1 and 1.
--ld-file
Specify a file giving the LD (Lewontin's D') between each pair of consecutive markers, one value per line. Given m markers, this file has m-1 lines.
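The LD options take D' rather than the raw disequilibrium D = p_AB - p_A p_B. D' rescales D by the largest magnitude that the allele frequencies allow. To recover D, take D = D' * D_max, where D_max = min(p_A q_B, q_A p_B) when D' is non-negative and D_max = min(p_A p_B, q_A q_B) when D' is negative (q = 1 - p). The sketch below applies this standard conversion to obtain D and the implied two-locus haplotype frequencies, and writes an m-1 line D' file. This page does not say how GEAR turns these quantities into genotypes, so the haplotype view here is an assumption. The function names are hypothetical.

```python
def d_from_dprime(dprime, p_a, p_b):
    """Convert Lewontin's D' to D for two biallelic loci.

    p_a and p_b are the frequencies of allele A at locus 1 and allele B at locus 2.
    """
    if not -1.0 <= dprime <= 1.0:
        raise ValueError("D' must lie between -1 and 1")
    q_a, q_b = 1.0 - p_a, 1.0 - p_b
    if dprime >= 0:
        d_max = min(p_a * q_b, q_a * p_b)  # largest positive D the frequencies allow
    else:
        d_max = min(p_a * p_b, q_a * q_b)  # largest |D| the frequencies allow when D < 0
    return dprime * d_max


def haplotype_freqs(dprime, p_a, p_b):
    """Two-locus haplotype frequencies implied by D' and the allele frequencies."""
    d = d_from_dprime(dprime, p_a, p_b)
    q_a, q_b = 1.0 - p_a, 1.0 - p_b
    return {
        "AB": p_a * p_b + d,
        "Ab": p_a * q_b - d,
        "aB": q_a * p_b - d,
        "ab": q_a * q_b + d,
    }


def write_ld_file(path, dprimes, m):
    """Write one D' per consecutive marker pair, so m markers give m - 1 lines."""
    if len(dprimes) != m - 1:
        raise ValueError(f"expected {m - 1} values for {m} markers, got {len(dprimes)}")
    with open(path, "w") as fh:
        fh.write("".join(f"{v:.6f}\n" for v in dprimes))
```

Because D never exceeds D_max in magnitude, all four haplotype frequencies stay non-negative for any D' between -1 and 1.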
--hsq
Specify the heritability. It defaults to 0.5.
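A common way to reach a target heritability h² is to add Gaussian residual noise whose variance is scaled to the genetic variance: Var(e) = Var(g)(1 - h²)/h², then y = g + e. The sketch below does this with the realized variance of the simulated genetic values. Whether GEAR uses the realized or the expected genetic variance is not stated on this page, so treat this as one plausible scheme. simulate_phenotype is a hypothetical name.

```python
import random
import statistics


def simulate_phenotype(g, hsq, seed=None):
    """Return y = g + e with e ~ N(0, Var(g) * (1 - hsq) / hsq), so Var(g)/Var(y) is about hsq."""
    if not 0.0 < hsq <= 1.0:
        raise ValueError("hsq must be in (0, 1]")
    rng = random.Random(seed)
    var_g = statistics.pvariance(g)  # realized genetic variance
    sd_e = (var_g * (1.0 - hsq) / hsq) ** 0.5
    return [gi + rng.gauss(0.0, sd_e) for gi in g]
```

With hsq = 1 the residual standard deviation is zero and the phenotypes equal the genetic values. With hsq = 0.25, as in the first example, the residual variance is three times the genetic variance.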
--rep
Specify the number of replicates to simulate. It defaults to 1.
--make-bed
Write the genotypes in PLINK binary (bed) format.
--fam-prefix
Specify the prefix for family IDs.
Examples
gear simuqt --n 1000 --m 1000 --null-marker 900 --freq 0.45 --ld 0.3 --hsq 0.25 --poly-effect --out test
gear simuqt --n 1000 --m 1000 --null-marker 500 --unif-freq --rand-ld --hsq 0.2 --effect-file eff.txt --out test
gear simuqt --n 1000 --m 1000 --freq-file frq.txt --poly-effect --ld 0.8 --out test
gear simuqt --n 1000 --m 1000 --freq-file frq.txt --poly-effect --ld-file ld.txt --out test
The output files include *.bim, *.fam, and *.bed (the genotypes in PLINK binary format), as well as the following:
*.phe: the first two columns are the family ID and the individual ID, and the third column is the phenotypic value. When --rep is greater than 1, each column from the third onward holds the phenotypic values of one replicate.
*.breed: genotypic values for the simulated population.
*.rnd: three columns, namely the marker name, the reference allele, and the additive effect of that allele.
*.add: the genotypes under the additive coding scheme.
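The outputs can be post-processed with a few lines of Python. The sketch below rests on several assumptions. It assumes the *.phe file is whitespace-separated. It assumes *.add codes each genotype as the count (0, 1 or 2) of the reference allele listed in *.rnd. It then treats a genotypic value as the sum of codes times effects. This page does not state the exact coding or confirm that *.breed is computed this way (GEAR might, for instance, center the codes). parse_phe_line and genotypic_values are hypothetical helpers.

```python
def parse_phe_line(line):
    """Split one *.phe line into (family ID, individual ID, [phenotype per replicate])."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("a .phe line needs FID, IID and at least one phenotype")
    return fields[0], fields[1], [float(x) for x in fields[2:]]


def genotypic_values(add_codes, effects):
    """Genotypic value per individual: sum over loci of (allele count x effect).

    Assumes each row of add_codes holds 0/1/2 reference-allele counts, one per
    locus, and that effects lists the additive effects in the *.rnd marker order.
    """
    values = []
    for row in add_codes:
        if len(row) != len(effects):
            raise ValueError("each genotype row needs one code per locus")
        values.append(sum(c * b for c, b in zip(row, effects)))
    return values
```

For a run with --rep 3, parse_phe_line returns three phenotypes per individual, one per replicate column.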