INDEP - winkusch/Easy2 GitHub Wiki

FUNCTION	PARAMETER	DEFAULT	DESCRIPTION
INDEP	--rcdCriterion		R expression criterion that defines the SNPs to be used for independentization. Required.
INDEP	--arcdCriterion		Array of R expression criteria (used if clumping should be done on multiple columns/traits/ancestries simultaneously).
INDEP	--acolPval		Array of P value columns (usually a single column, but multiple columns can be given; in that case, clumping is done for each column and the results are then combined/compared, which is useful for multi-trait/group/ancestry clumping).
INDEP	--astrPvalTag		Array of tags for the P value columns, one tag per clump group (e.g. MEN;WOMEN for --acolPval Pmen;Pwomen).
INDEP	--astrTag
INDEP	--anumPvalLim	1	Array of P value limits (give multiple values if different thresholds should be used for different clump groups).
INDEP	--colInChr		Column name of the input chromosome column.
INDEP	--colInPos		Column name of the input position column.
INDEP	--numPosLim	500000	Distance threshold in base pairs for region-based clumping (minimum distance between clumps of genome-wide significant variants).
INDEP	--numPosRegionExtension	-1	Number of base pairs by which the coordinates of genome-wide significant clumps are extended to define a region (by default, i.e. -1, this is --numPosLim/2, which ensures non-overlapping "extended" regions).
INDEP	--acolIndep		Array of columns used for independentization (minimized or maximized per region; alternative to --acolPval).
INDEP	--astrIndepTag		Tag(s) for the --acolIndep columns (alternative to --astrPvalTag).
INDEP	--anumIndepLim	1	Array of numeric independentization limits (alternative to --anumPvalLim).
INDEP	--strIndepDir	min	Clumping direction: 'min' selects the minimum and 'max' the maximum --acolIndep value per clump ('max' is useful when a log-transformed P value column such as -log10(P) is given at --acolIndep).
INDEP	--fileClumpBed		Bed file(s); if defined, LD-based clumping is performed within regions (if the placeholder <CHR> is used in --fileClumpBed, the function loops over chromosomes).
INDEP	--fileClumpSample		Optional sample file to subset the bed files (only used if --fileClumpBed is defined).
INDEP	--numR2Thrs	0.2	LD clumping r2 threshold (clumps within a region are combined based on this threshold).
INDEP	--blnParal	FALSE	Logical: whether clumping should be parallelized by chromosome (requires the <CHR> placeholder in --fileClumpBed).
INDEP	--pathLibLoc
INDEP	--blnClumpInSignal	TRUE
INDEP	--blnAddIndepInfo	FALSE	Logical: whether the INDEP columns should be added to the full input data set (helpful if later functions should filter on INDEP results).
INDEP	--colInMarker		Column name of the input marker column.
INDEP	--strTag	character	Tag for the function step that will be added to related variables in the REPORT and to related output to ensure unique and easily recognizable file names and REPORT variable names.
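
The interplay of --numPosLim and --numPosRegionExtension can be illustrated with a small Python sketch. It assumes that the default -1 acts as a sentinel for --numPosLim/2 and that a clump is given by the (start, end) positions of its outermost significant variants. The function names are hypothetical and not part of Easy2.

```python
from typing import List, Tuple


def resolve_extension(num_pos_lim: int, num_pos_region_extension: int = -1) -> int:
    """Return the extension in bp; a negative value (assumed sentinel) means num_pos_lim // 2."""
    if num_pos_region_extension < 0:
        return num_pos_lim // 2
    return num_pos_region_extension


def extend_clumps(clumps: List[Tuple[int, int]], num_pos_lim: int,
                  num_pos_region_extension: int = -1) -> List[Tuple[int, int]]:
    """Extend (start, end) clump coordinates on one chromosome into regions."""
    ext = resolve_extension(num_pos_lim, num_pos_region_extension)
    # Assumes 1-based positions, so a region never starts below 1.
    return [(max(1, start - ext), end + ext) for start, end in clumps]


# Two clumps on one chromosome, 550 kb apart (more than numPosLim = 500 kb).
regions = extend_clumps([(1_000_000, 1_050_000), (1_600_000, 1_600_000)], 500_000)
print(regions)  # [(750000, 1300000), (1350000, 1850000)]
```

Because neighbouring clumps are separated by more than --numPosLim, extending both ends by --numPosLim/2 cannot make the resulting regions overlap.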

Example code:

Distance-based (+/-500 kb) clumping of variants with Pvalue<5e-8:

INDEP --rcdCriterion Pvalue<5e-8
--acolPval Pvalue
--colInChr chr
--colInPos pos
--numPosLim 500000
--colInMarker MarkerName
--strTag INDEP.d500kb
## results will be indicated by *region*
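
The following minimal Python sketch shows one way distance-based clumping like the example above could work. It assumes a simple rule: sort the significant variants by chromosome and position, and start a new region whenever the gap to the previous significant variant exceeds --numPosLim. The lowest-P variant becomes the region lead. Easy2's actual algorithm may differ. The function name and output fields are hypothetical, and the column names follow the example (MarkerName, chr, pos, Pvalue).

```python
from typing import Dict, List


def distance_clump(variants: List[Dict], p_lim: float = 5e-8,
                   num_pos_lim: int = 500_000) -> List[Dict]:
    """Group significant variants into regions and pick the lowest-P lead per region."""
    # Keep variants meeting the criterion (cf. --rcdCriterion Pvalue<5e-8).
    sig = sorted((v for v in variants if v["Pvalue"] < p_lim),
                 key=lambda v: (v["chr"], v["pos"]))

    groups: List[List[Dict]] = []
    for v in sig:
        prev = groups[-1][-1] if groups else None
        # New region on a new chromosome or when the gap exceeds num_pos_lim (assumed rule).
        if prev is None or v["chr"] != prev["chr"] or v["pos"] - prev["pos"] > num_pos_lim:
            groups.append([v])
        else:
            groups[-1].append(v)

    regions = []
    for i, members in enumerate(groups, start=1):
        lead = min(members, key=lambda v: v["Pvalue"])  # lowest P value leads
        regions.append({"region": i, "chr": members[0]["chr"],
                        "start": members[0]["pos"], "end": members[-1]["pos"],
                        "lead": lead["MarkerName"], "n": len(members)})
    return regions


variants = [
    {"MarkerName": "rs1", "chr": 1, "pos": 1_000_000, "Pvalue": 1e-10},
    {"MarkerName": "rs2", "chr": 1, "pos": 1_300_000, "Pvalue": 1e-20},
    {"MarkerName": "rs3", "chr": 1, "pos": 2_000_000, "Pvalue": 3e-9},
    {"MarkerName": "rs4", "chr": 1, "pos": 2_100_000, "Pvalue": 0.01},
    {"MarkerName": "rs5", "chr": 2, "pos": 1_000_000, "Pvalue": 4e-8},
]
for r in distance_clump(variants):
    print(r["region"], r["chr"], r["start"], r["end"], r["lead"])
# 1 1 1000000 1300000 rs2
# 2 1 2000000 2000000 rs3
# 3 2 1000000 1000000 rs5
```

rs4 is dropped by the P value criterion. rs1 and rs2 share a region because they are only 300 kb apart.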

Distance-based (+/-500 kb) and LD-based (r2<0.1) clumping of variants with Pvalue<5e-8:

INDEP --rcdCriterion Pvalue<5e-8
--acolPval Pvalue
--colInChr chr
--colInPos pos
--numPosLim 500000
--fileClumpBed /path/to/bedfiles/1000g_topmed_imputed_chr<CHR>.hqx.cpaid.maf001
--fileClumpSample /path/to/bedfiles/1000g_topmed_imputed_chr<CHR>.hqx.cpaid.maf001
--numR2Thrs 0.1
--blnParal 1
--colInMarker MarkerName
--strTag INDEP.d500kb.r201
## results will be indicated by *region* (distance based) and *locus* (LD based)
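
Within each distance-based region, the LD step controlled by --numR2Thrs can be pictured as a PLINK-style greedy scheme, which is an assumption rather than a documented description of Easy2. In this scheme, the most significant remaining variant becomes a locus lead and absorbs every remaining variant with r2 at or above the threshold. The Python sketch below takes precomputed pairwise r2 values as a plain dictionary instead of reading the bed files. ld_clump and the example r2 values are hypothetical.

```python
from typing import Dict, FrozenSet, List


def ld_clump(members: List[Dict], r2: Dict[FrozenSet[str], float],
             r2_thrs: float = 0.1) -> List[Dict]:
    """Greedy LD clumping of the variants in one region (assumed scheme)."""
    remaining = sorted(members, key=lambda v: v["Pvalue"])
    loci = []
    while remaining:
        lead = remaining[0]["MarkerName"]
        # Variants with r2 >= r2_thrs to the current lead join its locus;
        # missing pairs are treated as r2 = 0.
        joined = [v["MarkerName"] for v in remaining[1:]
                  if r2.get(frozenset((lead, v["MarkerName"])), 0.0) >= r2_thrs]
        loci.append({"locus": len(loci) + 1, "lead": lead, "members": [lead] + joined})
        taken = {lead, *joined}
        remaining = [v for v in remaining if v["MarkerName"] not in taken]
    return loci


region = [
    {"MarkerName": "rsA", "Pvalue": 1e-20},
    {"MarkerName": "rsB", "Pvalue": 1e-12},
    {"MarkerName": "rsC", "Pvalue": 1e-9},
    {"MarkerName": "rsD", "Pvalue": 2e-8},
]
r2 = {frozenset(("rsA", "rsB")): 0.8, frozenset(("rsA", "rsC")): 0.05,
      frozenset(("rsA", "rsD")): 0.02, frozenset(("rsB", "rsC")): 0.2,
      frozenset(("rsC", "rsD")): 0.3}
for locus in ld_clump(region, r2, r2_thrs=0.1):
    print(locus["locus"], locus["lead"], locus["members"])
# 1 rsA ['rsA', 'rsB']
# 2 rsC ['rsC', 'rsD']
```

rsC starts its own locus even though r2(rsB, rsC) = 0.2, because only r2 with the current lead is checked. This page does not state whether Easy2 handles such chains the same way.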

Distance-based (+/-500 kb) clumping of variants with Pvalue<5e-8, using the maximum log P value (logPvalue) to define lead variants:

INDEP --rcdCriterion Pvalue<5e-8
--acolIndep logPvalue
--strIndepDir max
--colInChr chr
--colInPos pos
--numPosLim 500000
--colInMarker MarkerName
--strTag INDEP.d500kb.log
## results will be indicated by *region* and region lead variants are defined by max(logPvalue) per region
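
This explains why --strIndepDir max is paired with a log-scale column. If logPvalue holds -log10(P), which is an assumption about that column, then the variant with the maximum logPvalue is the one with the minimum P value, so both settings pick the same lead. A short Python sketch follows; pick_lead is a hypothetical helper.

```python
import math
from typing import Dict, List


def pick_lead(members: List[Dict], col: str, direction: str = "min") -> Dict:
    """Return the member with the min or max value of `col` (cf. --strIndepDir)."""
    if direction == "min":
        return min(members, key=lambda v: v[col])
    if direction == "max":
        return max(members, key=lambda v: v[col])
    raise ValueError("direction must be 'min' or 'max'")


region = [
    {"MarkerName": "rs1", "Pvalue": 1e-10},
    {"MarkerName": "rs2", "Pvalue": 1e-20},
    {"MarkerName": "rs3", "Pvalue": 3e-9},
]
for v in region:
    v["logPvalue"] = -math.log10(v["Pvalue"])  # assumed definition of logPvalue

print(pick_lead(region, "Pvalue", "min")["MarkerName"])     # rs2
print(pick_lead(region, "logPvalue", "max")["MarkerName"])  # rs2
```

One practical reason to prefer a log-scale column is that very small P values can underflow to 0 in double precision and produce ties. A -log10(P) value computed upstream keeps such variants distinguishable.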

Distance-based (+/-500 kb) clumping on multiple P value columns:

INDEP --rcdCriterion Pmen<5e-8|Pwomen<5e-8
--acolPval Pmen;Pwomen
--astrPvalTag MEN;WOMEN
--colInChr chr
--colInPos pos
--numPosLim 500000
--colInMarker MarkerName
--strTag INDEP.d500kb.men_women
## results will be indicated by *region*, and indicator columns will be added that show whether each region contains MEN and/or WOMEN significant variants
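
The indicator columns of the multi-column example can be sketched as follows. After regions are formed on the combined criterion (Pmen<5e-8|Pwomen<5e-8), each region gets one flag per tag that records whether any member passes that group's limit. The Python sketch assumes a limit of 5e-8 per group. add_group_indicators and the has<TAG> column names are hypothetical and are not Easy2's actual output names.

```python
from typing import Dict, List, Sequence


def add_group_indicators(regions: List[List[Dict]], pval_cols: Sequence[str],
                         tags: Sequence[str], p_lims: Sequence[float]) -> List[Dict]:
    """Flag, per region and per group, whether any variant passes the group's limit."""
    rows = []
    for i, members in enumerate(regions, start=1):
        row = {"region": i}
        for col, tag, lim in zip(pval_cols, tags, p_lims):
            row["has" + tag] = any(v[col] < lim for v in members)
        rows.append(row)
    return rows


regions = [
    [{"MarkerName": "rs1", "Pmen": 1e-9, "Pwomen": 0.2},
     {"MarkerName": "rs2", "Pmen": 0.01, "Pwomen": 3e-8}],
    [{"MarkerName": "rs3", "Pmen": 6e-8, "Pwomen": 1e-12}],
]
for row in add_group_indicators(regions, ["Pmen", "Pwomen"], ["MEN", "WOMEN"], [5e-8, 5e-8]):
    print(row)
# {'region': 1, 'hasMEN': True, 'hasWOMEN': True}
# {'region': 2, 'hasMEN': False, 'hasWOMEN': True}
```

Region 2 is kept because of its WOMEN signal, yet it is flagged hasMEN = False because Pmen = 6e-8 does not pass the 5e-8 limit.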