2. Running TRADE (Univariate) - ajaynadig/TRADE GitHub Wiki

Overview

In univariate mode, TRADE estimates the distribution of differential expression effects for a single contrast.

Sample command:

TRADE_output <- TRADE(mode = "univariate",
                     results1 = results,
                     annot_table = NULL,
                     genes_exclude = NULL,
                     n_sample = NULL)

Additional inputs

annot_table

TRADE can estimate enrichments of differential expression effects in gene sets. To perform such an analysis, provide a number_genes x number_annotations matrix in the annot_table argument, with binary entries (1 indicates membership in a gene set) and gene names as rownames. The gene naming convention for the annotation must match that of the input differential expression summary statistics.
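As an illustration, the following is a minimal sketch of how such a matrix could be assembled in base R. The gene names and the gene sets (set_A, set_B) are hypothetical placeholders; in practice the rownames should be the same identifiers used in your summary statistics.

```r
# Hypothetical gene identifiers; replace with those in your summary statistics.
genes <- c("GENE1", "GENE2", "GENE3", "GENE4", "GENE5")

# Hypothetical gene sets, given as a named list of member genes.
gene_sets <- list(
  set_A = c("GENE1", "GENE3"),
  set_B = c("GENE2", "GENE3", "GENE5")
)

# Build a number_genes x number_annotations matrix of 0/1 membership indicators.
annot_table <- sapply(gene_sets, function(members) as.integer(genes %in% members))

# Gene names go in the rownames; annotation names are already the column names.
rownames(annot_table) <- genes

print(annot_table)
```

The resulting object can then be passed as annot_table in the sample command above.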

genes_exclude

This input can be used to exclude genes present in the summary statistics from TRADE analysis. For example, if your dataset is from perturbation of a particular gene, and you are interested only in effects of perturbation on other genes, you may exclude that gene.

n_sample

You may want to draw samples from the inferred effect size distribution, e.g. to compute a feature of the distribution that has no simple analytical solution. You can specify the number of samples to draw with the n_sample argument.

TRADE output

distribution_summary

This is the key TRADE output for most analyses, and contains the following three estimates:

  • transcriptome_wide_impact: The transcriptome-wide impact, i.e. the variance of the inferred effect size distribution. This quantity, in units of log2FoldChange^2, is a measure of the overall effect of the contrast of interest on the transcriptome. For more details please see the manuscript.
  • Me: The effective number of DEGs. This is a function of the kurtosis of the effect size distribution, and captures the number of DEGs in a manner that does not rely on an arbitrary threshold between null and very small effects. For more details please see the manuscript. Note that Me estimates are generally uninterpretable when the transcriptome-wide impact is non-significant. This is the most likely explanation if Me is estimated to be very close to the total number of genes.
  • mean: The mean of the effect size distribution. This is provided for completeness; it is not clear how to interpret this number, especially when normalization procedures remove expression changes that are consistent across the whole transcriptome.

significant_genes_Bonferroni and significant_genes_FDR

This output contains estimates related to significant genes. For both Bonferroni and FDR correction procedures, this output contains:

  • significant_gene_results: A table simply containing the differential expression summary statistics for the significant genes.
  • var_sig: The effect size variance for significant genes. This is like a transcriptome-wide impact estimate for just significant genes.
  • var_nonsig: The effect size variance for nonsignificant genes. This is like a transcriptome-wide impact estimate just for non-significant genes.
  • frac_sig: The proportion of differential expression signal in significant genes. When this number is low, the majority of differential expression signal is outside of significant genes.
  • num_sig: The number of significant genes.
  • num_nonsig: The number of non-significant genes.

annot_output

This output contains a table with estimates related to gene annotations. The columns of this table are:

  • annot: The name of the annotation.
  • var: The variance of the effect size distribution for genes in this annotation. This is like a gene-set-specific transcriptome-wide impact.
  • frac_var: The fraction of differential expression signal in this gene set.
  • frac_genes: The fraction of genes in this gene set.
  • enrichment: The enrichment of differential expression signal in this gene set, i.e. frac_var/frac_genes.

Note that enrichments are only well-defined when the transcriptome-wide impact is non-zero. If you see very strange enrichment results, a transcriptome-wide impact near zero is the most likely explanation.
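To make the enrichment definition concrete, here is a small worked example. The numbers and annotation names are hypothetical and are not TRADE output; the sketch only reproduces the ratio frac_var/frac_genes described above.

```r
# Hypothetical values for two annotations (not real TRADE output).
annot_example <- data.frame(
  annot      = c("set_A", "set_B"),
  frac_var   = c(0.30, 0.02),   # fraction of differential expression signal
  frac_genes = c(0.05, 0.04)    # fraction of genes in the gene set
)

# Enrichment is the ratio of the two fractions.
annot_example$enrichment <- annot_example$frac_var / annot_example$frac_genes

print(annot_example)
```

In this example, set_A holds 30% of the signal in 5% of the genes (enrichment of about 6), while set_B holds 2% of the signal in 4% of the genes (enrichment of about 0.5, i.e. depletion).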

fit

This output contains the actual inferred effect size distribution. It is taken directly from the ash output; please consult the ashr documentation for a detailed explanation. Briefly, the components of the inferred distribution are:

  • pi: The inferred mixture weights.
  • a: The left boundaries of the half-uniform mixture components.
  • b: The right boundaries of the half-uniform mixture components.

Additionally, loglik contains the log-likelihood for the model fit, which can be helpful for model comparison.
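The sketch below shows how weights and boundaries of this form define a distribution: it computes the mean and variance of a mixture of uniforms analytically and checks them against random draws. The vectors pi_k, a_k and b_k are hypothetical stand-ins, not values extracted from a TRADE fit, and this is not necessarily how TRADE computes its own estimates or samples internally.

```r
# Hypothetical mixture: a point mass at zero plus two half-uniform components.
pi_k <- c(0.7, 0.2, 0.1)   # mixture weights (sum to 1)
a_k  <- c(0, -0.5, 0)      # left boundaries
b_k  <- c(0,  0.0, 1)      # right boundaries

# Each Uniform(a, b) component has mean (a + b) / 2 and variance (b - a)^2 / 12.
comp_mean <- (a_k + b_k) / 2
comp_var  <- (b_k - a_k)^2 / 12

# Mixture moments via the law of total variance.
mix_mean <- sum(pi_k * comp_mean)
mix_var  <- sum(pi_k * (comp_var + comp_mean^2)) - mix_mean^2

# Monte Carlo check: pick a component by weight, then draw uniformly within it.
set.seed(1)
n_draws <- 100000
k <- sample(seq_along(pi_k), size = n_draws, replace = TRUE, prob = pi_k)
draws <- runif(n_draws, min = a_k[k], max = b_k[k])

cat("analytical mean:", mix_mean, " sampled mean:", mean(draws), "\n")
cat("analytical variance:", mix_var, " sampled variance:", var(draws), "\n")
```

Draws of this kind are useful for any feature of the distribution that lacks a closed form, for example a quantile or the fraction of effects above some magnitude.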

qc

This output contains quality control metrics:

  • num_na: The number of NA effects in the supplied differential expression summary statistics.
  • num_extreme: The number of effects with log2FoldChange magnitude greater than 10, which are removed from the analysis. This is a rough heuristic meant to catch cases where DESeq2 did not converge, and perhaps should be a user-specified parameter.
  • num_exclude: The total number of genes excluded by the above two filters.
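To preview how many genes these filters would remove from your own data, the counts can be approximated by hand as sketched below. The toy results table is hypothetical and assumes a DESeq2-style log2FoldChange column; TRADE's internal filtering may differ in detail (for example, in which columns are checked for NA).

```r
# Hypothetical summary statistics with a DESeq2-style log2FoldChange column.
results_example <- data.frame(
  log2FoldChange = c(0.5, -1.2, NA, 12.3, -15.0, 0.1),
  row.names = paste0("GENE", 1:6)
)

# Flag NA effects, then extreme effects among the non-NA ones.
is_na      <- is.na(results_example$log2FoldChange)
is_extreme <- !is_na & abs(results_example$log2FoldChange) > 10

num_na      <- sum(is_na)
num_extreme <- sum(is_extreme)
num_exclude <- num_na + num_extreme

# Genes that would remain after applying both filters.
results_kept <- results_example[!(is_na | is_extreme), , drop = FALSE]

cat("num_na:", num_na, " num_extreme:", num_extreme, " num_exclude:", num_exclude, "\n")
```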