Statistical Significance Tests - pwollstadt/IDTxl GitHub Wiki

The stats module provides the implementation of several statistical significance tests.

This tutorial shows how to specify the optional parameters for the five statistical significance tests used in the network inference algorithms. For a complete list of the parameters, refer to the Documentation. A brief description of the statistical significance tests used in IDTxl is provided in the stats section of the Wiki's theoretical introduction.

Some general considerations:

  • All tests let you specify both the number of permutations used to generate the test distribution and the critical alpha level used to determine statistical significance. The p-value is calculated as the fraction of the test distribution that is larger than the test statistic. If no value in the distribution is larger than the statistic, the p-value is set to 1/(number of permutations). Hence, the number of permutations determines the smallest attainable p-value. If the specified critical alpha level is lower than this minimum, i.e., alpha < 1/(number of permutations), an error is raised because the test cannot return a significant result.
  • To generate a test distribution, the toolbox will by default try to generate surrogate data by shuffling the replications in the data (such that the temporal ordering of the samples is preserved). However, if the factorial of the number of replications is lower than the requested number of permutations, the surrogate data will be generated by shuffling the samples over time. The default behaviour can be overridden by setting the parameter permute_in_time=True (in this case, additional parameters can be specified).
  • For some estimators (e.g., Gaussian), surrogate distributions can be derived analytically (instead of being estimated from surrogate data); IDTxl will use analytical surrogates whenever possible to save execution time.
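
The first two considerations can be sketched in a few lines of plain Python. This is only an illustration of the rules as stated above, not IDTxl's own code; the function names p_value and surrogate_method and the returned labels are hypothetical.

```python
import math


def p_value(statistic, test_distribution):
    """Fraction of the test distribution that is larger than the statistic."""
    n_perm = len(test_distribution)
    n_larger = sum(1 for s in test_distribution if s > statistic)
    if n_larger == 0:
        # No surrogate value exceeds the statistic: report the smallest
        # attainable p-value instead of zero.
        return 1 / n_perm
    return n_larger / n_perm


def surrogate_method(n_replications, n_perm, permute_in_time=False):
    """Choose the shuffling scheme following the rule described above."""
    # Shuffling replications is only possible if there are enough distinct
    # orderings of the replications to cover the requested permutations.
    if permute_in_time or math.factorial(n_replications) < n_perm:
        return 'shuffle samples over time'
    return 'shuffle replications'


# Toy test distribution with 200 surrogate values (hypothetical numbers).
distribution = list(range(200))
p_typical = p_value(189.5, distribution)   # 10 of 200 surrogates are larger
p_extreme = p_value(1000, distribution)    # no surrogate is larger
```

With 200 permutations, p_extreme is 1/200 = 0.005, so any alpha below 0.005 could never be reached. With 5 replications there are only 5! = 120 orderings, so a request for 200 permutations falls back to shuffling samples over time.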

Maximum test

Gates the inclusion of past variables when finding non-uniform embeddings.

settings = {'n_perm_max_stat': 200,
            'alpha_max_stat': 0.05,
            'permute_in_time': False}

Alternatively, enforce the surrogate data generation by shuffling over time:

settings = {'n_perm_max_stat': 200,
            'alpha_max_stat': 0.05,
            'permute_in_time': True}
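
To illustrate the difference between the two surrogate schemes, the following toy sketch shuffles a small data set stored as a list of replications. It assumes a plain random shuffle and hypothetical function names; IDTxl's actual surrogate generation is more involved and supports additional parameters.

```python
import random


def shuffle_replications(data, rng):
    """Reorder whole replications; samples within each keep their order."""
    order = list(range(len(data)))
    rng.shuffle(order)
    return [list(data[r]) for r in order]


def shuffle_in_time(data, rng):
    """Shuffle the samples within each replication over time."""
    surrogate = []
    for replication in data:
        samples = list(replication)
        rng.shuffle(samples)  # temporal ordering is destroyed
        surrogate.append(samples)
    return surrogate


rng = random.Random(0)  # fixed seed so the example is reproducible
# Toy data: 3 replications with 4 samples each (hypothetical values).
data = [[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23]]
by_replication = shuffle_replications(data, rng)
in_time = shuffle_in_time(data, rng)
```

In by_replication, every row is still one of the original replications with its samples in order. In in_time, each row contains the same samples as before, but their order within the row is random.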

Minimum test

Prunes candidates selected for the non-uniform embedding.

settings = {'n_perm_min_stat': 200,
            'alpha_min_stat': 0.05,
            'permute_in_time': False}

Omnibus test

Test of omnibus information transfer into a target process.

settings = {'n_perm_omnibus': 500,
            'alpha_omnibus': 0.05,
            'permute_in_time': False}

Sequential maximum test

Test of individual information contributions of each variable in the final conditioning set.

settings = {'n_perm_max_seq': 500,
            'alpha_max_seq': 0.05,
            'permute_in_time': False}

FDR correction

Corrects for multiple comparisons on the network level by controlling the false discovery rate (FDR).

settings = {'fdr_correction': True,
            'alpha_fdr': 0.05,
            'correct_by_target': True}
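
This page does not spell out the correction procedure. As a generic illustration of FDR control, the sketch below implements the Benjamini-Hochberg step-up procedure on a list of p-values. It is an assumption made for illustration only and may differ from what IDTxl does internally; the function name benjamini_hochberg is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the test is significant."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # All p-values up to and including rank k_max are declared significant.
    significant = [False] * m
    for idx in order[:k_max]:
        significant[idx] = True
    return significant


# Hypothetical p-values, e.g. one per target or per link.
p_values = [0.039, 0.001, 0.205, 0.008]
significant = benjamini_hochberg(p_values, alpha=0.05)
```

Here only the two smallest p-values (0.001 and 0.008) survive the correction, although 0.039 would pass an uncorrected threshold of 0.05.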

Permutation test for mutual information

Tests mutual information estimates against surrogates (also used for AIS estimation).

settings = {'n_perm_mi': 500,
            'alpha_mi': 0.05}
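
The permutation-based settings shown on this page can be checked against the smallest attainable p-value before starting an analysis. The sketch below assumes the settings are collected in a single dictionary and pairs each n_perm_* key with the alpha_* key of the same suffix; the helper check_settings is hypothetical and not part of IDTxl.

```python
def check_settings(settings):
    """Raise ValueError if an alpha is below the smallest attainable p-value."""
    for key, n_perm in settings.items():
        if not key.startswith('n_perm_'):
            continue
        test = key[len('n_perm_'):]           # e.g. 'max_stat'
        alpha = settings.get('alpha_' + test)
        if alpha is not None and alpha < 1 / n_perm:
            raise ValueError(
                "alpha_{0} = {1} is smaller than 1/n_perm_{0} = {2}; "
                "the test cannot return a significant result".format(
                    test, alpha, 1 / n_perm))


settings = {'n_perm_max_stat': 200, 'alpha_max_stat': 0.05,
            'n_perm_min_stat': 200, 'alpha_min_stat': 0.05,
            'n_perm_omnibus': 500, 'alpha_omnibus': 0.05,
            'n_perm_max_seq': 500, 'alpha_max_seq': 0.05,
            'n_perm_mi': 500, 'alpha_mi': 0.05,
            'permute_in_time': False}
check_settings(settings)  # passes: every alpha is at least 1/n_perm
```

Lowering a number of permutations too far, for example 'n_perm_mi': 10 together with 'alpha_mi': 0.05, makes the check fail because the smallest attainable p-value would be 0.1.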