Generation of Surrogate Data - pwollstadt/IDTxl GitHub Wiki

The Data class provides various strategies for generating surrogate data for permutation testing (see Statistical significance tests).

To create surrogate data, IDTxl by default takes one of the variables entering the (conditional) mutual information estimation and randomly permutes its replications (see section Permutation of replications below). If the number of replications is too small to generate a sufficient number of surrogates, IDTxl falls back to permuting the samples of that variable in time (see section Permutation of samples in time below).

Permutation of replications

By default, IDTxl tries to generate surrogates by permuting replications of one of the variables used in the estimation. This is done by calling Data.get_realisations() while setting shuffle=True. When shuffling replications, blocks of data are permuted while the temporal order of samples stays intact within replications:

    Original data:
        +--------------+---------+---------+---------+---------+---------+-----+
        | repl. ind.   | 1 1 1 1 | 2 2 2 2 | 3 3 3 3 | 4 4 4 4 | 5 5 5 5 | ... |
        +--------------+---------+---------+---------+---------+---------+-----+
        | sample index | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | ... |
        +--------------+---------+---------+---------+---------+---------+-----+
    Shuffled data:
        +--------------+---------+---------+---------+---------+---------+-----+
        | repl. ind.   | 3 3 3 3 | 1 1 1 1 | 4 4 4 4 | 2 2 2 2 | 5 5 5 5 | ... |
        +--------------+---------+---------+---------+---------+---------+-----+
        | sample index | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | ... |
        +--------------+---------+---------+---------+---------+---------+-----+
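The diagram above can be reproduced with a few lines of NumPy. This is only an illustrative sketch, not IDTxl's implementation: the helper `shuffle_replications` and the array layout (samples in rows, replications in columns) are assumptions made for this example.

```python
import numpy as np


def shuffle_replications(x, rng):
    """Return x with its replications (columns) in random order.

    x is assumed to have shape (n_samples, n_replications). Whole columns
    are moved, so the temporal order within each replication is preserved.
    """
    perm = rng.permutation(x.shape[1])
    return x[:, perm], perm


rng = np.random.default_rng(0)
# 4 samples x 5 replications; each entry encodes
# 10 * replication index + sample index, e.g. 32 = replication 3, sample 2.
x = np.add.outer(np.arange(1, 5), 10 * np.arange(1, 6))
x_shuffled, perm = shuffle_replications(x, rng)
print(x_shuffled)
```

Each column of `x_shuffled` is an intact replication of `x`, so only the pairing of replications across variables is destroyed when one variable is shuffled and the others are not.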

Permutation of samples in time

If the number of replications is not sufficient to generate the desired number of surrogates, surrogates are created by shuffling samples in time. The user can also request this explicitly by setting 'permute_in_time' to True when calling any network inference algorithm. In this case, surrogate generation happens in the function Data.permute_samples().
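The choice between the two strategies can be sketched as follows. The function `choose_surrogate_strategy` is hypothetical, and the criterion it uses (the number of possible orderings of the replications, n_replications!, must exceed the requested number of permutations `n_perm`) is an assumption made for illustration; the exact check performed by IDTxl may differ.

```python
from math import factorial


def choose_surrogate_strategy(n_replications, n_perm, permute_in_time=False):
    """Return 'replications' or 'samples' (hypothetical helper).

    Assumes replications are 'sufficient' if the number of distinct
    orderings of the replications exceeds the requested n_perm.
    """
    if permute_in_time:
        # Explicit user request overrides the default behaviour.
        return 'samples'
    if factorial(n_replications) > n_perm:
        return 'replications'
    return 'samples'


print(choose_surrogate_strategy(n_replications=5, n_perm=200))  # 5! = 120
print(choose_surrogate_strategy(n_replications=6, n_perm=200))  # 6! = 720
```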

Several strategies for permuting samples are implemented. Select one by setting 'perm_type' to one of the following options (all options are passed via the settings dict when calling a network inference algorithm):

- 'random': shuffles samples at random
- 'circular': shifts the time series by a random number of samples
  - Set 'max_shift' to define the maximum number of samples for shifting (e.g., number of samples / 2)
- 'block': swaps blocks of samples
  - Set 'block_size' to define the number of samples per block (e.g., number of samples / 10)
  - Set 'perm_range' to define the range in which blocks can be swapped (e.g., number of samples / block_size)
- 'local': swaps samples within a given range
  - Set 'perm_range' to define the range in samples over which realisations can be permuted (e.g., number of samples / 10)
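As an example, the options above might be combined in a settings dict like the one below. Only the keys named in this section are shown; `n_samples` is a hypothetical value, and the other settings a network inference algorithm requires (estimator, lags, and so on) are omitted.

```python
n_samples = 1000  # hypothetical number of samples per replication

block_size = n_samples // 10
settings = {
    'permute_in_time': True,   # request permutation of samples in time
    'perm_type': 'block',      # one of 'random', 'circular', 'block', 'local'
    'block_size': block_size,  # number of samples per block
    'perm_range': n_samples // block_size,  # range in which blocks are swapped
}
print(settings)
```

For 'circular', replace the two block options with 'max_shift' (e.g., `n_samples // 2`); for 'local', set only 'perm_range' (e.g., `n_samples // 10`).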

The resulting surrogate data may look like the following:

    Original data:
        +--------------+-----------------+-----------------+-----------------+-----+
        | repl. ind.   | 1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3 | ... |
        +--------------+-----------------+-----------------+-----------------+-----+
        | sample index | 1 2 3 4 5 6 7 8 | 1 2 3 4 5 6 7 8 | 1 2 3 4 5 6 7 8 | ... |
        +--------------+-----------------+-----------------+-----------------+-----+
    Circular shift by a random number of samples, e.g. 4 samples:
        +--------------+-----------------+-----------------+-----------------+-----+
        | repl. ind.   | 1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3 | ... |
        +--------------+-----------------+-----------------+-----------------+-----+
        | sample index | 5 6 7 8 1 2 3 4 | 5 6 7 8 1 2 3 4 | 5 6 7 8 1 2 3 4 | ... |
        +--------------+-----------------+-----------------+-----------------+-----+
    Permute blocks of 3 samples:
        +--------------+-----------------+-----------------+-----------------+-----+
        | repl. ind.   | 1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3 | ... |
        +--------------+-----------------+-----------------+-----------------+-----+
        | sample index | 4 5 6 7 8 1 2 3 | 4 5 6 7 8 1 2 3 | 4 5 6 7 8 1 2 3 | ... |
        +--------------+-----------------+-----------------+-----------------+-----+
    Permute data locally within a range of 4 samples:
        +--------------+-----------------+-----------------+-----------------+-----+
        | repl. ind.   | 1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3 | ... |
        +--------------+-----------------+-----------------+-----------------+-----+
        | sample index | 1 2 4 3 8 5 6 7 | 1 2 4 3 8 5 6 7 | 1 2 4 3 8 5 6 7 | ... |
        +--------------+-----------------+-----------------+-----------------+-----+
    Random permutation:
        +--------------+-----------------+-----------------+-----------------+-----+
        | repl. ind.   | 1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3 | ... |
        +--------------+-----------------+-----------------+-----------------+-----+
        | sample index | 4 2 5 7 1 3 8 6 | 4 2 5 7 1 3 8 6 | 4 2 5 7 1 3 8 6 | ... |
        +--------------+-----------------+-----------------+-----------------+-----+
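The four permutation types can be mimicked with NumPy index arrays. The sketch below illustrates the idea only and is not the code in Data.permute_samples(): the function `permute_indices` is hypothetical, the defaults follow the examples given in the option list, and 'perm_range' for 'block' is interpreted here as the number of neighbouring blocks that may be swapped with each other.

```python
import numpy as np


def permute_indices(n_samples, perm_type, rng, max_shift=None,
                    block_size=None, perm_range=None):
    """Return a permuted array of sample indices 0..n_samples-1."""
    idx = np.arange(n_samples)
    if perm_type == 'random':
        return rng.permutation(idx)
    if perm_type == 'circular':
        if max_shift is None:
            max_shift = n_samples // 2
        shift = rng.integers(1, max_shift + 1)  # shift in 1..max_shift
        return np.roll(idx, -shift)
    if perm_type == 'block':
        if block_size is None:
            block_size = max(n_samples // 10, 1)
        # Cut the index array into consecutive blocks (last one may be short).
        blocks = [idx[i:i + block_size]
                  for i in range(0, n_samples, block_size)]
        if perm_range is None:
            perm_range = len(blocks)
        # Permute the block order within windows of perm_range blocks.
        order = np.arange(len(blocks))
        for start in range(0, len(blocks), perm_range):
            order[start:start + perm_range] = rng.permutation(
                order[start:start + perm_range])
        return np.concatenate([blocks[i] for i in order])
    if perm_type == 'local':
        if perm_range is None:
            perm_range = max(n_samples // 10, 1)
        # Shuffle samples only within consecutive windows of perm_range.
        out = idx.copy()
        for start in range(0, n_samples, perm_range):
            out[start:start + perm_range] = rng.permutation(
                idx[start:start + perm_range])
        return out
    raise ValueError(f'Unknown perm_type: {perm_type}')


rng = np.random.default_rng(1)
x = np.arange(1, 9)  # one replication with samples 1..8
x_circular = x[permute_indices(8, 'circular', rng, max_shift=4)]
x_block = x[permute_indices(8, 'block', rng, block_size=3)]
x_local = x[permute_indices(8, 'local', rng, perm_range=4)]
x_random = x[permute_indices(8, 'random', rng)]
```

To match the diagrams, in which every replication shows the same pattern, the same index array would be applied to each replication, e.g. `data[perm, :]` for an array with samples in rows.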