3. Choosing parameters

Preprocessing

As for any statistical method, the data should be fed in a proper form. One of our crucial assumptions is that the data comes from a stable dynamical system: the time series must be stationary. You can check this, for example, with the augmented Dickey-Fuller unit root test. If your data is not stationary, then you first have to transform it (e.g. by differencing or detrending).
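As an illustration, a minimal stationarity check could look like the sketch below; it assumes the statsmodels package and uses first differencing as one possible transform (the file name, variable names and the 0.05 threshold are only illustrative).

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def is_stationary(x, alpha=0.05):
    """Reject the unit-root null hypothesis at level alpha -> treat the series as stationary."""
    adf_stat, p_value, *_ = adfuller(x, autolag="AIC")
    return p_value < alpha

x = np.loadtxt("x.csv")   # hypothetical input series
if not is_stationary(x):
    x = np.diff(x)        # first differencing is one common transform
```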

If your data doesn't come from a dynamical system, then you should turn to other methods. If your data does come from a dynamical system but is noisy, then you must remove the noise. There are several methods for removing noise, but choosing the right one also requires field expertise.
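As one generic illustration only (not a substitute for a domain-appropriate denoising technique), a Savitzky-Golay filter from SciPy can smooth out high-frequency noise; the window length and polynomial order below are arbitrary placeholders.

```python
from scipy.signal import savgol_filter

# Smooth the series with a Savitzky-Golay filter; window_length and polyorder
# are illustrative and should be tuned to the system at hand.
x_denoised = savgol_filter(x, window_length=11, polyorder=3)
```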

Finally, the scales of the two time series must match, otherwise one will dominate the other in the L2 distance. There are several ways to handle this. We found that min-max normalization (or rescaling), standardization, and quantile (or rank) normalization are all suitable choices.
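The three options mentioned above can each be implemented in a line or two; this sketch uses only NumPy and SciPy (any equivalent, e.g. scikit-learn scalers, works just as well).

```python
import numpy as np
from scipy.stats import rankdata

def minmax(x):
    return (x - x.min()) / (x.max() - x.min())

def standardize(x):
    return (x - x.mean()) / x.std()

def rank_normalize(x):
    return rankdata(x) / len(x)   # map each value to its empirical quantile
```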

Time-delay embedding (emb_dim and tau)

The preprocessed time series are embedded into an emb_dim-dimensional space with lag tau, according to Takens' theorem. Both emb_dim and tau are parameters of our model that have to be specified. Takens showed that emb_dim = 2d + 1 (where d is the true dimension) is a sufficient choice of embedding dimension to reconstruct the dynamics, but unfortunately the true dimension is usually unknown. One may have to consult a field expert who knows the dynamical system at hand well enough to give a reasonable estimate of the true dimension. If this is not possible, one can use one of the several methods proposed in the literature for selecting emb_dim: the correct value can be determined by searching for a plateau (saturation effect) in invariant quantities (such as intrinsic dimensionality) or by using the false nearest neighbour method.
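For reference, the delay-coordinate embedding itself is only a few lines of NumPy; the helper below is a sketch of what such an embedding does with emb_dim and tau (the function name is ours, not part of the package API).

```python
import numpy as np

def takens_embedding(x, emb_dim, tau):
    """Return an (len(x) - (emb_dim - 1) * tau) x emb_dim matrix of delay vectors."""
    n = len(x) - (emb_dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(emb_dim)])
```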

Finding emb_dim. We applied an iterative process: starting from a high embedding dimension and decreasing it, we checked the estimated manifold dimensions after each decrease and selected the lowest emb_dim at which the estimated d did not drop sharply. Takens' theorem says that emb_dim = 2d + 1 is a sufficient choice, but that does not mean that a smaller emb_dim < 2d + 1 cannot be acceptable; this depends on the system. In general, self-intersections do not alter the estimated manifold dimension as long as emb_dim > d. In finite samples it is common that dimension estimates grow as the embedding dimension increases, therefore it is generally better to select emb_dim as low as possible.
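One way to reproduce this iterative search is sketched below: it combines the takens_embedding helper above with a simple Levina-Bickel maximum-likelihood dimension estimator (our own illustrative implementation, not the package's estimator) and prints the estimated dimension for decreasing emb_dim, so the point where the estimate drops sharply can be spotted.

```python
import numpy as np
from scipy.spatial import cKDTree

def mle_local_dims(points, k):
    """Levina-Bickel maximum-likelihood local dimension estimates with k neighbours."""
    dists, _ = cKDTree(points).query(points, k=k + 1)
    dists = dists[:, 1:]                                  # drop the zero self-distance
    log_ratios = np.log(dists[:, -1:] / dists[:, :-1])    # log(T_k / T_j), j < k
    return (k - 1) / log_ratios.sum(axis=1)

# tau=5 and k=20 are placeholders; x is the preprocessed series from above.
for emb_dim in range(12, 1, -1):
    X = takens_embedding(x, emb_dim, tau=5)
    print(emb_dim, round(float(np.median(mle_local_dims(X, k=20))), 2))
```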

Finding tau. The optimal value of tau can be determined from the first zero of the autocorrelation function or from the first minimum of the auto-mutual-information function. Alternatively, one can optimize emb_dim and tau at once by applying differential-geometric arguments or by using the statistical properties of dimension estimates calculated on the embedded data, or determine an optimal embedding window (emb_dim - 1) * tau. To find the optimal tau we analysed the partial autocorrelation function (PACF) of the time series and selected the largest significant lag (which is how one would fit an autoregressive model). The first insignificant lag of the autocorrelation function (ACF) could be used as well, but we found the PACF more effective in practice. Using the largest significant PACF lag as the embedding delay yields coordinates that are relatively independent (only moderately redundant), yet not so independent that they become irrelevant for reconstructing the dynamics. The ACF may decay very slowly, resulting in a very large tau; moreover, if it is monotonic and we select the first insignificant lag, we exclude the linear relationships and the non-linear relationships they induce. The largest significant PACF lag, on the other hand, marks the point beyond which there is no direct linear relationship, while the non-linear relationships induced by the linear ones are still retained (think of the classical partial correlation example: a direct linear relationship between t1 and t2 and between t2 and t3 induces a quadratic relationship between t1 and t3).
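A possible implementation of this PACF-based rule, using statsmodels and the usual approximate ±1.96/sqrt(n) significance band (an assumption on our part), is:

```python
import numpy as np
from statsmodels.tsa.stattools import pacf

def tau_from_pacf(x, max_lag=50, z=1.96):
    """Largest lag whose partial autocorrelation is significant at roughly the 5% level."""
    vals = pacf(x, nlags=max_lag)
    threshold = z / np.sqrt(len(x))                        # approximate significance band
    significant = np.flatnonzero(np.abs(vals[1:]) > threshold) + 1
    return int(significant.max()) if significant.size else 1

tau = tau_from_pacf(x)
```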

Manifold postprocessing (downsample_rate)

Continuous dynamics (like the Lorenz system), for example, evolve rather slowly and in a thread-like manner: if one does not choose a sufficiently large neighbourhood when estimating the local dimension at a point, the manifold may seem one-dimensional, because the nearest neighbours will all lie on the same thread. This effect is similar to sampling a slow process at too high a frequency. It can be handled by down-sampling the manifold, i.e. re-sampling it at a lower frequency in such a way that the attractor still gets reconstructed.
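In practice this amounts to keeping every downsample_rate-th point of the series (or of the embedded manifold); the rate below is only an example.

```python
downsample_rate = 4                   # illustrative value, tune per system
x_down = x[::downsample_rate]         # or X[::downsample_rate] for embedded points
```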

K-range

Measuring the intrinsic dimension relies on the size of the balls we consider. We use the k-NN method to find the balls in which the dimension is estimated; however, the choice of k affects the estimate. If k is too low, the estimate may become "too local" and tend to 1 (see the Lorenz argument above); if k is too large, it becomes "too global", since deformations of the manifold/attractor may bias the estimate in a way that would not happen with smaller balls that adapt better to the shape of the manifold. For this reason we apply a Bayesian approach and try several values of k. In practice we found that 5 < k < 100 is a good range, but this really depends on the dynamical system.
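With the mle_local_dims sketch from above, the sensitivity to k can be inspected directly before settling on a range (the values below mirror the 5 < k < 100 guideline, but they are only an example):

```python
import numpy as np

# How does the (median) local dimension estimate react to k on the embedded data X?
for k in (5, 10, 20, 50, 100):
    print(k, round(float(np.median(mle_local_dims(X, k))), 2))

k_range = range(10, 100)   # a whole range of k values can then be tried instead of a single one
```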

Robustness of inference (eps)

Local dimension estimates may vary a lot due to differences in the shape and density of certain sections of the attractor, so outliers will most probably occur. The dimension of the attractor is given by the expected value of the local dimensions, therefore we apply a robust estimator of location, the trimmed mean: we calculate the mean after dropping the upper and lower eps * 100% of the local estimates.
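SciPy's trim_mean implements exactly this kind of robust location estimate; the eps value and the use of our mle_local_dims sketch below are only illustrative.

```python
from scipy.stats import trim_mean

eps = 0.05                                # drop the lowest and highest 5% of estimates
local_dims = mle_local_dims(X, k=20)      # local estimates, as sketched above
d_hat = trim_mean(local_dims, eps)
```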

Speed of calculating probabilities (c and bins)

Finding c. In order to calculate the case probabilities we would have to evaluate nested integrals with infinite range. To avoid this we truncate the range: since the integrands are Gaussian distributions, we restrict the range to [mean - c * std, mean + c * std]. By default we use c = 3, which means that only about 0.3% of the probability mass is cut off at the tails, yet we get a finite, small range that is easier to integrate over.
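The 0.3% figure can be verified directly from the tail mass of the normal distribution:

```python
from scipy.stats import norm

c = 3
tail_mass = 2 * norm.sf(c)   # mass outside [mean - c*std, mean + c*std]
print(tail_mass)             # ~0.0027, i.e. about 0.3% is cut off at the tails
```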

Finding bins. This range is then divided into a number of bins, specified by the bins parameter. The more bins we use, the more accurate the probabilities become, at the expense of higher computation time (calculating 4 nested integrals requires O(bins^4) steps). Since the range is usually very small, even a low number of bins works very well; the default is 20. You can always set it higher and check how much your results change.
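As a rough one-dimensional analogue (not the library's actual four-fold integral), the snippet below integrates a single Gaussian over [mean - c * std, mean + c * std] with an increasing number of bins, showing how quickly the result stabilises:

```python
import numpy as np
from scipy.stats import norm

mu, sigma, c = 0.0, 1.0, 3
for bins in (10, 20, 40, 80):
    grid = np.linspace(mu - c * sigma, mu + c * sigma, bins)
    approx = np.trapz(norm.pdf(grid, mu, sigma), grid)
    print(bins, round(approx, 6))   # converges quickly towards ~0.9973
```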