Daniel Fernandez Capstone S2 2020 - LuckIsMySkill/jidt GitHub Wiki

The script Calc v3a.py provides a means of testing JIDT's implementation of the KSG estimator for Mutual Information of mixed discrete-continuous datasets, by reproducing the plots in Ross (2014). It does this by creating synthetic data using the same distributions that he did, with a couple of additional options. In all cases, X takes 3 values, considered to be 0, 1 and 2. Y then takes continuous values from a distribution that depends on the value of X, and the three distributions are related to each other by small changes of parameters (such as mean or standard deviation). We seek to measure the mutual information I(X,Y), i.e. how much information do we gain about Y if we find out the value of X (without specifying what that value is right now)?
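As a rough illustration of the kind of data described above, here is a minimal sketch that draws X from {0, 1, 2} and then draws Y from a Gaussian whose mean depends on X. The function name, the means and the standard deviation are made up for illustration (they are not Ross' values, and this is not the code in DataGenerators.py), and X is assumed to be uniformly distributed.

```python
import random

def generate_gaussian_data(n, means=(0.0, 1.0, 2.0), sigma=1.0, seed=None):
    """Draw n (x, y) pairs: x is 0, 1 or 2; y ~ N(means[x], sigma).

    The means and sigma are illustrative placeholders, not Ross' parameters.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.randrange(3)            # assumption: X is uniform over 0, 1, 2
        y = rng.gauss(means[x], sigma)  # Y depends on X only through its mean
        xs.append(x)
        ys.append(y)
    return xs, ys

xs, ys = generate_gaussian_data(400, seed=1)
print(xs[:5], [round(y, 3) for y in ys[:5]])
```

The closer the three means are to each other (relative to sigma), the less finding out X tells us about Y, so the smaller the mutual information should be.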

The calculator is initialised with imports, the most significant of which is JPype (used to interface Python with Java). It also relies on the presence of DataGenerators.py, IntegralSquare.py and IntegralGaussian.py in the same directory. Those files are preset with Ross' parameter values; however, you may change them at will to explore different types of Gaussian or square wave. You will need to change the directories in lines 11-12 to match your own setup, by replacing everything up to Capstone JL with the path to your root JIDT directory.
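For orientation, starting the JVM through JPype typically looks something like the sketch below. The helper names are hypothetical, and it assumes that infodynamics.jar sits directly in your root JIDT directory (as it does in the standard JIDT distribution); the actual lines 11-12 of the script may be laid out differently.

```python
import os

def jidt_jar_path(jidt_root):
    """Expected location of the JIDT jar (assumed to be <root>/infodynamics.jar)."""
    return os.path.join(jidt_root, "infodynamics.jar")

def start_jidt_jvm(jidt_root):
    """Start the JVM with the JIDT jar on the classpath, if not already running."""
    import jpype  # imported here so the path helper works without JPype installed

    jar = jidt_jar_path(jidt_root)
    if not os.path.isfile(jar):
        raise FileNotFoundError("infodynamics.jar not found at " + jar)
    if not jpype.isJVMStarted():
        jpype.startJVM(jpype.getDefaultJVMPath(), "-ea",
                       "-Djava.class.path=" + jar)

# Example (hypothetical path): start_jidt_jvm("/path/to/your/jidt")
```

If the JVM fails to start, the first thing to check is that the path really points at your own JIDT directory rather than the one left in the script.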

There are also a number of defaults in the main file. Line 14:

type_output=2

This value may also be set to 0 (output to a csv file in the working directory) or 1 (plots with error bars instead of an error area).
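To give an idea of what the csv option amounts to, here is a small sketch that writes one row per value of k. The function name and the column names (k, mean_mi, std_mi) are assumptions made for this example; the file actually written by the script may be organised differently.

```python
import csv
import io

def write_results_csv(handle, k_values, means, stds):
    """Write one row per k with the mean and spread of the MI estimates."""
    writer = csv.writer(handle)
    writer.writerow(["k", "mean_mi", "std_mi"])  # hypothetical column names
    for k, mean, std in zip(k_values, means, stds):
        writer.writerow([k, mean, std])

# Demonstration with an in-memory buffer; for a real file use
# open("results.csv", "w", newline="") instead.
buffer = io.StringIO()
write_results_csv(buffer, [1, 2, 3], [0.50, 0.52, 0.53], [0.04, 0.03, 0.03])
print(buffer.getvalue())
```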

The next parameters to change are the size and reps of the experiment (lines 61-62). To replicate Ross exactly, use size=10000 or size=400 in line 62.
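The roles of size and reps can be pictured with the following sketch: each repetition generates a fresh dataset of the given size, runs the estimator on it, and the mean and standard deviation over repetitions are what end up on the plot. The function and argument names here are hypothetical, and the estimator is passed in as a plain callable rather than being the JIDT calculator itself.

```python
import statistics

def repeat_estimate(estimator, make_data, size, reps, seed=0):
    """Run `estimator` on `reps` fresh datasets of `size` samples each.

    estimator(xs, ys) -> float and make_data(size, seed) -> (xs, ys) are
    supplied by the caller. Returns (mean, standard deviation) of the estimates.
    """
    estimates = []
    for r in range(reps):
        xs, ys = make_data(size, seed + r)  # a different seed for each repetition
        estimates.append(estimator(xs, ys))
    mean = statistics.mean(estimates)
    spread = statistics.stdev(estimates) if reps > 1 else 0.0
    return mean, spread
```

A larger size should bring the mean closer to the true value, while more reps mainly gives a more reliable picture of the spread.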

You may also edit the maximum value of k to be used in the k-nearest-neighbours part of the KSG algorithm. It defaults to 10 in line 48. The same variable is used at the plotting stage, so there is no need to worry about the plot cropping out any of the data.
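To show where k enters, here is a slow, pure-Python sketch of the nearest-neighbour estimator described in Ross (2014) for a discrete X and a scalar Y. It is written from the description in that paper and is not JIDT's implementation; the function names are invented for this example, and the result is in nats.

```python
import random
from collections import Counter
from functools import lru_cache

EULER_GAMMA = 0.5772156649015329

@lru_cache(maxsize=None)
def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def ross_mi(xs, ys, k=3):
    """Nearest-neighbour MI estimate (nats) between discrete xs and scalar ys."""
    n = len(xs)
    counts = Counter(xs)
    if min(counts.values()) <= k:
        raise ValueError("every value of X needs more than k samples")
    total = 0.0
    for i in range(n):
        # Distance to the k-th nearest neighbour among points with the same x.
        same = sorted(abs(ys[j] - ys[i])
                      for j in range(n) if j != i and xs[j] == xs[i])
        d = same[k - 1]
        # Number of points with any x within that distance (excluding i itself).
        m = sum(1 for j in range(n) if j != i and abs(ys[j] - ys[i]) <= d)
        total += (digamma_int(n) - digamma_int(counts[xs[i]])
                  + digamma_int(k) - digamma_int(m))
    return total / n

# Toy data with illustrative parameters, then a sweep over k up to a maximum.
rng = random.Random(0)
xs = [i % 3 for i in range(300)]
ys = [float(x) + rng.gauss(0.0, 1.0) for x in xs]
max_k = 10
for k in range(1, max_k + 1):
    print(k, round(ross_mi(xs, ys, k), 4))
```

This version compares every pair of points, so it is only practical for a few hundred samples; it is meant to make the role of k visible, not to replace the JIDT calculator.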

The file may then be run and prompts will appear to ask what type of data is required. By entering '1' at the first prompt (without quotes), you should obtain a plot that looks like this:

[Figure: Square Wave, N=10000]

If you have set up the IntegralSquare.py file correctly, a dotted blue line should also appear on your plot to indicate the true value of the MI, calculated from the distributions by numerical integration.
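The numerical integration behind that true value can be sketched as follows. The MI is computed as the sum over x of p(x) times the integral of p(y|x) ln(p(y|x) / p(y)) dy, here with a simple midpoint rule. The function names and the intervals are invented for illustration (they are not Ross' parameters or the contents of the integral files), and X is assumed to be uniform over its three values.

```python
import math

def uniform_pdf(a, b):
    """Density of a square wave: constant on [a, b), zero elsewhere."""
    def pdf(y):
        return 1.0 / (b - a) if a <= y < b else 0.0
    return pdf

def true_mi(pdfs, lo, hi, steps=30000):
    """MI in nats for X uniform over len(pdfs) values and Y | X=x ~ pdfs[x].

    Integrates p(x) p(y|x) ln(p(y|x) / p(y)) over [lo, hi] by the midpoint rule.
    """
    p_x = 1.0 / len(pdfs)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        cond = [pdf(y) for pdf in pdfs]
        p_y = p_x * sum(cond)  # marginal density of Y at this point
        for c in cond:
            if c > 0.0:        # terms with p(y|x) = 0 contribute nothing
                total += p_x * c * math.log(c / p_y) * h
    return total

# Three overlapping square waves on illustrative intervals.
waves = [uniform_pdf(0.0, 1.2), uniform_pdf(0.1, 1.3), uniform_pdf(0.2, 1.4)]
print(true_mi(waves, 0.0, 1.4))
```

Two sanity checks are useful here: identical distributions should give 0, and three completely separate distributions should give ln 3, the entropy of a uniform three-valued X.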

By entering '0' at the first prompt and '0' at the second, you will obtain an analogous plot for Ross' Gaussian distributions.

By entering '0' at the first prompt and '1' at the second, you will obtain a different plot for Ross' Gaussian distributions based on the Conditional Mutual Information calculator. In this case, we observe that the empirically observed mutual information varies with k, tending asymptotically towards its true value from below and including some negative values.

[Figure: CMI, Z~N(0,1)]

It would be very interesting to explore why the CMI is not stable while the MI is!