Discrete Sampling of True Prediction Bands - Data2Dynamics/d2d GitHub Wiki

Prediction vs Validation

The Computation of integration based prediction bands implemented in d2d can be used to generate trajectory uncertainties on the basis of validation profiles. However, an integration approach is not implemented if the band is based on prediction profiles. The difference beween both concepts is briefly explained in the article Estimating Prediction Uncertainty.

The validation profile approach is not necessary suitable to estimate prediction uncertainty, since the additional data uncertainty may be the dominating contribution to the overall uncertainty, although it is mostly used as just an auxiliary tool to calculate the prediction uncertainty. This can already be seen in the main article: The prediction bands for times close to zero cover a far range of negative values, which of course are impossible for the given model structure.

Hence, if there are doubts that the prediction uncertainty is poorly determined by the validation profiles, there are two main options:

Lower the data uncertainty: By decreasing the standard deviation of the auxiliary data point, the validation profile for any given time point should become more similar to the prediction profile. However, setting the data uncertainty to a value which is much smaller than the prediction uncertainty can produce numerical issues in the profile calculation.
Sample time course discretely: By abstaining from the use of integration features, the prediction band can be sampled discretely by calculating prediction profiles for a set of user-specified time-points. A major advanatge of such an approach is that the estimated prediction uncertainties at each time point are the actual predicion uncertainties rather than approximated values.

In the following, the second approach will be elucidated more thoroughly and a general workflow for the use of the corresponding functions is established.

A possible workflow

Setting the uncertainty of the auxiliary data point

Due to the nature of how prediction profiles are computed, a completely unsupervised use of the PPL-functions will usually be unsatisfactory. Since prediction profiles are constructed from an auxiliary validation profile (as shown in Likelihood based observability analysis and confidence intervals for predictions of dynamic models), choosing the standard deviation of the auxiliary data point is the crucial covariable which determines how well the prediction profile will be sampled.

Rule-of-Thumb: Choose the data uncertainty to be close to the prediction uncertainty to generate prediction profile most efficiently.

If the data uncertainty is much larger or much smaller than the prediction uncertainty, the prediction profile sampling can either fail completely, produce poor results or require many steps to converge to the confidence threshold. If the prediction profile still looks sensible but did not reach the threshold, this can usually be fixed by decreasing the data uncertainty.

Proposing a workflow

As the importance of the uncertainty of the auxiliary data point has now been established, the following workflow making use of this information is proposed:

Identify the model trajectories of interest and a set of times where the prediction profiles should be calculated. Regions with more notable changes of the model trajectory should be sampled more densely. Caution: As an empirical observation, using t = 0 can often be problematic if initial conditions are fixed, since the prediction uncertainty tends to get close to zero. Thus, it is recomend to use another time point close by, which has a more measurable prediction uncertainty.
For each trajectory, estimate a suitable data uncertainty by looking at the trajectories. In this setting, estimation means that you should get a feeling for the scale of the data and choose a small xstd accordingly. For example, if your trajectory spans a range of 0-100 on a data scale, sigma = 1 would about be a reasonable first choice. Caution: If the prediction uncertainty can be reasonably expected to change over orders of magnitudes over the course of one trajectory, it is most likely beneficial to split the trajectory into several regions where different data uncertainties will be used.
Looping over all trajectories of interest, i.e. looping over different models, conditions and observables, calculate the prediction band for a vector of times which must in principle be uniquely specified for each trajectory:

loop_list = {{m1,d1,ix1,ts1,sigma1},{m2,d2,ix2,ts2,sigma2},...};
for ii = 1:length(loop_list)	
	PPL_options('Integrate',false,'doPPL',true,...
		'n_steps_profile',200,'xstd',loop_list{ii,5});
	arPPL(loop_list{ii,1},loop_list{ii,2},loop_list{ii,3},...
		loop_list{ii,4},1);
end

Check the resulting prediction bands (e.g. by using arPlot2). If the prediction band seems odd for some time points, check whether the prediction profile looks reasonable and converged to the confidence threshold by use of the function arPlotPPL. If a profile did not converge to its lower or upper bound threshold, the point is omitted when plotting the prediction band, which will in most cases be noticeable. If such problematic profiles are encountered, this is most likely due to poorly specifed xstd. These profiles can be re-calculated by setting ar.model(m).condition/data(d).ppl.ppl_ub_threshold_profile for the corresponding experimental conditions (i.e. time points, observable) to NaN. The loop above (with respecified xstd) can then be re-used, since only the experimental conditions where a NaN was set are calculated again.
At the end, additional time points can be added if the trajectories should be sampled more densely in some regions.

Employing the function `VPL`

The fifth argument of arPPL called takeY determines whether prediction profiles are calculated for a model state or for an observable. In the latter case, the basic algorithm for profile calculation in arPredictionProfile is replaced by the more refined and reliable independent algorithm VPL, which should make the calculation of prediction bands as a whole more reliable. Since issues of lacking stability have been observed for the more basic algorithm, it is recommended to introduce the states of interest as observables into the model to make use of the more refined algorithm.

A practical example

This example code could be used in the model ...\Examples\Raia_CancerResearch2011:

Setup;

sigmas = [0.01,0.2,0.1,0.02,10,0.02,2,1]; % determine sigmas from trajectories
ts = [1,3,10:10:130]; % avoid t = 0

for ii = 1:3
    % loop over first 3 observables for example
    for jj = 2:4
        % loop over conditions
        % condition 1 are mostly trajectories fixed to zero, 
	%        for which PPL can not be sampled	
    PPL_options('Integrate',false,'doPPL',true,...
        'n_steps_profile',200,'xstd',sigmas(ii));
    arPPL(1,jj,ii,ts,1);
    end
end

This produces the following prediction bands:

Raia_Example