Objective function, likelhood and chi square in the d2d framework - Data2Dynamics/d2d GitHub Wiki

Content of this page

At this page, the following aspects and their relationships are explained:

  1. How is the objective function used for parameter estimation calculated?
  2. What's done in arCalcMerit?
  3. What's plotted on the vertical axis of the profile likelihood plots?
  4. How are priors considered?
  5. What's the impact of the number of parameters?
  6. What's the impact of fitting error parameters?
  7. Why is ar.chi2 unequal to sum(ar.res.^2)?
  8. What's the Bessel correction?
  9. Is it possible to assess whether the model is in agreement with the data?

Helpful commands

  1. Evaluation of the objective function used in optimization:
sum([ar.res,ar.constr].^2)
  1. Goodness-of-Fit, i.e. -2*log(L):
2*ar.ndata*log(sqrt(2*pi)) + ar.chi2fit

or output of

arCalcMerit

or if ple has been performed

feval(pleGlobals.merit_fkt)
  1. Standard chi-square:
ar.config.fiterrors = 0;
arCalcMerit
ar.chi2

1. Objective function used for parameter estimation

The objective function, also called merit function, is defined in arFit.m. The optimizer calls the merit-function which, in turn, calls arCalcMerit to update the model and the residuals.

By default, lsqnonlin is used for optimization. Lsqnonlin works with residuals res which are internally squared and summed up, i.e. lsqnonlin minimizes sum(res.^2) where

res = [ar.res ar.constr] .

In d2d, the residual vector of the whole model (ar.res) consists of

  1. the union of all residuals of the individual data sets ar.model.data.res
  2. and the residuals originating from penalties of for the individual parameters.
  3. When error model parameters are fitted, the union of ar.model.data.reserr is also added.

If steady-state constraints are used, ar.constr is added to ar.res for optimization. The comprehensive objective function can be calculated by hand via

sum([ar.res,ar.constr].^2).

2. arCalcMerit

arCalcMerit is the major function for assessing the mismatch between model and data.

arCalcMerit calls arSimu and therefore integrates the ODE and updates the model predictions based on the current parameters in ar.p.

The following variables are calculated for assessing the goodness-of-fit:

  • ar.ndata, i.e. the number of data points used for fitting
  • ar.config.fiterrors_correction which is the Bessel correction given by
ar.config.fiterrors_correction = ar.ndata/(ar.ndata-sum(ar.qError~=1 & ar.qFit==1));
  • ar.chi2 contains the contribution of data and prior. It is the sum of ar.model(m).data(d).chi2 for all models m and all data sets d which are used for fitting (ar.model.data.qFit) plus potential terms originating from parameter priors.
  • ar.chi2err is the sum of all ar.model(m).data(d).chi2err from all parameter dependent measurement errors, if used for fitting according to ar.model.data.qFit.
  • ar.chi2prior is the sum of penalties originating from priors for parameters. For Gaussian priors, the penalties have to following form:
(ar.mean(jp)-ar.p(jp))./ar.std(jp).
  • ar.chi2constr is a penalty which is used when steady state constraints are introduced for dx/dt at the first time point. Qualitatively, such constraints are penalized by
(dxdt ./ x) ./ ar.model(m).condition(c).stdSteadyState.
  • ar.chi2fit coincides with ar.chi2 if the error parameters are not fitted. Otherwise it is
ar.chi2fit = ar.chi2+ar.chi2err.

The output written by arCalcMerit.m at the command line is

-2 log(L) = 2*ar.ndata*log(sqrt(2*pi)) + ar.chi2fit

and looks the following:

>> arCalcMerit
-2*log(L) = -130.824, 46 data points, 16 free parameters, first order optimality criterion 0.340897 (0) 

Negative values for -2*log(L) only occur if error parameters are fitted, i.e. if ar.config.fiterrors==1. The standard chi-square is obtained by

>> ar.config.fiterrors=0;

>> arCalcMerit

global chi^2 = 33.0002, 46 data points, 16 free parameters, first order optimality criterion 0.244546 (0)

Note that the estimated parameters in general depend on the fitted errors. Therefore, when ar.config.fiterrors is altered, the minimum changes.

3. Difference between residuals and chi2: res vs. chi2

In the case of fitted error parameters, there is a mismatch between the objective function used for optimization which is based on ar.res and ar.model.data.reserr and all the chi2* fields in the global ar struct which are based on ar.model.data.chi2err.

The magnitude of ar.model.data.reserr should be proportional to the square root of the contribution of the standard deviation parameters in the log-likelihood. To omit negative arguments in the square root, i.e. negative values for 2*log(ystd), an additive constant add_c = 50 is added in arSimuCalc.c.

   double add_c = 50.0;
(...)
   reserr[it + (iy*nt)] = 2.0*log(ystd[it + (iy*nt)]);
(...)
   reserr[it + (iy*nt)] += add_c;
   if(reserr[it + (iy*nt)] < 0) {
       printf("ERROR error model < 1e-10 not allowed\n");
       return;
   }
   reserr[it + (iy*nt)] = sqrt(reserr[it + (iy*nt)]);
   chi2err[iy] += pow(reserr[it + (iy*nt)], 2) - add_c;

The constant add_c does not change the optimum, but the order of magnitude of the residuals and merit-function. The additive constant does not enter ar.model.data.chi2err. This is the reason why the output fval of the optimizer as well as ar.chi2fit and all other fields ar.chi2 are not comparable to ar.res when error parameters are fitted.

4. Vertical axis of likelihood profiles

The vertical axis of the likelihood profiles is given by pleGlobals.chi2s and contains the contribution of data and priors to the objective function.

It is calculated in the following manner:

pleGlobals.chi2s is obtained by evaluating all parameter vectors obtained by the profile likelihood calculation and coincides with pleGlobals.chi2 for a single parameter vector.

pleGlobals.chi2 is calculated by evaluating pleGlobal.merit_fkt which in turn is defined in arPLEInit.m as arPLEMerit. In arPLEMerit, the following formulas are used:

if(ar.config.fiterrors == 1)
    chi2 = 2*ar.ndata*log(sqrt(2*pi)) + ar.chi2fit;
else
    chi2 = ar.chi2;
end

5. Impact of priors

arPrint indicates the type of prior in the last column. The following priors or respective penalties are available:

  • ar.type(jp)==0: Parameters without priors. Such parameters only have lower and upper founds defined in ar.lb and ar.ub.
  • ar.type(jp)==1: Parameters with L2 penalty corresponding to a normally distributed prior.
  • ar.type(jp)==3: Parameters with L1 penalty.

If there is no prior impact in the whole model, ar.chi2prior==0.

If L1 or L2 priors are used, then the penalty terms are calculated in arCalcMerit and enter ar.chi2prior, ar.chi2 and ar.chi2fit. In addition ar.res is augmented in arCalcMeritby additional entries for the penalties.

If priors are used, the contribution of the priors is printed at the command line

>> arCalcMerit
global chi^2 = 21.9374, 47 data points, 16 free parameters, 1.254 violation of 1 prior assumptions, ...
>> arGetMerit
global chi^2 = 21.9374, 47 data points, 16 free parameters, 1.254 violation of 1 prior assumptions, ...

6. Impact of the number of parameters

In this context, the number of parameters corresponds to the number of fitted parameters and therefore depends on ar.qFit.

If the Bessel correction is active, the outcome of arChi2 depends on the number of parameters because the residuals of the data depend on the fiterrors_correction (arSimuCalc.c):

res[it + (iy*nt)] = (yexp[it + (iy*nt)] - y[it + (iy*nt)]) / ystd[it + (iy*nt)] * sqrt(fiterrors_correction);

Since the fiterrors_correction enters each data residual in the same way, the fit does not depend on the number of fitted parameters. This situation can slightly change, if errors are fitted and/or priors and/or steady state constraints are used.

7. Impact of error parameters

  1. ar.config.fiterrors = 0 In this case,
  • the Bessel correction is switched off in arChi2 by setting ar.config.fiterrors_correction = 1 and
  • the error residuals ar.model.data.reserr are ignored for fitting and for calculating ar.res.
  1. ar.config.fiterrors = 1
  • The error residuals contribute to fitting.
  • If ar.config.useFitErrorCorrection==1, the Bessel correction is performed.

8. Bessel correction

Maximum likelihood estimation yields biased estimates for error parameters in the finite sample case. The so-called Bessel correction has been suggested to modify the likelihood for reducing the bias of estimated variances.

The well-known formula

SD = 1/(n-1) * sum (yexp-mean).^2

is the Bessel correction of the Maximum Likelihood estimator

SD_MLE = 1/n * sum (yexp-mean).^2

This example illustrates the general principle: The number of data points is replaced by the degrees of freedom, i.e. the number of data points minus the number of location parameters.

In d2d

ar.config.fiterrors_correction = ar.ndata/(ar.ndata-sum(ar.qError~=1 & ar.qFit==1));

is used for the Bessel correction and it enters into the residuals in arSimuCalc.c by

res[it + (iy*nt)] = (yexp[it + (iy*nt)] - y[it + (iy*nt)]) / ystd[it + (iy*nt)] * sqrt(fiterrors_correction);

9. Interpretation and assessing the model

If measurement errors are known, then the chi-square goodness of fit statistic should be in the same order of magnitude as the degrees of freedom, i.e. ar.chi2fit should be similar to ar.ndata - sum(ar.qFit==1). The corresponding statistical test is the frequently applied goodness-of-fit test which is implemented in d2d in as arChi2Test.

Please note, that the goodness-of-fit test strongly depends on correct measurement errors. It primarily assesses whether the errors fit and is therefore not recommended to assess whether the dynamics of the model fit the data.

Estimating error model parameters as a first step and then applying the goodness-of-fit test is not valid.

A more reliable procedure for comparing several models is given by the likelihood ratio test. This method is also applicable if error parameters are fitted. Since applicability and interpretation of likelihood ratio tests is a complex topic, we refer to the scientific literature at this point.

arPlotResiduals can be used to assess the distribution of the residuals by a QQ-plot or by their auto-correlation.